High-Level Design

Designing Twitter

From "one DB melting under 325K reads/sec" to a fan-out-on-write timeline with a special lane for Elon — the architecture that earns every box

Read this with the framework in mind

This deep-dive applies the 4-step HLD interview framework. As you read, map each section to Requirements → Entities → APIs → High-Level Design → Deep Dives, and notice which of the 8 common patterns and key technologies are at play.

Step 1

What is Twitter?

Imagine it's 9:37am. Elon Musk fires off a 140-character tweet about a new Tesla feature. He has roughly 100 million followers. Within seconds, every one of those 100M people who opens the Twitter app should see that tweet near the top of their home timeline — interleaved with tweets from everyone else they follow, sorted by recency, with photos and videos rendering inline. At the same moment Elon is tweeting, Sarah in Bangalore is scrolling her own timeline (she follows 200 people, not 100M), Raj in São Paulo is searching for "World Cup", and 200 million other daily users are doing some mix of all of this. That choreography — happening 24/7 across the globe — is the system we're designing.

Twitter is, at its heart, a microblogging service: short messages called tweets (originally capped at 140 characters, now longer), a directional follow graph (you follow me; that doesn't mean I follow you back), and a timeline — the chronological stream of tweets from the people you follow. Add photos, videos, replies, retweets, hashtags, search, and trending topics, and you have the product.

The two questions that drive every design decision below: (1) When Sarah opens the app, how do we assemble her timeline of 200 people's recent tweets in under 200ms — without re-querying the database every time? (2) When Elon tweets to 100M followers, how do we deliver it to all of them in seconds without his single tweet causing 100 million simultaneous database writes?
Step 2

Requirements & Goals

Pin down the product surface before drawing boxes. In an interview, asking these questions out loud signals you're not just regurgitating an architecture — you're deciding what to build first.

✅ Functional Requirements

  • Users can post tweets (text, optionally with photos and videos)
  • Users can follow other users (directional)
  • Users can favorite (like) tweets
  • Users can view a home timeline — top tweets from the people they follow, sorted by recency
  • Service must support photos and videos attached to tweets

⚙️ Non-Functional Requirements

  • Highly available — Twitter being down is a global news event
  • Timeline latency under 200ms p99 — slower than that and users feel it
  • Eventually consistent — small staleness OK; if Sarah's tweet shows up 3 seconds later in Raj's timeline, no one cares

➕ Extended

  • Search tweets by keyword
  • Replies & threaded conversations
  • Trending topics (hashtag aggregation)
  • Tag users with @mentions, push notifications
  • Who to follow recommendations, Moments
The hard requirements aren't the obvious ones. "Post a tweet" is a row insert. "Show a timeline" is the one that breaks every naive design — because it reads tweets from many users, sorted, paginated, with media URLs resolved, all in under 200ms while 200 million people do the same thing.
Step 3

Capacity Estimation & Constraints

Numbers drive every architectural choice. Twitter is famously read-heavy: people read far more than they post. Let's get rough quantities so the rest of the design has weight behind it.

Users & activity

Assume 1 billion total users with 200 million daily active users (DAU). 100 million new tweets per day. Each user follows on average 200 other users.

Tweets/sec

~1,150/sec

100M / 86400

Timeline reads/sec

~325K/sec

200M DAU × 7 page-views × 20 tweets / 86400

Favorites/sec

~12K/sec

1B favorites/day spread out

Avg follows / user

~200

Drives fan-out cost

Storage estimate (per day)

Each tweet ≈ 280 bytes of text + 30 bytes of metadata (id, user_id, timestamps, counters). That gives 100M × 310B ≈ ~30 GB/day text. Add photos and videos:

Text

~30 GB/day

Tiny relative to media

Photos (~5M/day, 200KB)

~1 TB/day

Lives in object storage + CDN

Videos (~720K/day, 30MB)

~22 TB/day

Bulk of all storage

Total storage growth: ~24 TB/day dominated by video. Over 5 years: ~43 PB.

Bandwidth estimate

Outgoing bandwidth (people watching videos and viewing photos in their timeline) dwarfs ingest. Roughly ~35 GB/sec egress, mostly video served via CDN. Text is a rounding error by comparison: roughly 100 MB/sec (325K tweet reads/sec × ~310 bytes) against 35 GB/sec of video.
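These estimates are simple enough to sanity-check in a few lines. A rough back-of-envelope script (the inputs are the assumptions above, not measured values):

SECONDS_PER_DAY = 86_400

dau               = 200_000_000          # daily active users
tweets_per_day    = 100_000_000
pageviews_per_dau = 7
tweets_per_page   = 20
tweet_bytes       = 310                  # ~280 text + ~30 metadata
photos_per_day, photo_bytes = 5_000_000, 200 * 1024
videos_per_day, video_bytes = 720_000, 30 * 1024 * 1024

writes_per_sec = tweets_per_day / SECONDS_PER_DAY                              # ~1,150
reads_per_sec  = dau * pageviews_per_dau * tweets_per_page / SECONDS_PER_DAY   # ~325K
text_gb_day    = tweets_per_day * tweet_bytes / 1e9                            # ~30 GB
media_tb_day   = (photos_per_day * photo_bytes + videos_per_day * video_bytes) / 1e12  # ~24 TB

print(f"writes/sec    ~{writes_per_sec:,.0f}")
print(f"reads/sec     ~{reads_per_sec:,.0f}")
print(f"read:write    ~{reads_per_sec / writes_per_sec:.0f}:1")
print(f"text storage  ~{text_gb_day:,.0f} GB/day")
print(f"media storage ~{media_tb_day:,.1f} TB/day")
print(f"5-year total  ~{(text_gb_day / 1000 + media_tb_day) * 365 * 5 / 1000:,.0f} PB")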

Metric | Value | Why it matters
Tweets/sec | ~1,150/s | Drives write throughput & fan-out worker pool
Timeline reads/sec | ~325K/s | Forces pre-computed timelines and aggressive caching
Avg follows/user | ~200 | Each tweet fans out to 200 timeline caches on average
Storage/day | ~24 TB | Forces sharded storage; video → object store + CDN
Egress | ~35 GB/s | Video must be CDN-served — origin can't carry it
The single most important ratio: 325K reads / 1,150 writes ≈ 280:1. Reads outnumber writes by well over two orders of magnitude. Any design that does work on the read path that could have been pre-computed at write time will lose at scale.
Step 4

System APIs

Five endpoints carry essentially all the traffic. Defining them up front locks the contract before architecture, and reveals which calls are write-path (rare) vs. read-path (constant).

REST API surface
// 1. Post a tweet — write path, low QPS
POST /api/v1/tweet
{
  "api_key":   "abc123...",
  "text":      "Just landed in SF!",
  "location":  { "lat": 37.77, "lon": -122.41 },   // optional
  "media_ids": ["img_92af", "vid_71be"]            // optional, pre-uploaded
}
→ 201 Created  { "tweet_id": "1648...3922", "created_at": "2026-05-07T14:02:06Z" }

// 2. Fetch home timeline — read path, very high QPS
GET /api/v1/timeline?user_id=42&count=20&page_token=eyJ0Ijox...
→ 200 OK
{
  "tweets": [ { "tweet_id": "...", "user": {...}, "text": "...", "media": [...] }, ... ],
  "next_page_token": "eyJ0Ijox..."
}

// 3. Follow another user
POST /api/v1/follow
{ "user_id": 42, "target_user_id": 99 }
→ 204 No Content

// 4. Favorite a tweet
POST /api/v1/favorite
{ "user_id": 42, "tweet_id": "1648...3922" }
→ 204 No Content

// 5. Search tweets
GET /api/v1/search?api_key=...&query=worldcup&count=20&sort=recent
→ 200 OK  { "tweets": [...], "next_page_token": "..." }
Why timeline uses a page_token and not page=N: tweets are constantly being added at the top. If we used numeric pages, "page 2" would shift down by however many new tweets appeared since "page 1" — users would see duplicates. A token encodes "give me items older than tweet X", which is stable as new tweets arrive on top.
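A minimal sketch of such a cursor, assuming the token simply wraps the oldest tweet_id already returned (field names and encoding are illustrative, not Twitter's actual format):

import base64, json

def encode_page_token(oldest_tweet_id):
    payload = json.dumps({"older_than": oldest_tweet_id}).encode()
    return base64.urlsafe_b64encode(payload).decode()

def decode_page_token(token):
    return json.loads(base64.urlsafe_b64decode(token))["older_than"]

def page(timeline_ids, token=None, count=20):
    """timeline_ids: tweet_ids sorted newest-first (ids are time-sortable, see §7)."""
    cutoff = decode_page_token(token) if token else float("inf")
    older = [t for t in timeline_ids if t < cutoff][:count]
    next_token = encode_page_token(older[-1]) if older else None
    return older, next_token

Because tweet_ids embed time (§7), "older than id X" is also "older than time T", so the cursor stays valid no matter how many new tweets land on top.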
Media is uploaded separately via a pre-signed S3 URL flow, not in the tweet POST. The client uploads the photo or video to object storage, gets back a media_id, and only sends the small media_id in the tweet body. This keeps tweet writes fast and lets media uploads stream directly to storage without our app servers in the path.
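A sketch of the server side of that handshake, assuming S3 via boto3 (bucket name, key scheme, and expiry are placeholders):

import uuid
import boto3

s3 = boto3.client("s3")

def create_upload_url(content_type):
    # The app server only hands out a signed URL; the bytes go client -> S3 directly.
    media_id = f"img_{uuid.uuid4().hex[:8]}"
    url = s3.generate_presigned_url(
        "put_object",
        Params={"Bucket": "twitter-media", "Key": media_id, "ContentType": content_type},
        ExpiresIn=300,  # client has 5 minutes to finish the upload
    )
    return {"media_id": media_id, "upload_url": url}

# Client: PUT the file bytes to upload_url, then POST the tweet with media_ids=[media_id].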
Step 5

Database Schema

Four core entities. Two patterns matter: tweets are written constantly and queried by id or by user_id; the follow graph is queried as "who does user X follow?" and "who follows user X?". This shape — heavy writes on one table, fan-out reads on another — is what pushes us to a hybrid SQL + NoSQL stack.

erDiagram TWEET { bigint id PK bigint user_id FK string content string location timestamp creation_date int num_favorites } USER { bigint id PK string name string email timestamp creation_date timestamp last_login } USER_FOLLOW { bigint user1_id PK bigint user2_id PK timestamp created_at } FAVORITE { bigint tweet_id PK bigint user_id PK timestamp created_at } USER ||--o{ TWEET : "posts" USER ||--o{ USER_FOLLOW : "follows" USER ||--o{ FAVORITE : "likes" TWEET ||--o{ FAVORITE : "is liked by"

SQL or NoSQL? Both — pick per table.

📦 Tweets → Cassandra (NoSQL)

1,150 writes/sec sustained, 100M+ rows/day, queried almost exclusively by primary key (tweet_id) or by partition (user_id). No joins, no transactions across tweets. Cassandra's wide-column model is purpose-built for this — append-heavy, partition-by-key, replicated by default.

👤 Users & Follow Graph → MySQL

Lower write rate, strong consistency matters (a "follow" must be visible immediately to that user), and we want relational queries ("count my followers", "is user X following user Y"). MySQL with read replicas handles 10K writes/sec easily; the graph fits in a few hundred GB even at 1B users.

Polyglot persistence is normal at this scale. No single database is best at everything. Twitter itself splits along similar lines: tweets in Manhattan, its custom distributed key-value store; the social graph in FlockDB, a sharded-MySQL graph store built on the Gizzard framework; search in the Lucene-based Earlybird. Each table goes to the engine whose access pattern it matches.

The num_favorites column on TWEET is a denormalized counter — the source of truth lives in FAVORITE rows, but counting them on every render would be too slow. We update the counter asynchronously; a sketch follows.
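One plausible way to do that asynchronous update, sketched with Redis as the delta buffer (key names and the flush cadence are assumptions, not the real pipeline):

import redis

r = redis.Redis()

def record_favorite(user_id, tweet_id):
    # 1. Durable write: INSERT INTO favorite (tweet_id, user_id) ...  (omitted here)
    # 2. Cheap in-memory delta, folded into num_favorites later
    r.incr(f"fav_delta:{tweet_id}")

def flush_favorite_deltas():
    """Run every few seconds by a background worker."""
    for key in r.scan_iter("fav_delta:*"):
        delta = int(r.getset(key, 0) or 0)   # read the delta and reset it
        if delta:
            tweet_id = int(key.decode().split(":")[1])
            # UPDATE tweet SET num_favorites = num_favorites + delta WHERE id = tweet_id  (omitted)
            print(f"tweet {tweet_id}: +{delta} favorites")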

Step 6 · CORE

High-Level Architecture — From Naive to Production

This is the section that wins or loses the interview. We'll build the architecture in three passes: the simplest thing that could plausibly work, why it falls apart, and the production shape where every box justifies itself. Numbers from §3 drive every decision.

Pass 1 — The naive design (and why it breaks)

Sketch the simplest possible system: one app server talks to one MySQL database. To post a tweet, insert a row. To show Sarah's timeline, run a query like: "SELECT tweets FROM tweet WHERE user_id IN (the 200 people Sarah follows) ORDER BY creation_date DESC LIMIT 20". Done.

flowchart LR C["Client"] --> APP["App Server"] APP --> DB[("MySQL — tweets + users + follows")]

Three concrete failures emerge the moment real traffic shows up:

💥 Timeline read is a fan-in nightmare

Sarah follows 200 people. To build her timeline, the DB must touch 200 user-partitions, sort their recent tweets by time, and return the top 20. Now imagine Raj follows 5,000 people — his query touches 5,000 partitions for a single page-view. With 200M DAU each refreshing 7 times a day, that's billions of multi-partition fan-in queries per day. No DB survives this.

💥 325K reads/sec melts one DB

A single MySQL instance handles roughly 5K-10K simple queries/sec. We need 325,000. Even with read replicas, the timeline query above hits dozens of partitions per request — multiplying the underlying DB load by 10×-50×. CPU pegs at 100%, p99 latency goes from 5ms to 5 seconds, every Twitter user's app feels broken.

💥 Writes compete with reads

1,150 tweet inserts/sec sounds tiny next to 325K reads — until you realize they share the same DB. Write contention on indexes (especially the timestamp index every timeline query depends on) makes both sides slower. A single celebrity tweet can lock indexes for milliseconds and ripple latency across thousands of in-flight reads.

Pass 2 — The mental model: pre-compute timelines (fan-out-on-write)

The single most important insight in this design comes from the read/write ratio in §3. Timelines are read 280× more often than tweets are written. So instead of computing the timeline at read time (expensive, repeated), compute it once at write time and cache it. When Sarah posts a tweet, push it into the pre-built timelines of every one of her followers. When Raj opens the app, his timeline is already assembled — read it from a single cache key in 5ms.

✍️ Push (fan-out-on-write)

On tweet, look up all of the author's followers and write the new tweet_id into each follower's timeline cache. Read becomes O(1): grab the cached list of tweet_ids, hydrate them, return. This is the classic "do work at write time so reads are free" trade.

Breaks for celebrities. Elon has 100M followers. One tweet from him would trigger 100 million cache writes in a few seconds — overwhelming the fan-out workers and causing huge latency spikes for everyone else's tweets behind his in the queue.

📖 Pull (fan-in-on-read)

Don't fan out anything at write time. At read time, query each followed user's recent tweets and merge. Cheap writes, expensive reads. This is the naive design from Pass 1.

Doesn't scale for normal users (200 fan-in per refresh × 200M DAU × 7 refreshes/day) — exactly the problem we just rejected.

The fix is a hybrid model: push for normal users, pull for celebrities. When a regular user with 1,000 followers tweets, fan it out to all 1,000 timeline caches — fast, cheap. When Elon tweets, mark it as a "celebrity tweet" and skip fan-out entirely. At timeline-read time, the reader pulls Elon's recent tweets live and merges them with the pre-computed cache. This keeps fan-out work bounded (no author pushes to more than ~10K followers) and adds at most a handful of "live celebrity pull" queries per timeline read.

The threshold: users with more than ~10K followers are flagged as celebrities. About 0.1% of all users. Their tweets skip fan-out and are pulled in at read time. Everyone else uses pure push. This split absorbs the worst of both worlds and keeps the system uniform for the 99.9%.

Pass 3 — The production shape

Now the full picture. Every node is numbered — find its matching card below to see what it does and what would break without it. Three planes: write (orange), read (blue), and async (yellow).

flowchart TB CL["① Client — Mobile / Web"] LB["② Load Balancer"] subgraph WRITE["Write Plane"] WS["③ Tweet Write Service"] TDB[("④ Tweet Storage — Cassandra")] MEDIA[("⑤ Media Storage — S3 + CDN")] end subgraph ASYNC["Async Plane"] FQ["⑥ Fan-out Service — Kafka + Workers"] TC[("⑦ Timeline Cache — Redis")] end subgraph READ["Read Plane"] TR["⑧ Timeline Read Service"] UG["⑨ User Graph Service — MySQL"] SR["⑩ Search Service — Elasticsearch"] TT["⑪ Trending Topics Service"] end MON["⑫ Analytics & Monitoring"] CL --> LB LB -->|"POST tweet"| WS LB -->|"GET timeline"| TR LB -->|"GET search"| SR LB -->|"upload media"| MEDIA WS --> TDB WS --> FQ FQ --> TC FQ --> SR FQ --> TT TR --> TC TR --> TDB TR --> UG TR --> MEDIA WS -.events.-> MON TR -.events.-> MON style CL fill:#e8743b,stroke:#e8743b,color:#fff style LB fill:#171d27,stroke:#9b72cf,color:#d4dae5 style WS fill:#171d27,stroke:#e8743b,color:#d4dae5 style TDB fill:#171d27,stroke:#e8743b,color:#d4dae5 style MEDIA fill:#171d27,stroke:#e8743b,color:#d4dae5 style FQ fill:#171d27,stroke:#d4a838,color:#d4dae5 style TC fill:#171d27,stroke:#d4a838,color:#d4dae5 style TR fill:#171d27,stroke:#4a90d9,color:#d4dae5 style UG fill:#171d27,stroke:#4a90d9,color:#d4dae5 style SR fill:#171d27,stroke:#4a90d9,color:#d4dae5 style TT fill:#171d27,stroke:#4a90d9,color:#d4dae5 style MON fill:#171d27,stroke:#3cbfbf,color:#d4dae5

Component-by-component — what each numbered box does

Use the numbers in the diagram above to find the matching card. Each one answers what is this, why is it here, and what would break without it.

Client

The Twitter mobile app or twitter.com in a browser. Sends three primary requests: POST a tweet (rare), GET the home timeline (constant), and GET search results (occasional). Also opens long-lived connections for push notifications. From the client's perspective, the entire backend is one URL — everything else is invisible.

Solves: nothing on its own — but every latency and freshness budget downstream comes from "what does the client experience?" 200ms timeline budget, instant tweet appearance in the author's own timeline, the eventual-consistency window — all client-driven constraints.

Load Balancer

The traffic cop. Sits in front of every service tier, distributes incoming HTTPS, terminates TLS, and yanks unhealthy backends out of rotation via 5-second health checks. Round-robin for the write tier (low QPS, requests are uniform); least-connections for the read tier (some timelines take much longer to assemble than others).

Solves: single-server bottlenecks and single-server failures. Without the LB, one app crash takes down the whole product. With it, we lose 1/N of capacity for a few seconds until health checks fail the bad pod out.

Tweet Write Service

Stateless service that handles POST /api/v1/tweet. Per request: validate input → generate a unique tweet_id (timestamp-embedded, see §7) → write the row to Tweet Storage ④ → publish a NewTweet event to the Fan-out Service ⑥ → return the new tweet_id. Total budget: under 100ms. Scaling is trivial because it's stateless — add pods behind the LB.

Solves: isolating write traffic from read traffic. Without a dedicated write tier, the 1,150 writes/sec would compete for CPU and connections with the 325K reads/sec — and a write spike would slow every timeline read. Splitting them lets each tier be sized for its own workload.

Tweet Storage (Cassandra)

The source of truth for every tweet ever written. A wide-column NoSQL store sharded by tweet_id (which embeds a timestamp — see §7), replicated 3× across availability zones with quorum reads (R=2, W=2, N=3). Holds 100M new rows per day; ~365 billion rows over 10 years. Each row stores the tweet text, author user_id, creation timestamp, location, and media references.

Solves: durable, horizontally-scalable storage for the 30 GB/day of text plus billions of historical rows. A single MySQL box stays healthy only up to roughly a terabyte or two of data; we need orders of magnitude more. Cassandra spreads it across hundreds of nodes and survives node failures automatically.

Media Storage (S3 + CDN)

Photos and videos live in object storage (S3 or equivalent), fronted by a global CDN (CloudFront, Akamai). Clients upload directly via pre-signed URLs — bytes never traverse our app servers. On timeline render, the app returns CDN URLs the client fetches itself. The CDN caches videos at 200+ edge POPs worldwide, so a Tokyo viewer streams from a Tokyo node, not from US-East.

Solves: the 35 GB/sec egress problem. Without a CDN, all that video would have to flow out of our origin data centers — saturating links, spiking costs, and adding 200ms of trans-Pacific latency for every Asian viewer. With a CDN, origin sees maybe 5% of egress; the rest is served from the edge.

Fan-out Service (Kafka + Workers)

The asynchronous heart of the design. When the Write Service ③ publishes a NewTweet event, a Kafka topic absorbs it. Worker pods consume the topic and do three things in parallel: (1) look up the author's followers in the User Graph ⑨, (2) for each follower, push the new tweet_id to that follower's Timeline Cache ⑦ entry, (3) index the tweet text in the Search Service ⑩ and update Trending Topics ⑪ counters. For celebrity authors (≥10K followers), step 2 is skipped — read-time pull handles them.

Solves: hiding the cost of fan-out from the user posting the tweet. Sarah hits "Tweet" and gets a 201 in 60ms. The actual delivery to her 1,000 followers' timelines happens in the background over the next ~2 seconds. Without this async layer, every tweet POST would block until all follower writes completed — a tweet from a user with 50K followers would take 5+ seconds to post.

What if the queue backs up? Kafka buffers it; lag becomes visible delivery delay. We monitor consumer lag and auto-scale workers. Worst case during a viral spike: a tweet appears in followers' timelines 30 seconds late instead of 2. Users barely notice; the queue catches up within minutes.

Timeline Cache (Redis)

The core data structure of the read path. Per active user, Redis stores a list of the most recent ~800 tweet_ids that should appear in their home timeline — already merged, already sorted, already filtered. Just tweet_ids, not full tweets (kept small: ~10 KB per user). When a user opens the app, the read service fetches this list, hydrates the top 20 tweet_ids into full tweet objects, and returns. 5ms read.

Solves: the 325K reads/sec problem. Without pre-computed timelines, every read would do the multi-partition fan-in query that killed Pass 1. With it, reads are a single Redis GET. The DB is sized for the cold path (cache misses, deep pagination), not the hot peak.

Sized for active users, not all users. 200M DAU × 10 KB ≈ 2 TB — fits in a sharded Redis cluster of 30-40 nodes. Inactive users (the other 800M) get their timeline rebuilt lazily on next login.

Timeline Read Service

Stateless service that handles GET /api/v1/timeline. Per request: fetch the user's pre-computed timeline list from Cache ⑦ → if author is a celebrity, also pull the celebrity's recent tweets live from Tweet Storage ④ and merge → hydrate tweet_ids into full tweet objects (text, media URLs, counters) → return. Latency budget: under 200ms. Scaled aggressively — read tier is the largest deployment.

Solves: assembling the final timeline payload from its parts (cache + celebrity pull + media URLs). This is where the hybrid push/pull model is implemented: cached tweet_ids handle the 99.9% of regular-user content, the celebrity pull handles the long-tail of high-follower accounts.

User Graph Service (MySQL)

Owns the follow graph: who follows whom. Backed by sharded MySQL (consistent — when you tap "Follow", it has to take effect immediately for you). Two key queries: "give me user X's followers" (used by Fan-out ⑥ on every tweet) and "give me user X's follows" (used to compute timelines and recommendations). Heavily cached because the graph changes slowly relative to how often it's read.

Solves: being the source of truth for the social graph. Without a dedicated graph service, fan-out would have to hit a generic user table and the storage tier would be flooded with "list followers of X" queries during every fan-out cycle. A specialized service with graph-shaped indexes and aggressive caching handles it cleanly.

Search Service (Elasticsearch)

Full-text search over every tweet ever written. Fan-out ⑥ writes each new tweet's text into an Elasticsearch index in near-real-time (a few seconds of lag). Search queries hit Elasticsearch directly: tokenized, ranked, paginated. Sharded the same way Cassandra is — by tweet_id with timestamp embedded, so "search worldcup, recent first" is a partition-pruned scan, not a full-cluster broadcast.

Solves: the "search tweets" feature, which is its own product surface. We can't search a Cassandra wide-column store by text content — Cassandra is built for key lookups, not full-text scans. Elasticsearch is purpose-built for inverted-index text search and lives alongside, fed asynchronously.

Trending Topics Service

Counts hashtag occurrences over a sliding window (typically the last 5 minutes to 1 hour) and exposes the top-K via a min-heap kept in memory. Fan-out ⑥ extracts hashtags from each tweet and increments per-tag counters; a background process refreshes the trending list every minute. Personalized variants take location and follow-graph into account.

Solves: the "What's happening" panel. Without a dedicated trending service, computing the top hashtags would require scanning every tweet from the last hour on every refresh — a multi-billion-row scan that no DB can serve at panel-refresh latency.

Analytics & Monitoring

Captures everything: tweets/sec, p99 timeline latency, fan-out lag (how far behind real-time is the worker queue?), cache hit rate, DB replication lag, error rates per endpoint. Pipelined via Kafka into Prometheus + Grafana for live ops, and into a data warehouse for product analytics. Alerts page the on-call engineer when fan-out lag crosses 30 seconds or timeline p99 crosses 300ms.

Solves: knowing the system is healthy. Twitter-scale systems fail in subtle ways — a single shard going slow, a Kafka partition skewing, a worker pool deadlocking — and without instrumentation you don't notice until users complain on Twitter (which is awkward when you are Twitter). Monitoring closes that loop.

Concrete walkthrough — Sarah tweets, Raj scrolls

Two real flows, mapped to the numbered components above:

✍️ Write flow — Sarah (1,000 followers) tweets at 14:02:06

  1. Sarah's app ① uploads a photo to Media Storage ⑤ via pre-signed URL → gets back img_92af.
  2. App POSTs { text, media_ids: [img_92af] } through the Load Balancer ② to a Tweet Write Service ③ pod.
  3. Write service generates tweet_id = 1648773726_4271 (epoch seconds + sequence), inserts the row into Tweet Storage ④ — sharded by tweet_id, replicated 3×.
  4. Write service publishes a NewTweet event to Kafka and returns 201 to Sarah's app. Total elapsed: 60ms. Sarah sees her tweet appear instantly.
  5. In the background, Fan-out workers ⑥ consume the event: look up Sarah's 1,000 followers in User Graph ⑨, push the tweet_id into each of those 1,000 entries in Timeline Cache ⑦. Total fan-out time: ~2 seconds.
  6. In parallel, the worker indexes the tweet text in Search ⑩ and increments any hashtag counters in Trending ⑪.

📖 Read flow — Raj opens the app at 14:03:14

  1. Raj's app ① GETs /api/v1/timeline?user_id=raj. LB ② routes to a Timeline Read Service ⑧ pod.
  2. Read service fetches Raj's pre-computed timeline list from Timeline Cache ⑦ — gets back ~800 recent tweet_ids, including Sarah's from 68 seconds ago.
  3. Read service checks: does Raj follow any celebrities? Yes — Elon. So it pulls Elon's last 20 tweets live from Tweet Storage ④ and merges them with the cached list, sorted by time.
  4. Read service hydrates the top 20 tweet_ids → fetches full tweet rows from Tweet Storage ④ (or its read-cache), resolves Media ⑤ URLs to CDN paths.
  5. Returns the assembled timeline JSON to Raj's app. Total elapsed: ~80ms. Sarah's tweet renders 4th in his feed; Elon's latest renders 1st.
So what: the architecture is built around three insights — (1) reads outnumber writes 280-to-1, so we pre-compute timelines at write time and reads become a single cache fetch; (2) fan-out has a long tail (Elon's 100M followers), so we split into push-for-normal-users + pull-for-celebrities, bounding worker cost; (3) media is its own beast, so it lives in object storage + CDN, never traveling our application path. Every box in the diagram exists to remove one of those failure modes from Pass 1.

Sequence — tweet + fan-out

sequenceDiagram actor Sarah participant App as Sarah App participant W as Write Service participant T as Tweet Storage participant K as Kafka participant F as Fan-out Worker participant G as User Graph participant TC as Timeline Cache Sarah->>App: Tap Tweet App->>W: POST /tweet W->>T: INSERT row T-->>W: OK W->>K: publish NewTweet event W-->>App: 201 Created App-->>Sarah: Tweet appears Note over K,F: Async — does not block the user K->>F: deliver event F->>G: get followers of Sarah G-->>F: 1000 follower ids loop for each follower F->>TC: LPUSH tweet_id to follower timeline end

Sequence — timeline read with celebrity pull

sequenceDiagram actor Raj participant App as Raj App participant R as Timeline Read Service participant TC as Timeline Cache participant T as Tweet Storage participant G as User Graph Raj->>App: Open app App->>R: GET /timeline R->>TC: get Raj timeline list TC-->>R: 800 tweet_ids R->>G: which follows are celebrities? G-->>R: Elon R->>T: fetch Elon recent tweets T-->>R: 20 tweets R->>R: merge cached + celebrity, sort by time R->>T: hydrate top 20 tweet_ids T-->>R: full tweet rows R-->>App: timeline JSON App-->>Raj: render feed
Step 7

Data Sharding

43 PB of tweets doesn't fit on one box, and even if it did we couldn't survive its failure. We must shard Tweet Storage across many nodes. The choice of sharding key determines whether queries are fast or whether they scatter to every shard in the cluster.

❌ Shard by user_id

"All of user X's tweets live on shard hash(user_id) % N." Fast for "show me user X's profile" — single shard. But: Elon's shard holds tens of millions of tweets and gets hammered by every visitor — that's the hot user problem. And timeline queries that touch 200 users still scatter to ~200 shards.

❌ Shard by tweet_id (random)

"shard = hash(tweet_id) % N." Even distribution, no hot shards. But: "show me tweets from user X in the last hour" must broadcast to every shard — there's no way to know which shards hold his tweets. Same for "search recent tweets". Every query is a fan-out.

✅ Tweet_id with embedded timestamp

Build the tweet_id as epoch_seconds (31 bits) + auto_inc_seq (17 bits) = 48-bit unique, time-sortable id. Then shard by tweet_id range. Now "give me tweets after time T" maps to "the last few shards" — partition pruning works at the time dimension, which is what timelines actually need.

The unlock: by encoding time into the primary key, the storage layer "knows" recent tweets cluster together physically. Timeline reads (which always want recent tweets) hit a few hot shards instead of broadcasting; older tweets naturally land on cold shards that can run on cheaper hardware.

The 17-bit sequence part gives 2¹⁷ = 131,072 tweets per second per generator — comfortably above our 1,150/sec global rate even after 100× growth. The 31-bit epoch covers ~68 years of seconds. Twitter's own Snowflake service builds ids the same way — time in the high bits, a sequence in the low bits, plus dedicated machine-id bits — and has inspired countless successors.
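A toy single-process version of this id layout, to make the bit packing concrete (the custom epoch and the absence of machine-id bits are simplifications of this section's scheme, not Snowflake proper):

import time
import threading

EPOCH = 1288834974  # assumed custom epoch in seconds; Snowflake uses a similar offset

class TweetIdGenerator:
    def __init__(self):
        self._lock = threading.Lock()
        self._last_second = -1
        self._sequence = 0

    def next_id(self):
        with self._lock:
            now = int(time.time()) - EPOCH
            if now == self._last_second:
                # wraps at 2^17 = 131,072; a real generator would wait for the next second instead
                self._sequence = (self._sequence + 1) & 0x1FFFF
            else:
                self._last_second, self._sequence = now, 0
            return (now << 17) | self._sequence   # 31 bits of seconds, 17 bits of sequence

gen = TweetIdGenerator()
print(gen.next_id())   # larger ids are newer, so ranges of ids map to ranges of time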

Step 8

Cache

Two caches, doing different jobs. Both follow the 80/20 rule: 20% of users and 20% of tweets generate 80% of all reads.

📋 Timeline Cache (Redis)

Per active user: a Redis list holding the latest ~800 tweet_ids merged from all the people they follow. Just ids — kept small (~10 KB per user) so we can hold 200M DAU × 10 KB ≈ 2 TB across a sharded Redis cluster of 30-40 nodes. Updated by Fan-out workers ⑥ on every new tweet.

Why 800 tweet_ids? Most users scroll a few pages deep at most. 800 covers ~40 pages of 20 tweets each — well past where any normal session ends. Beyond 800, a deep-scroll falls through to live DB queries (rare).

📦 Tweet Object Cache (Memcached)

Caches hydrated tweet objects (text, author, media URLs, counters) keyed by tweet_id. Hit rate is huge because the same hot tweets appear in many users' timelines — a viral tweet might be hydrated 10M times from one cache entry. LRU eviction; cache sized to hold the last ~3 days of tweets in memory (older tweets fall through to Cassandra on demand).

Per-shard cache

Each cache shard sits in front of its corresponding Cassandra shard, so cache misses for tweet_id X hit only the one Cassandra shard that owns X. This avoids cache stampedes across the cluster and keeps the network blast-radius small when a hot tweet evicts.

The "3-day window" rule: 99% of timeline reads are for tweets posted within the last 3 days. Caching that window in memory means almost every tweet hydration is a cache hit. Tweets older than 3 days (rare reads — searches, profile drill-downs, deep-link clicks) go straight to Cassandra at slightly higher latency. This sizing is empirically chosen — not just an engineering guess.
Step 9

Timeline Generation — Push, Pull, and the Hybrid Lane

The mental model is in §6, Pass 2. Here we make the implementation concrete.

Push path (the 99.9% case)

When user X tweets, the Fan-out worker reads X's follower list from User Graph ⑨, then for each follower performs a Redis LPUSH on that follower's timeline-cache key with the new tweet_id. After the push, an LTRIM 0 799 keeps the list capped at 800 entries — the oldest tweet_id falls off the bottom. Cost per tweet: roughly O(followers) Redis writes. For a user with 1,000 followers, ~1,000 Redis ops; pipelined across worker pods, finishes in ~2 seconds.
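A sketch of that worker step with redis-py, assuming the timeline key scheme from §8 (names are illustrative):

import redis

r = redis.Redis()
TIMELINE_LEN = 800

def fan_out(tweet_id, follower_ids):
    # Celebrity authors never reach this code; their tweets are pulled at read time (below).
    pipe = r.pipeline(transaction=False)        # batch the O(followers) writes into few round trips
    for fid in follower_ids:
        key = f"timeline:{fid}"
        pipe.lpush(key, tweet_id)               # newest-first
        pipe.ltrim(key, 0, TIMELINE_LEN - 1)    # oldest id falls off the bottom
    pipe.execute()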

Pull path (celebrities)

If the author has more than the celebrity threshold (~10K followers), Fan-out skips the push entirely. The tweet still lands in Tweet Storage ④. At read time, the Timeline Read Service ⑧ checks each follow of the requesting user — if any are celebrities, it issues a quick query: "give me celebrity_X's tweets newer than my last-fetched-cursor" — and merges the result into the cached list before returning.
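And the matching read-side merge, sketched with the graph lookup and tweet-store fetch passed in as plain callables (both are placeholders for real service calls):

import redis

r = redis.Redis()

def read_timeline(user_id, celebrity_follows, fetch_recent_tweet_ids, count=20):
    """celebrity_follows: ids of followed celebrities (from the User Graph service).
    fetch_recent_tweet_ids: callable pulling a celebrity's latest tweet_ids from Tweet Storage."""
    cached = [int(x) for x in r.lrange(f"timeline:{user_id}", 0, 799)]
    live = [tid for celeb in celebrity_follows for tid in fetch_recent_tweet_ids(celeb)]
    # ids embed the timestamp (§7), so sorting by id descending is newest-first
    merged = sorted(set(cached) | set(live), reverse=True)
    return merged[:count]   # these ids are then hydrated via the tweet object cache (§8)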

Why this is the right split

📉 Bounded fan-out cost

No author ever fans out to more than ~10K followers via push. That bounds worker work per tweet, which means tweet-posting latency stays low even when a moderately popular user (~9,999 followers) tweets. If we let everyone push, Elon posting one tweet would back the queue up for minutes.

📈 Bounded read amplification

The average user follows 200 people. Even on a worst-case timeline, only a handful are celebrities. So reads do at most ~5 extra "live celebrity pull" queries — a small fixed cost that doesn't grow with how many people you follow overall.

The threshold isn't fixed forever. It's tunable based on observed worker queue depth. During major events (election night, World Cup final) when fan-out is straining, we can lower the celebrity threshold from 10K to 5K, demoting more authors to pull-mode and offloading work from the queue. After the spike, raise it back.
Step 10

Replication & Fault Tolerance

Twitter being unavailable for 5 minutes is a global news event. Every tier must survive node failures without users noticing.

📦 Cassandra (Tweet Storage)

Built-in replication: R=2, W=2, N=3. Every write must be acknowledged by 2 of 3 replicas before returning OK; reads consult 2 replicas and pick the freshest. Survives a full availability-zone failure with no data loss and no read errors.

🌐 Stateless services

Write Service, Read Service, Search Service all run as stateless pods behind the LB. Failure = LB health-check fails the pod, traffic shifts to peers, autoscaler spins up a replacement. Recovery time: 10-30 seconds.

🔁 Cache

Redis cluster with read-replicas per shard. Loss of a primary triggers automatic failover (~5s). Briefly we serve slightly stale timeline reads — acceptable. Memcached has no built-in replication; we use consistent hashing so a node loss only affects 1/N of cache keys, which refill from Cassandra on next read.

Multi-region for global low-latency reads

Tweet Storage and Timeline Cache are replicated to regional clusters in EU, APAC, and the Americas. A user in Tokyo opens the app → DNS routes them to the APAC region → reads come from the local replica in <50ms instead of 250ms cross-Pacific. Writes flow to a primary region with eventual replication to others (a few seconds of lag is fine — the user posting the tweet sees their own tweet immediately because we update their local timeline cache synchronously).

Step 11

Load Balancing

LBs sit at three layers in our system, and each plays a slightly different role.

① Client → App tier

Public-facing LB (AWS ALB / nginx). Distributes incoming HTTPS across Write, Read, and Search service pods. Health-checks every 5s. Terminates TLS so backend pods don't pay the crypto cost.

② App → Cache

Client-side or sidecar LB. Uses consistent hashing on the cache key (tweet_id or user_id) so the same key always goes to the same cache node — maximizing hit rate and avoiding double-caching across nodes.
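A toy consistent-hash ring shows why this works: each key lands on a fixed node, and removing a node remaps only the keys that lived on it (a sketch; production clients use tuned hash functions and per-node weights):

import bisect
import hashlib

class Ring:
    def __init__(self, nodes, vnodes=100):
        # Each node gets many virtual points on the ring to spread load evenly.
        self._ring = sorted(
            (int(hashlib.md5(f"{n}#{i}".encode()).hexdigest(), 16), n)
            for n in nodes for i in range(vnodes)
        )
        self._points = [p for p, _ in self._ring]

    def node_for(self, key):
        h = int(hashlib.md5(key.encode()).hexdigest(), 16)
        i = bisect.bisect(self._points, h) % len(self._points)   # first point clockwise
        return self._ring[i][1]

ring = Ring([f"cache-{i}" for i in range(8)])
print(ring.node_for("tweet:1648773726_4271"))   # same key -> same node, every time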

③ App → DB

Cassandra and MySQL drivers handle this themselves — clients know the cluster topology and route requests directly to the correct shard's coordinator node, skipping any extra hop.

Algorithm choice

Start with Round Robin for the write tier (uniform requests, no sticky state). Use Least Connections for the read tier — some timeline reads take much longer than others (deep-pagination, many celebrity follows), and least-connections naturally avoids piling more work on a slow pod.

The LB is itself a single point of failure — solved with active-passive pairing using virtual-IP failover, or natively in cloud LBs (ALB is multi-AZ by default).
Step 12

Monitoring

Twitter-scale failures are subtle — a slow shard, a skewed Kafka partition, a worker pool deadlocking. Without instrumentation you find out from users tweeting "Twitter is broken", which is awkward when you are Twitter.

Metric | What it tells you | Alert threshold
Tweets/sec | Write traffic — is there a spike? | ±50% from baseline
Timeline p50 / p99 latency | Read path health | p99 > 300ms
Fan-out lag | How far behind real-time the worker queue is | > 30s
Cache hit rate | Is the cache actually helping? | < 90%
DB replication lag | Multi-region read freshness | > 10s
5xx rate per endpoint | Service-level errors | > 0.1%

All metrics flow through Kafka into Prometheus + Grafana for live dashboards, and into a data warehouse for product analytics. PagerDuty wakes the on-call when red-line alerts fire.

Step 13

Extended Features

The core product is post-tweet + timeline. The features below are layered on top using the same primitives.

🔍 Search

Elasticsearch index over every tweet's text. Fan-out ⑥ feeds new tweets in near-real-time. Sharded by tweet_id with timestamp embedded so "search worldcup, recent first" is a partition-pruned scan, not a full-cluster broadcast. See the dedicated Twitter Search page for index design and ranking details.

📈 Trending Topics

Sliding window count of hashtags over the last 5-60 minutes. A min-heap maintains the top-K (K=10 globally, with location-personalized variants). Refreshed every minute — small enough updates to feel real-time, large enough not to flap.
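A toy, single-process version of that counting scheme (the real service shards these counters and persists the buckets):

import heapq
import time
from collections import Counter, defaultdict

buckets = defaultdict(Counter)            # minute slot -> (hashtag -> count)

def record_hashtags(tags, now=None):
    slot = int((now or time.time()) // 60)
    for tag in tags:
        buckets[slot][tag] += 1

def trending(k=10, window_minutes=60, now=None):
    newest = int((now or time.time()) // 60)
    totals = Counter()
    for slot in range(newest - window_minutes + 1, newest + 1):
        totals.update(buckets.get(slot, Counter()))
    for slot in [s for s in buckets if s < newest - window_minutes]:
        del buckets[slot]                 # slots outside every window can simply be dropped
    return heapq.nlargest(k, totals.items(), key=lambda kv: kv[1])

record_hashtags(["#worldcup", "#worldcup", "#tesla"])
print(trending(k=2))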

👥 Who to follow

Graph algorithm (friends-of-friends — "people followed by people you follow") combined with ML signals (common location, similar interests inferred from engagement). Pre-computed nightly per user; refreshed on big follow-graph changes.

📚 Moments

Editorially-curated tweet collections around a theme (a sports event, a breaking-news story). Stored as a list of tweet_ids in a "moments" table. ML helps surface candidate tweets; humans do final curation.

🔁 Retweets

Stored as a separate object that references the original tweet_id ({ retweet_id, retweeter_user_id, original_tweet_id, timestamp }). On render, the original tweet is hydrated and shown with a "Retweeted by ..." header. Fans out to retweeter's followers exactly like a normal tweet.

🔔 Notifications & @mentions

Fan-out workers also extract @mentions from each tweet and push notification events to the mentioned users. A separate Notification Service holds per-user notification lists in Redis (similar shape to Timeline Cache) and drives push-notifications via APNs/FCM.

Step 14

Interview Q&A

Push vs pull for timeline — when to use each?
Push (fan-out-on-write) for normal users. Reads dominate writes 280:1, so doing the work once at write time and reading from cache is a massive net win. Pull (fan-in-on-read) for celebrities. A user with 100M followers can't be fan-out-pushed without melting the worker queue and delaying everyone else. The hybrid: push for ≤10K followers, pull for >10K. Reader merges cached pre-built timeline + live celebrity pulls.
How do you handle Elon Musk?
Treat him as a pull-mode author. When Elon tweets, his row goes to Tweet Storage but Fan-out skips writing to 100M follower caches. At read time, any user who follows Elon does a quick "give me Elon's tweets after my last cursor" query and merges results into their cached timeline. Cost: one extra query per timeline read for that user. Compare to fan-out: 100M Redis writes per Elon tweet, which would back the queue up for hours.
How do you achieve under-200ms timeline latency?
Pre-compute, cache, parallelize. (1) Timelines are built at write time by Fan-out workers, so reads are a single Redis fetch (~5ms). (2) Hot tweet objects are cached in Memcached so hydration of the top 20 tweet_ids is also a cache hit. (3) Celebrity pulls happen in parallel with cache hydration. (4) Media URLs are served from CDN, never assembled by app servers. Net: ~80ms p50, ~200ms p99 — comfortably inside the budget.
Why Cassandra over MySQL for tweets?
Access pattern + scale. 100M new tweets/day on an append-heavy workload, queried almost exclusively by primary key (tweet_id) or partition key (user_id), with no joins or transactions across rows. Cassandra is built for exactly this — wide-column, partition-by-key, replicated by default, scales horizontally to hundreds of nodes. MySQL would force vertical scaling, painful resharding, and offers nothing we'd actually use (joins, complex transactions). MySQL still wins for the User Graph (smaller, relational queries, strong consistency on follow).
How do you generate trending topics?
Sliding window + min-heap. Fan-out workers extract hashtags from each new tweet and increment counters in a per-tag bucket keyed by 1-minute time slots (so we can drop old slots cheaply). A background process every minute sweeps the last 5-60 minutes of buckets, sums per tag, and maintains a min-heap of the top-K. Personalized variants additionally weight by user location and follow-graph signals.
What happens when the Fan-out service is backed up?
Degrade gracefully — don't drop tweets. Kafka buffers the queue; lag becomes visible delivery delay. We monitor consumer lag and auto-scale workers up to a ceiling. During a true overload (election night, World Cup), we temporarily lower the celebrity threshold from 10K to 5K, demoting more authors to pull-mode and shrinking the per-tweet fan-out cost. Worst case during a spike: tweets appear in followers' timelines 30 seconds late instead of 2 — barely perceptible. The author always sees their own tweet immediately because we update their own timeline cache synchronously in the Write Service.
How do you keep the same tweet_id from being generated twice?
Snowflake-style IDs. Each tweet_id is a 48-bit value: 31 bits of epoch seconds + 17 bits of sequence. Within a generator node, the sequence counter resets every second and increments monotonically, so one node can issue up to 2¹⁷ = 131K unique ids per second — comfortably above our 1,150/sec rate. To keep multiple generator nodes from ever colliding, reserve a slice of the sequence bits as a per-node id; Twitter's open-source Snowflake service does exactly this with dedicated machine-id bits between the timestamp and the sequence.
What's the trade-off of "eventually consistent" timelines?
A few seconds of staleness in exchange for ~10x the read throughput. When Sarah tweets, her followers' timelines update over the next 1-3 seconds via Fan-out — not synchronously. So if Raj refreshes his app within that window, he might briefly miss her tweet. In exchange, every timeline read is a cache hit instead of a multi-shard fan-in query. Users overwhelmingly prefer "fast and slightly stale" over "perfectly fresh and slow" — Twitter consistently chooses the former, and so should we.