From "one DB melting under 325K reads/sec" to a fan-out-on-write timeline with a special lane for Elon — the architecture that earns every box
This deep-dive applies the standard HLD interview framework. As you read, map each section to Requirements → Entities → APIs → High-Level Design → Deep Dives, and notice which of the 8 common patterns and key technologies are at play.
Imagine it's 9:37am. Elon Musk fires off a 140-character tweet about a new Tesla feature. He has roughly 100 million followers. Within seconds, every one of those 100M people who opens the Twitter app should see that tweet near the top of their home timeline — interleaved with tweets from everyone else they follow, sorted by recency, with photos and videos rendering inline. At the same moment Elon is tweeting, Sarah in Bangalore is scrolling her own timeline (she follows 200 people, not 100M), Raj in São Paulo is searching for "World Cup", and 200 million other daily users are doing some mix of all of this. That choreography — happening 24/7 across the globe — is the system we're designing.
Twitter is, at its heart, a microblogging service: short messages called tweets (originally capped at 140 characters, now longer), a directional follow graph (you follow me; that doesn't mean I follow you back), and a timeline — the chronological stream of tweets from the people you follow. Add photos, videos, replies, retweets, hashtags, search, and trending topics, and you have the product.
Pin down the product surface before drawing boxes. In an interview, asking these questions out loud signals you're not just regurgitating an architecture — you're deciding what to build first.
Numbers drive every architectural choice. Twitter is famously read-heavy: people read far more than they post. Let's get rough quantities so the rest of the design has weight behind it.
Assume 1 billion total users with 200 million daily active users (DAU). 100 million new tweets per day. Each user follows on average 200 other users.
| Metric | Estimate | Derivation / note |
|---|---|---|
| Tweets written/sec | ~1,150/sec | 100M / 86,400 |
| Timeline tweet reads/sec | ~325K/sec | 200M DAU × 7 page-views × 20 tweets / 86,400 |
| Favorites/sec | ~12K/sec | 1B favorites/day spread out |
| Avg follows per user | ~200 | Drives fan-out cost |
Each tweet ≈ 280 bytes of text + 30 bytes of metadata (id, user_id, timestamps, counters). That gives 100M × 310 B ≈ 30 GB/day of text. Add photos and videos:

| Content | Volume | Note |
|---|---|---|
| Tweet text + metadata | ~30 GB/day | Tiny relative to media |
| Photos | ~1 TB/day | Lives in object storage + CDN |
| Videos | ~22 TB/day | Bulk of all storage |
Total storage growth: ~24 TB/day dominated by video. Over 5 years: ~43 PB.
Outgoing bandwidth (people watching videos and viewing photos in their timeline) dwarfs ingest. Roughly ~35 GB/sec egress, mostly video served via CDN. Text is a rounding error: at roughly 100 MB/sec (325K tweets served/sec × ~310 B) it wouldn't move the needle against 35 GB/sec of media.
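These estimates are simple enough to verify mechanically. Here is the arithmetic from this section as a short, runnable Python sketch; all inputs are the assumptions stated above, and every output should match the summary table below.

```python
# Back-of-envelope math from this section, as a sanity check.
DAY = 86_400  # seconds per day

dau             = 200_000_000
tweets_per_day  = 100_000_000
views_per_day   = 7            # timeline refreshes per DAU per day
tweets_per_page = 20
tweet_bytes     = 310          # ~280 text + ~30 metadata
storage_tb_day  = 24           # text + photos + video, dominated by video

writes_per_sec = tweets_per_day / DAY                         # ~1,150/s
reads_per_sec  = dau * views_per_day * tweets_per_page / DAY  # ~325K/s
text_gb_day    = tweets_per_day * tweet_bytes / 1e9           # ~31 GB/day
storage_5y_pb  = storage_tb_day * 365 * 5 / 1000              # ~44 PB

print(f"writes/sec       ≈ {writes_per_sec:,.0f}")
print(f"tweet reads/sec  ≈ {reads_per_sec:,.0f}")
print(f"read:write ratio ≈ {reads_per_sec / writes_per_sec:.0f}:1")
print(f"text GB/day      ≈ {text_gb_day:,.0f}")
print(f"5-year storage   ≈ {storage_5y_pb:.0f} PB")
```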
| Metric | Value | Why it matters |
|---|---|---|
| Tweets/sec | 1,150/s | Drives write throughput & fan-out worker pool |
| Timeline reads/sec | 325K/s | Forces pre-computed timelines and aggressive caching |
| Avg follows/user | 200 | Each tweet fans out to 200 timeline caches on average |
| Storage/day | 24 TB | Forces sharded storage; video → object store + CDN |
| Egress | 35 GB/s | Video must be CDN-served — origin can't carry it |
325K reads / 1,150 writes ≈ 280:1. Reads outnumber writes by more than two orders of magnitude. Any design that does work on the read path that could have been pre-computed at write time will lose at scale.

Five endpoints carry essentially all the traffic. Defining them up front locks the contract before architecture, and reveals which calls are write-path (rare) vs. read-path (constant).
REST API surface:

```
// 1. Post a tweet — write path, low QPS
POST /api/v1/tweet
{
  "api_key": "abc123...",
  "text": "Just landed in SF!",
  "location": { "lat": 37.77, "lon": -122.41 },  // optional
  "media_ids": ["img_92af", "vid_71be"]          // optional, pre-uploaded
}
→ 201 Created { "tweet_id": "1648...3922", "created_at": "2026-05-07T14:02:06Z" }

// 2. Fetch home timeline — read path, very high QPS
GET /api/v1/timeline?user_id=42&count=20&page_token=eyJ0Ijox...
→ 200 OK {
  "tweets": [
    { "tweet_id": "...", "user": {...}, "text": "...", "media": [...] },
    ...
  ],
  "next_page_token": "eyJ0Ijox..."
}

// 3. Follow another user
POST /api/v1/follow
{ "user_id": 42, "target_user_id": 99 }
→ 204 No Content

// 4. Favorite a tweet
POST /api/v1/favorite
{ "user_id": 42, "tweet_id": "1648...3922" }
→ 204 No Content

// 5. Search tweets
GET /api/v1/search?api_key=...&query=worldcup&count=20&sort=recent
→ 200 OK { "tweets": [...], "next_page_token": "..." }
```
Why a page_token and not page=N? Tweets are constantly being added at the top. If we used numeric pages, "page 2" would shift down by however many new tweets appeared since "page 1" — users would see duplicates. A token encodes "give me items older than tweet X", which is stable as new tweets arrive on top.

Why media_ids and not raw bytes? The client uploads each photo or video first, receives a media_id, and only sends the small media_id in the tweet body. This keeps tweet writes fast and lets media uploads stream directly to storage without our app servers in the path.

Four core entities. Two patterns matter: tweets are written constantly and queried by id or by user_id; the follow graph is queried as "who does user X follow?" and "who follows user X?". This shape — heavy writes on one table, fan-out reads on another — is what pushes us to a hybrid SQL + NoSQL stack.
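One plausible token encoding, matching the shape of the sample tokens above (they decode as base64-encoded JSON) and assuming tweet_ids are time-sortable as described in §7:

```python
import base64
import json

def make_page_token(oldest_tweet_id_on_page: int) -> str:
    # Opaque cursor meaning "give me tweets older than this id". Because
    # tweet_ids embed a timestamp (§7), "older than id X" is "older than time T".
    raw = json.dumps({"t": oldest_tweet_id_on_page}).encode()
    return base64.urlsafe_b64encode(raw).decode()

def parse_page_token(token: str) -> int:
    return json.loads(base64.urlsafe_b64decode(token))["t"]

# The server returns next_page_token = make_page_token(last id on the page);
# the next request filters on tweet_id < parse_page_token(token), so tweets
# arriving at the top never shift page boundaries or cause duplicates.
```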
Tweets → Cassandra. 1,150 writes/sec sustained, 100M+ rows/day, queried almost exclusively by primary key (tweet_id) or by partition (user_id). No joins, no transactions across tweets. Cassandra's wide-column model is purpose-built for this — append-heavy, partition-by-key, replicated by default.
Follow graph → MySQL. Lower write rate, strong consistency matters (a "follow" must be visible immediately to that user), and we want relational queries ("count my followers", "is user X following user Y"). MySQL with read replicas handles 10K writes/sec easily; the graph fits in a few hundred GB even at 1B users.
The num_favorites column on TWEET is a denormalized counter — the source of truth lives in FAVORITE rows, but counting them on every render would be too slow. We update the counter asynchronously (more in §12).
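A minimal sketch of that async update, assuming a Redis hash as the delta buffer and a hypothetical tweet_store client (the real pipeline could equally flow through Kafka):

```python
import redis

r = redis.Redis()

def on_favorite(tweet_id: str) -> None:
    # The FAVORITE row (source of truth) is written elsewhere; here we only
    # buffer a +1 so reads never have to COUNT(*) favorites at render time.
    r.hincrby("fav:pending", tweet_id, 1)

def flush_favorite_counters(tweet_store) -> None:
    # Background job, run every few seconds. Atomically swap the buffer so
    # increments arriving mid-flush land in a fresh hash and aren't lost.
    try:
        r.rename("fav:pending", "fav:flushing")
    except redis.ResponseError:
        return  # nothing pending
    for tweet_id, delta in r.hgetall("fav:flushing").items():
        tweet_store.increment_num_favorites(tweet_id.decode(), int(delta))
    r.delete("fav:flushing")
```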
This is the section that wins or loses the interview. We'll build the architecture in three passes: the simplest thing that could plausibly work, why it falls apart, and the production shape where every box justifies itself. Numbers from §3 drive every decision.
Sketch the simplest possible system: one app server talks to one MySQL database. To post a tweet, insert a row. To show Sarah's timeline, run a query like: "SELECT tweets FROM tweet WHERE user_id IN (the 200 people Sarah follows) ORDER BY creation_date DESC LIMIT 20". Done.
Three concrete failures emerge the moment real traffic shows up:
Sarah follows 200 people. To build her timeline, the DB must touch 200 user-partitions, sort their recent tweets by time, and return the top 20. Now imagine Raj follows 5,000 people — his query touches 5,000 partitions for a single page-view. With 200M DAU each refreshing 7 times a day, that's billions of multi-partition fan-in queries per day. No DB survives this.
A single MySQL instance handles roughly 5K-10K simple queries/sec. We need 325,000. Even with read replicas, the timeline query above hits dozens of partitions per request — multiplying the underlying DB load by 10×-50×. CPU pegs at 100%, p99 latency goes from 5ms to 5 seconds, every Twitter user's app feels broken.
1,150 tweet inserts/sec sounds tiny next to 325K reads — until you realize they share the same DB. Write contention on indexes (especially the timestamp index every timeline query depends on) makes both sides slower. A single celebrity tweet can lock indexes for milliseconds and ripple latency across thousands of in-flight reads.
The single most important insight in this design comes from the read/write ratio in §3. Timelines are read 280× more often than tweets are written. So instead of computing the timeline at read time (expensive, repeated), compute it once at write time and cache it. When Sarah posts a tweet, push it into the pre-built timelines of every one of her followers. When Raj opens the app, his timeline is already assembled — read it from a single cache key in 5ms.
On tweet, look up all of the author's followers and write the new tweet_id into each follower's timeline cache. Read becomes O(1): grab the cached list of tweet_ids, hydrate them, return. This is the classic "do work at write time so reads are free" trade.
Breaks for celebrities. Elon has 100M followers. One tweet from him would trigger 100 million cache writes in a few seconds — overwhelming the fan-out workers and causing huge latency spikes for everyone else's tweets behind his in the queue.
Don't fan out anything at write time. At read time, query each followed user's recent tweets and merge. Cheap writes, expensive reads. This is the naive design from Pass 1.
Doesn't scale for normal users (200 fan-in per refresh × 200M DAU × 7 refreshes/day) — exactly the problem we just rejected.
The fix is a hybrid model: push for normal users, pull for celebrities. When a regular user with 1,000 followers tweets, fan it out to all 1,000 timeline caches — fast, cheap. When Elon tweets, mark it as a "celebrity tweet" and skip fan-out entirely. At timeline-read time, the reader pulls Elon's recent tweets live and merges them with the pre-computed cache. This keeps fan-out work bounded (no author pushes to more than the ~10K-follower celebrity threshold) and adds at most a handful of "live celebrity pull" queries per timeline read.
Now the full picture. Every node is numbered — find its matching card below to see what it does and what would break without it. Three planes: write (orange), read (blue), and async (yellow).
Use the numbers in the diagram above to find the matching card. Each one answers what is this, why is it here, and what would break without it.
The Twitter mobile app or twitter.com in a browser. Sends three primary requests: POST a tweet (rare), GET the home timeline (constant), and GET search results (occasional). Also opens long-lived connections for push notifications. From the client's perspective, the entire backend is one URL — everything else is invisible.
Solves: nothing on its own — but every latency and freshness budget downstream comes from "what does the client experience?" 200ms timeline budget, instant tweet appearance in the author's own timeline, the eventual-consistency window — all client-driven constraints.
The traffic cop. Sits in front of every service tier, distributes incoming HTTPS, terminates TLS, and yanks unhealthy backends out of rotation via 5-second health checks. Round-robin for the write tier (low QPS, requests are uniform); least-connections for the read tier (some timelines take much longer to assemble than others).
Solves: single-server bottlenecks and single-server failures. Without the LB, one app crash takes down the whole product. With it, we lose 1/N of capacity for a few seconds until health checks fail the bad pod out.
Stateless service that handles POST /api/v1/tweet. Per request: validate input → generate a unique tweet_id (timestamp-embedded, see §7) → write the row to Tweet Storage ④ → publish a NewTweet event to the Fan-out Service ⑥ → return the new tweet_id. Total budget: under 100ms. Scaling is trivial because it's stateless — add pods behind the LB.
Solves: isolating write traffic from read traffic. Without a dedicated write tier, the 1,150 writes/sec would compete for CPU and connections with the 325K reads/sec — and a write spike would slow every timeline read. Splitting them lets each tier be sized for its own workload.
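A sketch of that request path under stated assumptions: gen_tweet_id stands in for the time-sortable id generator from §7, and tweet_store / kafka are hypothetical client objects for ④ and the fan-out topic.

```python
import time

def post_tweet(user_id: int, text: str, media_ids: list[str] | None = None) -> dict:
    # 1. Validate input
    if not text or len(text) > 280:
        raise ValueError("tweet text must be 1-280 characters")

    # 2. Generate a time-sortable id (§7), then 3. write durably first
    tweet_id = gen_tweet_id()
    tweet_store.insert(
        tweet_id=tweet_id,
        user_id=user_id,
        text=text,
        media_ids=media_ids or [],
        created_at=int(time.time()),
    )

    # 4. Hand delivery to the async fan-out layer, 5. return immediately;
    #    followers' timelines update in the background over the next seconds.
    kafka.send("new_tweets", {"tweet_id": tweet_id, "user_id": user_id})
    return {"tweet_id": tweet_id}
```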
The source of truth for every tweet ever written. A wide-column NoSQL store sharded by tweet_id (which embeds a timestamp — see §7), replicated 3× across availability zones with quorum reads (R=2, W=2, N=3). Holds 100M new rows per day; ~365 billion rows over 10 years. Each row stores the tweet text, author user_id, creation timestamp, location, and media references.
Solves: durable, horizontally-scalable storage for the 30 GB/day of text plus billions of historical rows. A single MySQL box stays healthy only up to roughly 1 TB; we need orders of magnitude more. Cassandra spreads the data across hundreds of nodes and survives node failures automatically.
Photos and videos live in object storage (S3 or equivalent), fronted by a global CDN (CloudFront, Akamai). Clients upload directly via pre-signed URLs — bytes never traverse our app servers. On timeline render, the app returns CDN URLs the client fetches itself. The CDN caches videos at 200+ edge POPs worldwide, so a Tokyo viewer streams from a Tokyo node, not from US-East.
Solves: the 35 GB/sec egress problem. Without a CDN, all that video would have to flow out of our origin data centers — saturating links, spiking costs, and adding 200ms of trans-Pacific latency for every Asian viewer. With a CDN, origin sees maybe 5% of egress; the rest is served from the edge.
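With S3, issuing the pre-signed upload URL is nearly a one-liner. A sketch assuming a bucket named tweet-media; the bucket name and expiry are illustrative choices, not fixed parts of the design:

```python
import boto3

s3 = boto3.client("s3")

def presign_media_upload(media_id: str, content_type: str) -> str:
    # The client PUTs the raw bytes straight to this URL (our app servers
    # never see them), then references media_id in the tweet body.
    return s3.generate_presigned_url(
        "put_object",
        Params={"Bucket": "tweet-media", "Key": media_id, "ContentType": content_type},
        ExpiresIn=300,  # URL expires in 5 minutes
    )
```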
The asynchronous heart of the design. When the Write Service ③ publishes a NewTweet event, a Kafka topic absorbs it. Worker pods consume the topic and do three things in parallel: (1) look up the author's followers in the User Graph ⑨, (2) for each follower, push the new tweet_id to that follower's Timeline Cache ⑦ entry, (3) index the tweet text in the Search Service ⑩ and update Trending Topics ⑪ counters. For celebrity authors (≥10K followers), step 2 is skipped — read-time pull handles them.
Solves: hiding the cost of fan-out from the user posting the tweet. Sarah hits "Tweet" and gets a 201 in 60ms. The actual delivery to her 1,000 followers' timelines happens in the background over the next ~2 seconds. Without this async layer, every tweet POST would block until all follower writes completed — a tweet from a user with 50K followers would take 5+ seconds to post.
What if the queue backs up? Kafka buffers it; lag becomes visible delivery delay. We monitor consumer lag and auto-scale workers. Worst case during a viral spike: a tweet appears in followers' timelines 30 seconds late instead of 2. Users barely notice; the queue catches up within minutes.
The core data structure of the read path. Per active user, Redis stores a list of the most recent ~800 tweet_ids that should appear in their home timeline — already merged, already sorted, already filtered. Just tweet_ids, not full tweets (kept small: ~10 KB per user). When a user opens the app, the read service fetches this list, hydrates the top 20 tweet_ids into full tweet objects, and returns. 5ms read.
Solves: the 325K reads/sec problem. Without pre-computed timelines, every read would do the multi-partition fan-in query that killed Pass 1. With it, reads are a single Redis GET. The DB is sized for the cold path (cache misses, deep pagination), not the hot peak.
Sized for active users, not all users. 200M DAU × 10 KB ≈ 2 TB — fits in a sharded Redis cluster of 30-40 nodes. Inactive users (the other 800M) get their timeline rebuilt lazily on next login.
Stateless service that handles GET /api/v1/timeline. Per request: fetch the user's pre-computed timeline list from Cache ⑦ → for each celebrity the user follows, pull that celebrity's recent tweets live from Tweet Storage ④ and merge → hydrate tweet_ids into full tweet objects (text, media URLs, counters) → return. Latency budget: under 200ms. Scaled aggressively — the read tier is the largest deployment.
Solves: assembling the final timeline payload from its parts (cache + celebrity pull + media URLs). This is where the hybrid push/pull model is implemented: cached tweet_ids cover the 99.9% of content from regular users; the live pull covers the small set of high-follower accounts.
Owns the follow graph: who follows whom. Backed by sharded MySQL (consistent — when you tap "Follow", it has to take effect immediately for you). Two key queries: "give me user X's followers" (used by Fan-out ⑥ on every tweet) and "give me user X's follows" (used to compute timelines and recommendations). Heavily cached because the graph changes slowly relative to how often it's read.
Solves: being the source of truth for the social graph. Without a dedicated graph service, fan-out would have to hit a generic user table and the storage tier would be flooded with "list followers of X" queries during every fan-out cycle. A specialized service with graph-shaped indexes and aggressive caching handles it cleanly.
Full-text search over every tweet ever written. Fan-out ⑥ writes each new tweet's text into an Elasticsearch index in near-real-time (a few seconds of lag). Search queries hit Elasticsearch directly: tokenized, ranked, paginated. Sharded the same way Cassandra is — by tweet_id with timestamp embedded, so "search worldcup, recent first" is a partition-pruned scan, not a full-cluster broadcast.
Solves: the "search tweets" feature, which is its own product surface. We can't search a Cassandra wide-column store by text content — Cassandra is built for key lookups, not full-text scans. Elasticsearch is purpose-built for inverted-index text search and lives alongside, fed asynchronously.
Counts hashtag occurrences over a sliding window (typically the last 5 minutes to 1 hour) and exposes the top-K via a min-heap kept in memory. Fan-out ⑥ extracts hashtags from each tweet and increments per-tag counters; a background process refreshes the trending list every minute. Personalized variants take location and follow-graph into account.
Solves: the "What's happening" panel. Without a dedicated trending service, computing the top hashtags would require scanning every tweet from the last hour on every refresh — a multi-billion-row scan that no DB can serve at panel-refresh latency.
Captures everything: tweets/sec, p99 timeline latency, fan-out lag (how far behind real-time is the worker queue?), cache hit rate, DB replication lag, error rates per endpoint. Pipelined via Kafka into Prometheus + Grafana for live ops, and into a data warehouse for product analytics. Alerts page the on-call engineer when fan-out lag crosses 30 seconds or timeline p99 crosses 300ms.
Solves: knowing the system is healthy. Twitter-scale systems fail in subtle ways — a single shard going slow, a Kafka partition skewing, a worker pool deadlocking — and without instrumentation you don't notice until users complain on Twitter (which is awkward when you are Twitter). Monitoring closes that loop.
Two real flows, mapped to the numbered components above:
Tweet write path (Sarah posts a photo tweet):
1. Her app uploads the photo directly to object storage ⑤ via a pre-signed URL and gets back img_92af.
2. The app POSTs { text, media_ids: [img_92af] } through the Load Balancer ② to a Tweet Write Service ③ pod.
3. The pod generates tweet_id = 1648773726000_4271 (epoch + sequence) and inserts the row into Tweet Storage ④ — sharded by tweet_id, replicated 3×.
4. It publishes a NewTweet event to Kafka and returns 201 to Sarah's app. Total elapsed: 60ms. Sarah sees her tweet appear instantly.

Timeline read path (Raj opens his timeline):
1. His app GETs /api/v1/timeline?user_id=raj. LB ② routes to a Timeline Read Service ⑧ pod.
2. The pod reads Raj's pre-computed tweet_id list from the Timeline Cache ⑦, pulls live tweets for any celebrities he follows, merges, hydrates the top 20 into full tweet objects, and returns, comfortably under the 200ms budget.

43 PB of tweets doesn't fit on one box, and even if it did we couldn't survive its failure. We must shard Tweet Storage across many nodes. The choice of sharding key determines whether queries are fast or whether they scatter to every shard in the cluster.
"All of user X's tweets live on shard hash(user_id) % N." Fast for "show me user X's profile" — single shard. But: Elon's shard holds tens of millions of tweets and gets hammered by every visitor — that's the hot user problem. And timeline queries that touch 200 users still scatter to ~200 shards.
"shard = hash(tweet_id) % N." Even distribution, no hot shards. But: "show me tweets from user X in the last hour" must broadcast to every shard — there's no way to know which shards hold his tweets. Same for "search recent tweets". Every query is a fan-out.
Shard by time-sortable tweet_id (the winner): build the tweet_id as epoch_seconds (31 bits) + auto_inc_seq (17 bits) = a 48-bit unique, time-sortable id. Then shard by tweet_id range. Now "give me tweets after time T" maps to "the last few shards" — partition pruning works at the time dimension, which is what timelines actually need.

The 17-bit sequence part gives 2¹⁷ = 131,072 tweets per second per generator — comfortably above our 1,150/sec global rate even after 100× growth. The 31-bit epoch covers ~68 years of seconds. This is essentially Twitter's own Snowflake id design, which inspired countless successors.
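A sketch of such a generator under the 31+17-bit layout described above (Twitter's actual Snowflake also packs a worker id and uses millisecond precision):

```python
import threading
import time

class TweetIdGenerator:
    """48-bit time-sortable id: 31 bits of epoch seconds, 17 bits of sequence.
    Because the high bits are the timestamp, sorting ids sorts by time."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._last_sec = 0
        self._seq = 0

    def next_id(self) -> int:
        with self._lock:
            now = int(time.time())
            if now != self._last_sec:
                self._last_sec, self._seq = now, 0
            else:
                self._seq += 1  # up to 2**17 = 131,072 ids per second
                if self._seq >= 1 << 17:
                    raise RuntimeError("sequence exhausted for this second")
            return (now << 17) | self._seq
```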
Two caches, doing different jobs. Both follow the 80/20 rule: 20% of users and 20% of tweets generate 80% of all reads.
Per active user: a Redis list holding the latest ~800 tweet_ids merged from all the people they follow. Just ids — kept small (~10 KB per user) so we can hold 200M DAU × 10 KB ≈ 2 TB across a sharded Redis cluster of 30-40 nodes. Updated by Fan-out workers ⑥ on every new tweet.
Why 800 tweet_ids? Most users scroll a few pages deep at most. 800 covers ~40 pages of 20 tweets each — well past where any normal session ends. Beyond 800, a deep-scroll falls through to live DB queries (rare).
Caches hydrated tweet objects (text, author, media URLs, counters) keyed by tweet_id. Hit rate is huge because the same hot tweets appear in many users' timelines — a viral tweet might be hydrated 10M times from one cache entry. LRU eviction; cache sized to hold the last ~3 days of tweets in memory (older tweets fall through to Cassandra on demand).
Each cache shard sits in front of its corresponding Cassandra shard, so cache misses for tweet_id X hit only the one Cassandra shard that owns X. This avoids cache stampedes across the cluster and keeps the network blast-radius small when a hot tweet evicts.
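Hydration is a straightforward cache-aside loop. A sketch with hypothetical cache and tweet_store client objects standing in for the Tweet Cache and Cassandra:

```python
def hydrate(tweet_ids: list[int]) -> list[dict]:
    # Hot tweets come from the Tweet Cache; misses fall through to the one
    # Cassandra shard that owns each id, then backfill the cache (LRU evicts).
    found, misses = {}, []
    for tid in tweet_ids:
        obj = cache.get(f"tweet:{tid}")
        if obj is not None:
            found[tid] = obj
        else:
            misses.append(tid)
    for tid in misses:
        obj = tweet_store.get(tid)       # partition-pruned: one shard owns tid
        cache.set(f"tweet:{tid}", obj)
        found[tid] = obj
    return [found[tid] for tid in tweet_ids]  # preserve timeline order
```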
The mental model is in §6, Pass 2. Here we make the implementation concrete.
When user X tweets, the Fan-out worker reads X's follower list from User Graph ⑨, then for each follower performs a Redis LPUSH on that follower's timeline-cache key with the new tweet_id. After the push, an LTRIM 0 799 keeps the list capped at 800 entries — the oldest tweet_id falls off the bottom. Cost per tweet: roughly O(followers) Redis writes. For a user with 1,000 followers, ~1,000 Redis ops; pipelined across worker pods, finishes in ~2 seconds.
If the author has more than the celebrity threshold (~10K followers), Fan-out skips the push entirely. The tweet still lands in Tweet Storage ④. At read time, the Timeline Read Service ⑧ checks the requesting user's follow list — for any who are celebrities, it issues a quick query ("give me celebrity_X's tweets newer than my last-fetched cursor") and merges the result into the cached list before returning.
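Putting both branches together, a sketch of the worker's event handler using redis-py, with a hypothetical user_graph client for ⑨:

```python
import redis

r = redis.Redis()
CELEB_THRESHOLD = 10_000
TIMELINE_CAP = 800

def handle_new_tweet(tweet_id: int, author_id: int) -> None:
    followers = user_graph.get_followers(author_id)
    if len(followers) >= CELEB_THRESHOLD:
        return  # celebrity: no push; readers pull these tweets at read time

    pipe = r.pipeline(transaction=False)  # batch pushes per network round-trip
    for follower_id in followers:
        key = f"timeline:{follower_id}"
        pipe.lpush(key, tweet_id)             # newest id at the head
        pipe.ltrim(key, 0, TIMELINE_CAP - 1)  # cap the list at 800 entries
    pipe.execute()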
No author ever fans out to more than ~10K followers via push. That bounds worker work per tweet, which means tweet-posting latency stays low even when a moderately popular user (~9,999 followers) tweets. If we let everyone push, Elon posting one tweet would back the queue up for minutes.
The average user follows 200 people. Even on a worst-case timeline, only a handful are celebrities. So reads do at most ~5 extra "live celebrity pull" queries — a small fixed cost that doesn't grow with how many people you follow overall.
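The read side of the hybrid, as a sketch reusing hydrate() from §8 and the same hypothetical clients (get_followed_celebrities and recent_tweet_ids are assumed helpers, not established APIs):

```python
def read_timeline(user_id: int, count: int = 20) -> list[dict]:
    # Pre-computed ids cover regular follows; one live pull per celebrity.
    ids = [int(x) for x in r.lrange(f"timeline:{user_id}", 0, TIMELINE_CAP - 1)]
    for celeb_id in user_graph.get_followed_celebrities(user_id):  # ~0-5 people
        ids.extend(tweet_store.recent_tweet_ids(celeb_id, limit=count))
    # tweet_ids embed timestamps (§7), so a descending sort is newest-first
    ids = sorted(set(ids), reverse=True)[:count]
    return hydrate(ids)
```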
Twitter being unavailable for 5 minutes is a global news event. Every tier must survive node failures without users noticing.
Tweet Storage (Cassandra): built-in replication with R=2, W=2, N=3. Every write must be acknowledged by 2 of 3 replicas before returning OK; reads consult 2 replicas and pick the freshest. Survives a full availability-zone failure with no data loss and no read errors.
Write Service, Read Service, Search Service all run as stateless pods behind the LB. Failure = LB health-check fails the pod, traffic shifts to peers, autoscaler spins up a replacement. Recovery time: 10-30 seconds.
Redis cluster with read-replicas per shard. Loss of a primary triggers automatic failover (~5s). Briefly we serve slightly stale timeline reads — acceptable. Memcached has no built-in replication; we use consistent hashing so a node loss only affects 1/N of cache keys, which refill from Cassandra on next read.
Tweet Storage and Timeline Cache are replicated to regional clusters in EU, APAC, and the Americas. A user in Tokyo opens the app → DNS routes them to the APAC region → reads come from the local replica in <50ms instead of 250ms cross-Pacific. Writes flow to a primary region with eventual replication to others (a few seconds of lag is fine — the user posting the tweet sees their own tweet immediately because we update their local timeline cache synchronously).
LBs sit at three layers in our system, and each plays a slightly different role.
Public-facing LB (AWS ALB / nginx). Distributes incoming HTTPS across Write, Read, and Search service pods. Health-checks every 5s. Terminates TLS so backend pods don't pay the crypto cost.
Client-side or sidecar LB. Uses consistent hashing on the cache key (tweet_id or user_id) so the same key always goes to the same cache node — maximizing hit rate and avoiding double-caching across nodes.
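A minimal consistent-hash ring, for intuition (production systems typically lean on a library and smarter ring management, but the core idea fits in a few lines):

```python
import bisect
import hashlib

class HashRing:
    """Same key always maps to the same node; losing a node remaps ~1/N of keys."""

    def __init__(self, nodes: list[str], vnodes: int = 100):
        # Virtual nodes smooth the key distribution across physical nodes.
        self._ring = sorted(
            (self._hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self._hashes = [h for h, _ in self._ring]

    @staticmethod
    def _hash(s: str) -> int:
        return int(hashlib.md5(s.encode()).hexdigest(), 16)

    def node_for(self, key: str) -> str:
        # First ring position clockwise of the key's hash, wrapping at the end.
        i = bisect.bisect(self._hashes, self._hash(key)) % len(self._ring)
        return self._ring[i][1]

ring = HashRing(["cache-1", "cache-2", "cache-3"])
ring.node_for("tweet:1648773726000_4271")  # stable across processes and time
```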
Cassandra and MySQL drivers handle this themselves — clients know the cluster topology and route requests directly to the correct shard's coordinator node, skipping any extra hop.
Start with Round Robin for the write tier (uniform requests, no sticky state). Use Least Connections for the read tier — some timeline reads take much longer than others (deep-pagination, many celebrity follows), and least-connections naturally avoids piling more work on a slow pod.
Twitter-scale failures are subtle — a slow shard, a skewed Kafka partition, a worker pool deadlocking. Without instrumentation you find out from users tweeting "Twitter is broken", which is awkward when you are Twitter.
| Metric | What it tells you | Alert threshold |
|---|---|---|
| Tweets/sec | Write traffic — is there a spike? | ±50% from baseline |
| Timeline p50 / p99 latency | Read path health | p99 > 300ms |
| Fan-out lag | How far behind real-time is the worker queue | > 30s |
| Cache hit rate | Is the cache actually helping | < 90% |
| DB replication lag | Multi-region read freshness | > 10s |
| 5xx rate per endpoint | Service-level errors | > 0.1% |
All metrics flow through Kafka into Prometheus + Grafana for live dashboards, and into a data warehouse for product analytics. PagerDuty wakes the on-call when red-line alerts fire.
The core product is post-tweet + timeline. The features below are layered on top using the same primitives.
Elasticsearch index over every tweet's text. Fan-out ⑥ feeds new tweets in near-real-time. Sharded by tweet_id with timestamp embedded so "search worldcup, recent first" is a partition-pruned scan, not a full-cluster broadcast. See the dedicated Twitter Search page for index design and ranking details.
Sliding window count of hashtags over the last 5-60 minutes. A min-heap maintains the top-K (K=10 globally, with location-personalized variants). Refreshed every minute — small enough updates to feel real-time, large enough not to flap.
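A sketch of the sliding-window counter with a heap-based top-K, assuming fan-out workers call record_hashtags() once per tweet (a single-process toy; the real service shards counters and aggregates):

```python
import heapq
import time
from collections import Counter, deque

WINDOW_SEC = 3600             # trailing one-hour window
_buckets: deque = deque()     # (minute, Counter) pairs, oldest first
_totals: Counter = Counter()  # rolling sum across all live buckets

def record_hashtags(tags: list[str]) -> None:
    minute = int(time.time()) // 60
    if not _buckets or _buckets[-1][0] != minute:
        _buckets.append((minute, Counter()))
    _buckets[-1][1].update(tags)
    _totals.update(tags)

def top_k(k: int = 10) -> list[tuple[str, int]]:
    # Expire whole minute-buckets that fell out of the window...
    cutoff = (int(time.time()) - WINDOW_SEC) // 60
    while _buckets and _buckets[0][0] < cutoff:
        _, expired = _buckets.popleft()
        for tag, n in expired.items():
            _totals[tag] -= n
            if _totals[tag] <= 0:
                del _totals[tag]
    # ...then take the K largest via a heap (the min-heap top-K described above)
    return heapq.nlargest(k, _totals.items(), key=lambda kv: kv[1])
```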
Graph algorithm (friends-of-friends — "people followed by people you follow") combined with ML signals (common location, similar interests inferred from engagement). Pre-computed nightly per user; refreshed on big follow-graph changes.
Editorially-curated tweet collections around a theme (a sports event, a breaking-news story). Stored as a list of tweet_ids in a "moments" table. ML helps surface candidate tweets; humans do final curation.
Stored as a separate object that references the original tweet_id ({ retweet_id, retweeter_user_id, original_tweet_id, timestamp }). On render, the original tweet is hydrated and shown with a "Retweeted by ..." header. Fans out to the retweeter's followers exactly like a normal tweet.
Fan-out workers also extract @mentions from each tweet and push notification events to the mentioned users. A separate Notification Service holds per-user notification lists in Redis (similar shape to Timeline Cache) and drives push-notifications via APNs/FCM.