High-Level Design

Facebook Newsfeed

From "JOIN every friend's posts on read" to a hybrid push/pull feed with ML ranking — the architecture, the trade-offs, and why every box earns its place

Read this with the framework in mind

This deep-dive applies the 4-step HLD interview framework. As you read, map each section to Requirements → Entities → APIs → High-Level Design → Deep Dives, and notice which of the 8 common patterns and key technologies are at play.

Step 1

What is Newsfeed?

Imagine Sarah opens Facebook on her phone over morning coffee. The first thing she sees is a feed: Raj's vacation photo from Bali at the top, then her mom's birthday post (with 47 comments already), then her favorite cooking page's latest recipe video, then a friend's status from yesterday she missed. None of these are sorted by raw time. They're ranked — by what's most relevant to her, right now. Behind that one swipe is a system that picks 50 stories out of potentially 50,000 candidate posts, scores each one with an ML model, and assembles the result in under 2 seconds, 1.5 billion times a day.

Beyond just being chronological, the Newsfeed is a personalized newspaper printed for each reader — Raj's morning paper looks nothing like Sarah's, even though they share dozens of friends. The system has to handle text posts, photos, videos, and shared links from people, pages, and groups a user follows, and surface them in the order they're most likely to engage with.

The two questions that drive every design decision below: (1) How do we assemble a personalized feed from thousands of follows in under 2 seconds, when every refresh has to look fresh? (2) How do we make a celebrity post (Cristiano Ronaldo, 600M followers) reach every fan's feed without melting the system?
Step 2

Requirements & Goals

Before drawing a single box, pin down what the system must do. The Newsfeed has more moving parts than most because each user's feed is unique, content types are heterogeneous, and the ranking is the product.

✅ Functional Requirements

  • Generate a personalized feed for each user from people, pages, and groups they follow
  • Support multiple content types — text posts, images, videos, shared links
  • New posts from a user's network appear in their feed for active users
  • Feed is ranked by relevance, not strict chronological order
  • Pagination — users can scroll back through older items

⚙️ Non-Functional Requirements

  • Real-time generation — feed loads in under 2 seconds end-to-end
  • Low publish latency — a new post becomes visible in followers' feeds within 5 seconds
  • Scale — average user has ~300 friends + ~200 pages followed; some have thousands
  • High availability — Facebook without a feed is Facebook with nothing

🚫 Out of Scope

  • Comments, likes, reactions UX (separate service)
  • Direct messaging, stories, marketplace
The hard requirement is the combination. Generating a feed offline in 30 seconds is easy. Generating one in 2 seconds at request time is hard. Doing it for 300M people simultaneously, with a 5-second freshness guarantee on new posts, is the part that requires architecture.
Step 3

Capacity Estimation & Constraints

Numbers are not optional. The Newsfeed is read-heavy and fan-out heavy — both shape every architectural choice that follows.

User and traffic estimates

Assume 300 million daily active users (DAU). Each user fetches their feed roughly 5 times per day (open the app, scroll, leave, come back). That's 1.5 billion feed fetches/day.

| Metric | Value | Notes |
|---|---|---|
| DAU | 300M users/day | Active users who hit the feed |
| Feed fetches | 1.5B / day | 5 fetches × 300M users |
| QPS | ~17,500 req/sec | 1.5B / 86,400 |
| Avg follows | 300 + 200 | Friends + pages per user |

Cache sizing — pre-computed feeds

The biggest design lever in this system is pre-computing each user's feed and caching it instead of building it from scratch on every request. If we cache the top 500 items per user at ~1KB each (post id + metadata + media URLs):

500 × 1KB = 500KB per user × 300M active users ≈ 150 TB of cache. At 100GB per cache server, that's ~1,500 cache servers in the user-feed cache tier. Big number, but Facebook-scale, and the alternative (rebuilding on every request) costs 100× more in compute.

Write traffic — new posts

If 10% of DAU posts something each day, that's 30 million new posts/day = ~350 posts/sec. Each post may fan out to hundreds of followers' feed caches — so the fan-out write rate is enormous: 350 × ~500 followers avg = ~175K cache writes/sec. This is why the fan-out service is a separate, async tier.

| Metric | Value | Why it matters |
|---|---|---|
| Feed fetch QPS | 17.5K/s | Drives read tier sizing & cache hit rate target |
| Per-user cache | 500 KB | 500 pre-ranked feed item ids + minimal metadata |
| Total cache | 150 TB | Forces sharding across ~1,500 Redis nodes |
| New posts/sec | ~350/s | Drives Post Service throughput |
| Fan-out writes/sec | ~175K/s | Forces async fan-out tier — can't run inline |
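If you want to sanity-check these numbers rather than take them on faith, the whole estimate fits in a few lines of Python (variable names are ours; the figures are the ones above, in decimal units):

```python
# Back-of-envelope capacity math for the Newsfeed, using this section's figures.

DAU = 300_000_000           # daily active users
FETCHES_PER_USER = 5        # feed opens per user per day
SECONDS_PER_DAY = 86_400

feed_fetches_per_day = DAU * FETCHES_PER_USER            # 1.5B fetches/day
read_qps = feed_fetches_per_day / SECONDS_PER_DAY        # ~17,400 req/sec

CACHE_ITEMS_PER_USER = 500
BYTES_PER_ITEM = 1_000                                   # ~1KB of id + metadata
cache_total_tb = DAU * CACHE_ITEMS_PER_USER * BYTES_PER_ITEM / 1e12  # 150 TB

posts_per_day = DAU * 0.10                               # 10% of DAU post daily
post_qps = posts_per_day / SECONDS_PER_DAY               # ~350 posts/sec
fanout_writes_qps = post_qps * 500                       # ~175K cache writes/sec

print(f"{read_qps:,.0f} read QPS, {cache_total_tb:.0f} TB cache, "
      f"{post_qps:.0f} posts/sec, {fanout_writes_qps:,.0f} fan-out writes/sec")
```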
Step 4

System APIs

One endpoint carries the bulk of feed traffic — getUserFeed. Another writes posts. The read API is paginated using cursor-style since_id/max_id markers, not page numbers, because the feed grows under the user's feet.

REST API surface
// Read — feed fetch, ~17.5K QPS
GET /api/v1/feed
{
  "api_key":         "abc123...",
  "user_id":         42,
  "since_id":        "post_99812",  // only items newer than this
  "count":           50,             // page size
  "max_id":          null,           // for scrolling backward
  "exclude_replies": true
}
→ 200 OK
{
  "items": [
    { "post_id": "...", "author_id": 88, "type": "image", "score": 0.91, ... },
    ...
  ],
  "next_cursor": "post_98477"
}

// Write — new post, ~350 QPS
POST /api/v1/posts
{
  "api_key":   "abc123...",
  "user_id":   42,
  "content":   "Birthday photo from mom!",
  "media_ids": ["m_771", "m_772"],
  "audience":  "friends"
}
→ 201 Created  { "post_id": "post_99813" }
Why since_id instead of page=N? The feed is a moving target — by the time a user scrolls to "page 3", the system has injected 12 new posts at the top, shifting everything down. Page numbers would show duplicates or skip items. Cursors are anchored to items the client has already seen: max_id says "give me 50 items strictly older than this one" (stable while scrolling down, no matter how many new posts arrive above), and since_id says "give me only items newer than this one" (pull-to-refresh). Neither anchor shifts when the feed grows.
Why is score on every item? Because the feed isn't strictly chronological — the client may want to display the score (rare) or use it for client-side re-ordering after the user marks a post as "not interested". The server is the source of truth for ranking, but exposing the score keeps the contract honest.
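To make the cursor semantics concrete, here is a minimal sketch of the pagination logic in Python (names and types are illustrative, not the real service; `feed` is a list of post ids already sorted newest-first):

```python
# Cursor-based pagination over a newest-first feed list (illustrative sketch).
# since_id -> return only items NEWER than the given id (pull-to-refresh).
# max_id   -> return only items strictly OLDER (scrolling backward).

def paginate(feed, count=50, since_id=None, max_id=None):
    """feed: list of post ids, newest first. Returns (page, next_cursor)."""
    items = feed
    if since_id is not None and since_id in items:
        items = items[:items.index(since_id)]       # everything newer
    if max_id is not None and max_id in items:
        items = items[items.index(max_id) + 1:]     # everything strictly older
    page = items[:count]
    next_cursor = page[-1] if page else None        # anchor for the next scroll
    return page, next_cursor

feed = [f"post_{n}" for n in range(100, 90, -1)]    # post_100 (newest) .. post_91
page, cur = paginate(feed, count=3)                 # first screen
older, _ = paginate(feed, count=3, max_id=cur)      # user scrolls down
```

Prepending new posts to `feed` leaves `older` unchanged, which is exactly the stability property page numbers lack.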
Step 5

Database Schema

Six core entities. Notice that the schema looks broadly relational — there are relationships (a post belongs to a user, a follow links two users) — but at our scale we'll partition each table aggressively and skip cross-table joins entirely. Most lookups are by primary key.

```mermaid
erDiagram
  USER {
    bigint user_id PK
    string name
    string email
    timestamp created_at
    timestamp last_login
  }
  ENTITY {
    bigint entity_id PK
    string name
    string type
    timestamp created_at
  }
  USER_FOLLOW {
    bigint user_id PK
    bigint target_id PK
    string target_type
    timestamp followed_at
  }
  FEED_ITEM {
    string post_id PK
    bigint user_id FK
    bigint entity_id FK
    text content
    string location
    int num_likes
    timestamp created_at
  }
  FEED_MEDIA {
    string post_id PK
    bigint media_id PK
  }
  MEDIA {
    bigint media_id PK
    string type
    string s3_url
    int width
    int height
  }
  USER ||--o{ USER_FOLLOW : "follows"
  USER ||--o{ FEED_ITEM : "authors"
  ENTITY ||--o{ FEED_ITEM : "publishes"
  FEED_ITEM ||--o{ FEED_MEDIA : "has"
  MEDIA ||--o{ FEED_MEDIA : "referenced by"
```

👤 USER & ENTITY

USER holds people accounts. ENTITY holds pages and groups — anything a user can follow that isn't itself a user. We could merge these, but separating them lets us evolve their lifecycle independently (a Page has admins, monetization, and verification flags that a User does not).

🔗 USER_FOLLOW

The follow graph. (user_id, target_id, target_type) — so Sarah follows person 88, page 451, and group 99 in three rows. target_type distinguishes them. This table is read on every fan-out: "who follows post-author 42?" and "which entities does Sarah follow?". Indexed both directions.

📝 FEED_ITEM

The post itself. PK is post_id (Snowflake-style: timestamp + shard + sequence, monotonically increasing). Stored in Cassandra, sharded by post_id for uniform write distribution. The content text lives here; media is referenced by id.

🖼️ FEED_MEDIA + MEDIA

One post can have many media files (a 5-photo album), so we model it as a join: FEED_MEDIA links post_id ↔ media_id, and MEDIA stores the metadata (s3 URL, dimensions, MIME). The actual bytes live in S3 and are served via CDN — never through our app servers.

Why mostly relational but partitioned? The data is relational — posts have authors, users follow entities — but we never run "JOIN posts ON users WHERE follows" at query time. Instead, we resolve relationships at fan-out time (asynchronously, on write) and at read time we just look up pre-computed feed-item ids by user_id. The relational schema is for clarity; the storage engine is partitioned NoSQL (Cassandra for posts, Redis for cached feeds).
Step 6 · CORE

High-Level Architecture — From Naive to Production

This is the section that wins or loses the interview. We'll build the architecture in three passes: the simplest thing that could plausibly work, why it falls apart, and the production shape where every box justifies itself. The capacity numbers from §3 drive every decision.

Pass 1 — The naive design (and why it breaks)

Sketch the simplest possible system: when Sarah opens Facebook, an app server runs a single SQL query — "give me all posts authored by users Sarah follows, ordered by date, limit 100". One database holds everything.

```mermaid
flowchart LR
  C["Client"] --> APP["App Server"]
  APP -->|"SELECT posts WHERE author_id IN (Sarah follows) ORDER BY date DESC LIMIT 100"| DB[("Single SQL DB")]
  DB --> APP
  APP --> C
```

Four concrete failures emerge the moment real traffic shows up:

💥 Fan-in across thousands of follows

Sarah follows 300 people + 200 pages = 500 sources. The query becomes WHERE author_id IN (... 500 ids ...). The DB has to gather, merge-sort, and rank posts from 500 partitions for every single fetch. At 17.5K req/sec each fanning into 500 sources, the DB is doing ~8.75M sub-lookups per second. Latency goes from 50ms to 5+ seconds — feed never loads.

💥 Every fetch hits the DB

No cache means 17,500 req/sec slamming the database directly. Each query touches hundreds of rows. A single Postgres node tops out around 5K simple SELECTs/sec; we need thousands of complex multi-source queries/sec. The DB CPU pegs and p99 explodes.

💥 Celebrity posts melt fan-out

Cristiano Ronaldo posts. He has 600M followers. If we tried to push his post to each follower's feed inline, that's 600M cache writes triggered by one HTTP POST. The post would take 30+ minutes to "publish", and during that time the fan-out service blocks every other write.

💥 Strict chronological is a worse product

Even if we could serve the SQL fast, sorting purely by created_at shows Sarah a stranger-in-her-group's promo post above her mom's birthday photo. Engagement collapses. The feed must be ranked by relevance — and ranking adds 50ms+ of ML scoring per item, making Pass 1 even less viable.

Pass 2 — The mental model: pre-compute on write, not on read

The single insight that unlocks the whole design: feeds are read 100× more often than they're written to. So move the work to write time. When a user posts, fan out to all their followers' pre-computed feed caches asynchronously. When a user opens the app, the feed is just a single cache fetch — already assembled, already ranked.

📤 Push (fan-out-on-write)

Used for normal users. When mom posts a birthday photo, the system writes the post id into all 200 followers' feed caches. Read becomes O(1). Works beautifully when fan-out per post is bounded — say under 10K followers.

📥 Pull (fan-out-on-load)

Used for celebrities. Ronaldo's posts are not pushed. Instead, on every feed read, the system fetches Ronaldo's 50 most-recent posts directly from the post DB and merges them with the reader's pre-computed cache. Write becomes O(1) regardless of follower count.

The hybrid model wins. Push for the long tail of normal users; pull for the head of celebrities; merge both at read time. This is the same pattern Twitter uses for its home timeline, but Facebook adds a second layer — ML ranking — so the merged set is then scored and re-ordered before returning the top 50.

So what's the threshold? A common rule: if a user has more than ~1 million followers, switch them to the pull side. Push fan-out for 1M followers means 1M cache writes per post — slow but feasible. Above that (Ronaldo at 600M, Taylor Swift at 280M) it's untenable. The threshold is configurable per region and adjusts with capacity.
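A sketch of that routing decision, assuming the illustrative 1M threshold and toy in-memory caches (the helper names are ours, not Facebook's):

```python
# Hybrid fan-out decision (sketch). Threshold and data shapes are illustrative.

CELEBRITY_THRESHOLD = 1_000_000

def fan_out(post_id, author_id, follower_count, followers, feed_caches):
    """Push the post id into follower caches unless the author is a celebrity.
    Celebrity posts are skipped here and fetched on the pull path at read time.
    Returns the number of cache writes performed."""
    if follower_count > CELEBRITY_THRESHOLD:
        return 0                                  # pull side: zero cache writes
    writes = 0
    for follower_id in followers:
        feed_caches.setdefault(follower_id, []).append(post_id)
        writes += 1
    return writes

caches = {}
pushed = fan_out("p1", "mom", 200, ["sarah", "raj"], caches)          # push path
skipped = fan_out("p2", "ronaldo", 600_000_000, [], caches)           # pull path
```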

Pass 3 — The production shape

Now the full picture. Every node is numbered — find its matching card below to see what it does and what would break without it.

```mermaid
flowchart TB
  CL["① Client — Web / iOS / Android"]
  subgraph EDGE["Edge Tier"]
    LB["② Load Balancer"]
    WEB["③ Web Server"]
  end
  subgraph APP["Application Tier"]
    AS["④ Application Server"]
    PS["⑤ Post Service"]
    FGS["⑩ Feed Generation Service"]
    RS["⑪ Ranking Service"]
  end
  subgraph DATA["Data Tier"]
    PDB[("⑥ FeedItem DB — Cassandra")]
    S3[("⑦ Media Storage — S3 + CDN")]
    UFC[("⑨ User Feed Cache — Redis")]
  end
  subgraph ASYNC["Async Plane"]
    FOS["⑧ Fan-out Service"]
    NS["⑫ Notification Service"]
  end
  CL --> LB
  LB --> WEB
  WEB --> AS
  AS -->|"new post"| PS
  AS -->|"feed fetch"| FGS
  PS --> PDB
  PS --> S3
  PS -.publish event.-> FOS
  FOS -->|"push to followers"| UFC
  FOS -.celeb posts skipped.-> PDB
  FGS --> UFC
  FGS -->|"pull celeb posts"| PDB
  FGS --> RS
  FGS --> S3
  PS -.notify.-> NS
  NS -.push.-> CL
  style CL fill:#e8743b,stroke:#e8743b,color:#fff
  style LB fill:#171d27,stroke:#9b72cf,color:#d4dae5
  style WEB fill:#171d27,stroke:#3cbfbf,color:#d4dae5
  style AS fill:#171d27,stroke:#4a90d9,color:#d4dae5
  style PS fill:#171d27,stroke:#e8743b,color:#d4dae5
  style PDB fill:#171d27,stroke:#38b265,color:#d4dae5
  style S3 fill:#171d27,stroke:#3cbfbf,color:#d4dae5
  style FOS fill:#171d27,stroke:#d4a838,color:#d4dae5
  style UFC fill:#171d27,stroke:#e05252,color:#d4dae5
  style FGS fill:#171d27,stroke:#4a90d9,color:#d4dae5
  style RS fill:#171d27,stroke:#9b72cf,color:#d4dae5
  style NS fill:#171d27,stroke:#d4a838,color:#d4dae5
```

Component-by-component — what each numbered box does

Use the numbers in the diagram above to find the matching card. Each one answers what is this, why is it here, and what would break without it.

Client (Web / iOS / Android)

The Facebook app or web page. Issues a GET /feed when the user opens the app or pulls down to refresh, and a POST /posts when they publish. Holds a small client-side cache of the last fetched feed so a quick re-open doesn't always hit the network. Maintains a websocket for real-time notifications.

Solves: nothing on its own — but every latency budget flows backward from "what does the client experience?" The 2-second feed-load goal is measured from the client's tap to first paint, not from the server's response.

Load Balancer

Public-facing layer-7 LB (e.g., AWS ALB or in-house equivalent). Distributes incoming HTTPS across web-server pods, terminates TLS, and yanks unhealthy backends via 5-second health checks. Sticky sessions are not needed because everything below is stateless.

Solves: single-server failure and traffic-spike absorption. Without it, one app crash takes down the whole region. With it, we lose 1/N of capacity for a few seconds until health checks kick the bad pod out.

Web Server

Stateless edge service that handles the HTTP/HTTPS surface — auth, request validation, rate-limit enforcement, response compression. Hands the parsed request off to the application server tier. Easy to scale horizontally because no state lives here.

Solves: isolating protocol concerns (TLS, HTTP parsing, auth) from business logic. Without this split, every app-server pod would have to ship a TLS termination stack and rate-limit Redis client — wasteful and harder to upgrade.

Application Server

The orchestrator. For a feed fetch, it calls Feed Generation, applies user-level filters (muted authors, language preference), and returns the response. For a new post, it forwards to Post Service. Holds zero per-user state — every request is independent.

Solves: a clean seam between "what the client asked for" and "which downstream services do the work". Without it, web servers would have to know about Cassandra, Redis, S3, and the ranking model — coupling that breaks every refactor.

Post Service

Handles the POST /posts write path. Validates content, generates a Snowflake-style post_id, writes the row to FeedItem DB, uploads any media to S3, and emits a "post created" event onto the fan-out queue. Latency budget: under 200ms — the user is staring at a spinner.

Solves: isolating the write path so it scales independently of reads. Without a dedicated post tier, write spikes (e.g., a major news event) would compete with feed-read traffic on the same pods, and reads would slow during the spike.

FeedItem DB (Cassandra)

The source of truth for posts — a partitioned, replicated NoSQL store sharded by post_id. Cassandra was chosen because the access pattern is "look up post by id" + "list recent posts by author" — both indexable, neither needing joins. Replicated 3× across availability zones with quorum reads/writes for durability without sacrificing availability.

Solves: durable post storage at petabyte scale. A single MySQL box maxes out around 1TB; we have decades of posts to keep. Cassandra spreads it across hundreds of nodes and survives single-node failure transparently.

Media Storage (S3 + CDN)

Photos and videos go to S3. URLs are then cached at edge POPs by a CDN (CloudFront / Akamai). When Sarah's feed includes mom's photo, the feed response carries the CDN URL — Sarah's browser fetches the bytes directly from her nearest edge node, never through our app servers.

Solves: the bandwidth problem. A single 4MB photo viewed by 10K followers is 40GB of egress. Pushing that through our app-server tier would saturate it instantly. The CDN absorbs it at the edge — that's the entire reason CDNs exist.

Fan-out Service

The async heart of the push side. Consumes "post created" events from a Kafka topic. For each event: look up the author's followers, filter out anyone above the celebrity threshold, and write the new post_id into each remaining follower's User Feed Cache. Capped at e.g. 500 entries per user, oldest evicted. Runs on hundreds of worker pods — embarrassingly parallel across posts.

Solves: the "make new posts visible to followers fast" requirement without blocking the publish HTTP request. Without async fan-out, mom's post couldn't return 201 until 200 cache writes completed — slow on the happy path and catastrophic if any cache shard is briefly unhealthy.

User Feed Cache (Redis)

The crown jewel. Per-user pre-computed list of feed-item ids — a Redis sorted set keyed by user:{id}:feed, scored by post timestamp, capped at 500 entries. Sharded by user_id across ~1,500 nodes. A feed fetch is one ZREVRANGE on the right shard — sub-millisecond.

Solves: the 17.5K-req/sec read load. Without this cache, every fetch would hit Cassandra with a fan-in query across hundreds of authors — orders of magnitude slower and impossible to scale linearly. The cache makes the read path effectively free.

Feed Generation Service

The read-path orchestrator. For each feed fetch: (1) ZREVRANGE the user's pre-computed cache for the top 500 ids; (2) for each celebrity the user follows, SELECT the celebrity's last 50 posts from FeedItem DB; (3) merge both sets; (4) hand to Ranking Service; (5) hydrate the top-N with full post bodies and media URLs. Total budget: under 500ms.

Solves: the merge-and-rank step that the cache alone can't do. The cache holds candidates; ranking picks the winners. This service also handles cache misses (regenerate on the fly for inactive users) and pagination.
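The merge step itself is small; a hedged sketch, with a stand-in `rank` callable where the ML model would sit (all names are illustrative):

```python
# Read-path candidate assembly (sketch): merge the pre-computed cache with a
# live pull of each followed celebrity's recent posts, then rank and truncate.

def assemble_candidates(cached_ids, celeb_recent_posts, rank, top_n=50):
    """cached_ids: ids from the user's feed cache (push side).
    celeb_recent_posts: dict celeb_id -> list of recent post ids (pull side).
    rank: callable scoring a post id for this viewer (stands in for the model).
    Returns the top_n ids, highest score first."""
    candidates = set(cached_ids)                  # set() also de-duplicates
    for posts in celeb_recent_posts.values():
        candidates.update(posts)
    return sorted(candidates, key=rank, reverse=True)[:top_n]

# Toy usage: score by the numeric suffix as a stand-in for the real scorer.
ids = assemble_candidates(
    cached_ids=["post_3", "post_1"],
    celeb_recent_posts={"ronaldo": ["post_2", "post_4"]},
    rank=lambda pid: int(pid.split("_")[1]),
    top_n=3,
)
```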

Ranking Service

The ML brain. Given a candidate list of post ids and a viewer id, returns each post's relevance score — a function of recency, post engagement (likes, comments, shares), viewer-author affinity (how often Sarah interacts with mom), content type (Sarah loves videos), and predicted dwell time. Implemented as a model server (e.g., TorchServe) with a feature store. Scoring 500 candidates takes ~30-50ms.

Solves: the relevance problem. Without ranking, the feed is reverse-chronological — and reverse-chrono is empirically worse for engagement than ranked. Personalization is the product.

Notification Service

Pushes "your friend just posted" alerts to active users via websocket / APNs / FCM. Subscribes to the same fan-out events as the cache writers, but emits user-visible notifications instead of cache mutations. Throttled per-user so a friend who posts 10 times in an hour doesn't generate 10 buzz notifications.

Solves: the "new post visible within 5 seconds" goal for users who already have the app open. Without it, fresh posts only appear when the user manually refreshes — defeating the freshness requirement.

Concrete walkthrough — mom posts, Sarah scrolls

Two real flows mapped to the numbered components above.

📤 Write flow — mom posts a birthday photo at 14:02:06

  1. Mom's phone ① POSTs /api/v1/posts with text + photo bytes.
  2. Load Balancer ② → Web Server ③ → Application Server ④ → Post Service ⑤.
  3. Post Service uploads the photo to S3 ⑦ (a few hundred ms), gets back a media id.
  4. Post Service generates post_id = post_99813, writes {post_id, user_id, content, media_id, created_at} into FeedItem DB ⑥.
  5. Post Service emits a "post created" event onto Kafka and returns 201 {post_id} to mom. Total: ~250ms — mom sees her post on screen.
  6. The Fan-out Service ⑧ picks up the event asynchronously, looks up mom's 200 followers (Sarah is one), and pushes post_99813 into each one's User Feed Cache ⑨ via ZADD user:{id}:feed. Total fan-out: under 5 seconds.
  7. The Notification Service ⑫ pushes a websocket message to active followers — Sarah's app shows a "1 new post" banner.

📥 Read flow — Sarah opens Facebook at 14:03:30

  1. Sarah's app ① GETs /api/v1/feed with her user_id and a since_id cursor.
  2. LB ② → Web Server ③ → Application Server ④ → Feed Generation Service ⑩.
  3. Feed Gen issues ZREVRANGE user:42:feed 0 499 against User Feed Cache ⑨ → 500 candidate post ids back, including mom's brand-new post_99813.
  4. Feed Gen identifies the 3 celebrities Sarah follows, queries FeedItem DB ⑥ for each celebrity's 50 most-recent posts → merges them with the cached candidates.
  5. Feed Gen passes the merged ~650 candidates + Sarah's user features to Ranking Service ⑪ → gets back 50 winners, scored.
  6. Feed Gen hydrates the 50 winners — fetches post bodies from FeedItem DB ⑥ and replaces media ids with CDN ⑦ URLs.
  7. Response returned to Sarah's app. Mom's birthday photo is at position #2. Total: ~800ms.
So what: the architecture is built around three insights — (1) writes are 100× rarer than reads, so we move feed assembly to write-time via fan-out; (2) celebrities break naive fan-out, so we use a hybrid push/pull model with a follower-count threshold; (3) chronological is a worse product than ranked, so a dedicated ML scoring tier sits between the candidate cache and the response. Every box in the diagram earns its place by removing one of those failure modes from Pass 1.
Step 7

Feed Generation — Pre-compute vs. On-Demand

The core trade-off: do we keep a pre-computed feed for every user 24/7, or only for active ones? Storage vs. compute.

Active users — pre-compute and keep warm

For users who open Facebook at least once a week, we maintain a continuously-updated feed cache. Every time the fan-out service receives a new post from someone they follow, the user's cache is updated immediately. By the time Sarah opens the app, her feed is already assembled — the read path is just a cache fetch + ranking.

📦 Cache size per active user

Top 500 candidates (post ids + lightweight metadata). 500 × ~1KB = ~500KB per user. Why 500? Because even a heavy scroller rarely goes past 200 items in one session, and 500 gives us headroom for ranking to discard half of them.

♻️ Cache refresh on new post

The fan-out service is the only writer. It does ZADD user:{id}:feed score=timestamp post_id and ZREMRANGEBYRANK user:{id}:feed 0 -501 to keep the cap at 500. Old entries are evicted automatically as new ones arrive.
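Those two Redis commands can be modeled in plain Python to show the capped-feed behavior (an in-memory stand-in for one user:{id}:feed key, purely illustrative):

```python
# In-memory stand-in for a capped Redis sorted set (user:{id}:feed).
# zadd_capped mimics ZADD followed by ZREMRANGEBYRANK 0 -(cap+1).

FEED_CAP = 500

def zadd_capped(feed, post_id, timestamp, cap=FEED_CAP):
    """feed: list of (timestamp, post_id), kept sorted ascending by score."""
    feed.append((timestamp, post_id))
    feed.sort()                                   # Redis keeps members ordered
    del feed[:-cap]                               # evict oldest beyond the cap

def zrevrange(feed, start, stop):
    """Highest-score-first slice, like ZREVRANGE key start stop (inclusive)."""
    return [pid for _, pid in reversed(feed)][start:stop + 1]

feed = []
for t in range(600):                              # 600 posts into a 500-cap feed
    zadd_capped(feed, f"post_{t}", t)
newest = zrevrange(feed, 0, 2)                    # three most recent ids
```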

Inactive users — LRU eviction, regenerate on return

What about a user who hasn't opened Facebook in 6 months? Keeping their feed cache live is wasted memory. We let inactive users fall out of cache via an LRU policy on the cache cluster — if their entry hasn't been read in 30 days, it's evicted. When they finally return, the Feed Generation Service detects the cache miss and rebuilds the feed on demand: query FeedItem DB for the last 200 posts from each followed entity, merge, rank, return. Slower (~3-5 seconds) but only happens once per dormant user per session.

The trade-off in plain English: we pay 150TB of cache memory to give 300M active users sub-second feed loads. We accept a 3-5 second cold-start for the long tail of dormant users. If we tried to cache everyone (3 billion total accounts), the memory bill would be 10× higher for users who'd never read it.

Pre-warming on app open

An optimization: when a dormant user logs in, the auth flow fires an async "warm cache" job in parallel with serving the login response. By the time the user navigates to the feed (a second or two later), the cache may already be populated, hiding the cold-start latency.

Step 8

Pull vs. Push vs. Hybrid — The Defining Trade-off

This is the question every Newsfeed/timeline interview circles back to. The honest answer is "hybrid" — but you only earn that answer by walking through the two extremes first.

```mermaid
sequenceDiagram
  participant Mom as "Mom (200 followers)"
  participant Ronaldo as "Ronaldo (600M followers)"
  participant FOS as "Fan-out Service"
  participant UFC as "User Feed Cache"
  participant FGS as "Feed Generation"
  participant PDB as "FeedItem DB"
  participant Sarah as "Sarah (reader)"
  Note over Mom,FOS: Push path — normal user
  Mom->>FOS: post created
  FOS->>UFC: ZADD into 200 follower caches
  Note over UFC: Sarah's cache now has mom's post
  Note over Ronaldo,FOS: Pull path — celebrity
  Ronaldo->>FOS: post created
  FOS-->>FOS: skip fan-out (above threshold)
  FOS->>PDB: just persist the post
  Note over Sarah,FGS: Read merges both
  Sarah->>FGS: GET /feed
  FGS->>UFC: ZREVRANGE Sarah's cache (mom's post)
  FGS->>PDB: SELECT latest from each celeb Sarah follows (Ronaldo)
  FGS->>FGS: merge + rank
  FGS-->>Sarah: top 50 ranked items
```

📥 Pure Pull (fan-out-on-load)

How: on every feed read, query each followed entity's recent posts and merge.

Pros: trivially handles celebrities; no wasted work for dormant users; fresh by definition.

Cons: read latency scales with follower count — Sarah's 500 follows = 500 sub-queries per fetch. At 17.5K req/sec, that's 8.75M sub-queries/sec slamming the DB.

📤 Pure Push (fan-out-on-write)

How: on every post, write the post id into all followers' pre-computed feed caches.

Pros: read is O(1) — just a cache fetch.

Cons: celebrities are catastrophic — Ronaldo's post triggers 600M cache writes. Also wasteful: dormant users' caches are constantly updated for posts they'll never see.

🔀 Hybrid (production winner)

How: push for users with under 1M followers, pull for celebrities, merge at read time.

Pros: bounded fan-out cost per write, bounded read cost (cache + small celeb pull).

Cons: two code paths, threshold tuning, complexity in the merge step.

When to use each — concrete thresholds

| Author follower count | Strategy | Why |
|---|---|---|
| < 10K | Push to all | Fan-out is cheap; bounded cost |
| 10K – 1M | Push to active followers only | Skip dormant followers via the same LRU rule from §7 — saves memory without hurting any active user |
| > 1M (celebrity) | Pull on read | Push would be 1M+ cache writes per post — too expensive |
The interview move: volunteer pure pull first ("simplest thing that could work"), explain why it can't hit the 2-second SLA. Volunteer pure push next, explain the celebrity catastrophe. Land on hybrid as the answer that fixes both. Bonus points for noting that which side a user is on can change over time as their follower count grows.
Step 9

Feed Ranking — Why Two Friends See Different Orders

The feed isn't sorted by created_at DESC. It's sorted by a relevance score computed per (viewer, post) pair — meaning Sarah and Raj could both follow mom and both see her birthday post, but at different positions in their respective feeds.

The score function — what goes into it

📅 Recency

Decay factor on now - created_at. A post from 30 minutes ago beats one from yesterday, all else equal. Half-life around a few hours for friends, longer for groups.

📈 Post engagement

Total likes, comments, shares. Higher engagement = the network has signaled this is interesting. Normalized so a celebrity's 10K likes don't always beat a friend's 50.

🤝 Viewer-author affinity

How often does Sarah interact with this author? Mom (Sarah's mom, lots of likes) scores way higher than a high-school acquaintance she never engages with. Computed offline daily and fetched from a feature store.

🎬 Content type preference

Sarah watches videos to completion; Raj scrolls past them. The model learns each viewer's content-type lean and boosts matching content.

⏱️ Predicted dwell time

The model predicts how long Sarah will stop on this post. Long predicted dwell = boost. Used to penalize clickbait that gets clicks but no read-time.

🚫 Negative signals

Does Sarah hide posts from this author? Has she marked similar content "not interested"? Has the author been recently rate-limited for low-quality content? All subtract from the score.

How it runs at request time

The Feed Generation Service hands the Ranking Service a batch — viewer features + 500 candidate post features. The model is a gradient-boosted tree or a small neural network served by a model server (TorchServe, TensorFlow Serving). Scoring 500 candidates in a single batch takes ~30-50ms on a modern GPU/CPU. The top 50 by score come back; the rest are discarded.
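For intuition, here is a toy scoring function over the signals above. The weights and functional form are invented for illustration; production uses a learned model, not hand-tuned arithmetic:

```python
# Illustrative relevance score combining recency decay, engagement, affinity,
# content-type preference, and negative signals. All coefficients are made up.
import math

def score(post, viewer, now, half_life_hours=6.0):
    age_hours = (now - post["created_at"]) / 3600
    recency = 0.5 ** (age_hours / half_life_hours)            # exponential decay
    engagement = math.log1p(post["likes"] + 2 * post["comments"]
                            + 3 * post["shares"])             # damped, not raw
    affinity = viewer["affinity"].get(post["author_id"], 0.0) # 0..1, offline
    type_pref = viewer["type_pref"].get(post["type"], 0.5)    # 0..1 lean
    penalty = 10.0 if post["author_id"] in viewer["muted"] else 0.0
    return recency * (1 + engagement) * (0.5 + affinity) * type_pref - penalty

now = 1_000_000
mom_post = {"author_id": "mom", "type": "image", "likes": 47,
            "comments": 12, "shares": 1, "created_at": now - 3600}
viewer = {"affinity": {"mom": 0.9},
          "type_pref": {"image": 0.8, "video": 0.9},
          "muted": set()}
mom_score = score(mom_post, viewer, now)   # fresh + high affinity -> ranks high
```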

Why personalized ranking matters specifically here: reverse-chronological worked in 2008 when users had 50 friends. With 500 follows, raw chrono drowns the meaningful posts in noise. Ranking is what makes the feed feel like a curated paper instead of a firehose. Trade-off: predictability — users sometimes complain "why didn't I see X's post?" — but the engagement gains far outweigh the friction.
Step 10

Data Partitioning

Two storage tiers, two different partitioning keys. Choosing the wrong one for either tier breaks scaling.

📝 FeedItem DB — shard by post_id

Posts are written ~350/sec from millions of authors. Sharding by post_id (Snowflake-style id, includes timestamp + shard hint) gives uniform write distribution — no hot author can melt one shard. Reads are by post_id, which is also fast.

What about "list posts by author"? Use a secondary index on (author_id, created_at) — Cassandra can index it natively. Slightly slower than primary-key reads but only used for the celebrity pull path, which is the long-tail traffic.

📦 User Feed Cache — shard by user_id

Each user's pre-computed feed lives entirely on one shard, keyed by user:{user_id}:feed. Why not shard by post? Because the read pattern is "give me Sarah's feed" — single-user, single-key. Spreading Sarah's feed across multiple shards would force a scatter-gather on every read.

Consistent hashing on user_id across the ~1,500-node Redis cluster. Adding/removing nodes only relocates 1/N of users instead of all of them.
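A minimal consistent-hash ring makes the "only 1/N of users relocate" claim tangible. This sketch assumes a virtual-node design; the node names and vnode count are illustrative:

```python
# Minimal consistent-hash ring for the feed-cache tier (sketch). Real clusters
# use many more virtual nodes per server; 100 here keeps the demo fast.
import bisect
import hashlib

def _h(key):
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, nodes, vnodes=100):
        self.ring = sorted((_h(f"{n}#{v}"), n) for n in nodes for v in range(vnodes))
        self.keys = [h for h, _ in self.ring]

    def node_for(self, user_id):
        # First vnode clockwise from the key's hash owns the key; wrap at the end.
        i = bisect.bisect(self.keys, _h(f"user:{user_id}:feed")) % len(self.ring)
        return self.ring[i][1]

ring = Ring([f"redis-{i}" for i in range(8)])
owner = ring.node_for(42)                       # same user -> same shard, always

# Add a ninth node: only ~1/9 of users should map to a different shard.
ring_plus_one = Ring([f"redis-{i}" for i in range(9)])
moved = sum(ring.node_for(u) != ring_plus_one.node_for(u) for u in range(1000))
```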

Replication for fault tolerance

Each Cassandra shard is replicated 3× across availability zones with quorum writes (W=2, N=3) and quorum reads (R=2) — since W + R > N, every read overlaps at least one replica that saw the latest write. Each Redis cache node has a hot replica — if the primary dies, the replica takes over in seconds; we lose at most a few seconds of fan-out writes that hadn't replicated yet, which is acceptable for a feed cache.

Why two different shard keys? Because writes care about author distribution (you want all authors writing to all shards uniformly), but reads care about user locality (you want all of one user's data on one shard). The two access patterns don't compose — so we use two stores, each sharded for its own pattern, and pay the cost of synchronizing them via the fan-out service.
Step 11

Cache Strategy — Multi-Tier

The cache isn't one box; it's a layered hierarchy. Each tier catches a different class of repeat work.

🟢 Tier 1 — In-process LRU

On each Feed Generation pod, an in-memory LRU keyed by user_id with a 30-second TTL. If Sarah refreshes twice in a row, the second refresh hits the in-process cache and skips Redis entirely. Tiny (a few hundred MB per pod) but kills millisecond-level repeat-fetch noise.
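A pod-local TTL'd LRU is small enough to sketch fully. This is an illustrative implementation (class and parameter names are assumptions, and production code would use a battle-tested cache library), showing the two behaviors that matter: expiry after the TTL and eviction of the least-recently-used entry:

```python
import time
from collections import OrderedDict

class TTLCache:
    """In-process LRU with a TTL; ~30s matches the feed use case."""

    def __init__(self, max_items=10_000, ttl=30.0):
        self.data = OrderedDict()   # user_id -> (inserted_at, feed)
        self.max_items, self.ttl = max_items, ttl

    def get(self, user_id, now=None):
        now = now if now is not None else time.monotonic()
        entry = self.data.get(user_id)
        if entry is None or now - entry[0] > self.ttl:
            self.data.pop(user_id, None)
            return None              # miss: caller falls through to Redis
        self.data.move_to_end(user_id)  # mark as recently used
        return entry[1]

    def put(self, user_id, feed, now=None):
        self.data[user_id] = (now if now is not None else time.monotonic(), feed)
        self.data.move_to_end(user_id)
        if len(self.data) > self.max_items:
            self.data.popitem(last=False)   # evict least-recently-used
```

A `get` returning `None` is the signal to fall through to Tier 2 (Redis); a hit skips the network entirely.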

🟡 Tier 2 — User Feed Cache (Redis)

The big one. The 150TB sharded Redis cluster from §6 holding pre-computed feeds. Hit rate target: >95%. LRU eviction at the cluster level for inactive users.

🔴 Tier 3 — FeedItem DB (Cassandra)

Last resort — only hit on cache miss (cold user) or for the celebrity pull path. Not technically a cache, but the source of truth that the upper tiers fall back on. Sized to handle the long-tail miss rate, not the full 17.5K req/sec.

Pre-warm on app open

When a user logs in, the auth service emits an async "user is active" event. The Feed Generation Service consumes it and pre-warms the in-process LRU for that user before they navigate to the feed. Hides the cold-start latency.

The cache invalidation rule: we never invalidate. Posts are immutable (you can edit, but edits are rare and we tolerate brief staleness). New posts are added to the cache by fan-out, not replaced. Deletes are handled lazily — if Feed Generation hydrates a deleted post and gets a 404 from FeedItem DB, it filters that post out and removes its id from the user's cache entry.
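The lazy-delete path can be sketched as a small hydration loop. `fetch_post` and `redis` here are hypothetical stand-ins for the FeedItem DB client and the Redis client; the real service would batch the DB fetches:

```python
def hydrate(redis, fetch_post, user_id, post_ids):
    """Resolve cached post ids into full posts, scrubbing deleted ones."""
    posts, dead = [], []
    for pid in post_ids:
        post = fetch_post(pid)       # None stands in for a 404: deleted
        if post is None:
            dead.append(pid)
        else:
            posts.append(post)
    if dead:
        # lazily remove deleted ids from the user's cached feed
        redis.zrem(f"user:{user_id}:feed", *dead)
    return posts
```

No proactive invalidation ever runs; the cache self-heals the first time a reader trips over a deleted post.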
Step 12

Interview Q&A

Push vs. pull for feed delivery — which would you pick?
Hybrid. Pure push collapses on celebrities (Ronaldo's 600M-follower fan-out). Pure pull collapses on read latency (500 sub-queries per fetch). Hybrid uses push for under-1M-follower authors and pull for celebrities, merging at read time. The threshold is tunable per region and adjusts as compute capacity changes. Same approach Twitter uses for the home timeline.
How do you handle a celebrity (Cristiano Ronaldo) posting?
Skip fan-out entirely. When Ronaldo posts, the post is written to FeedItem DB but the fan-out service skips writing to followers' caches (that would be 600M writes, untenable). Instead, the Feed Generation Service knows Ronaldo is a "pull" author — every time a follower fetches their feed, FGS does a lightweight SELECT for Ronaldo's last 50 posts and merges them with the follower's cached feed. The post is delivered on the next read; the latency budget is unaffected because the celebrity-pull queries are tiny.
How does the ranking actually work?
ML model serving. Each candidate post's score is a function of recency, post engagement (likes/comments/shares), viewer-author affinity (how often the viewer interacts with the author), content-type preference (does the viewer prefer videos?), and predicted dwell time. Implemented as a gradient-boosted tree or small neural net served by TorchServe / TF Serving. Features come from an offline-computed feature store (affinity scores updated daily) plus realtime signals (post age, current like count). Scoring 500 candidates in a single batch takes ~30-50ms.
How do you ensure new posts appear in followers' feeds within 5 seconds?
Async fan-out with bounded queue depth. When a normal user posts, the Post Service emits a Kafka event and returns 201 immediately. A pool of fan-out workers consumes the event and writes the post id into each follower's Redis cache via ZADD — typically all 200-500 follower caches updated in 1-3 seconds. We monitor end-to-end p99 fan-out latency; if it creeps past 5 seconds we add workers. For active followers with the app open, the Notification Service also pushes a websocket message so they see "1 new post" instantly.
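The fan-out worker's inner loop is a pipelined ZADD per follower, plus a trim so feeds stay bounded. A sketch with redis-py-style calls (`fan_out` and the 500-item cap are illustrative; the real worker consumes from Kafka and batches much more aggressively):

```python
FEED_CAP = 500   # keep only the newest 500 post ids per user

def fan_out(redis, post_id, created_at, follower_ids):
    """Write one post id into every follower's sorted-set feed."""
    pipe = redis.pipeline()
    for fid in follower_ids:
        key = f"user:{fid}:feed"
        pipe.zadd(key, {post_id: created_at})          # score = timestamp
        pipe.zremrangebyrank(key, 0, -(FEED_CAP + 1))  # trim oldest beyond cap
    return pipe.execute()
```

The pipeline matters: one network round trip per batch of followers instead of one per ZADD, which is what keeps 500-follower fan-outs in the 1-3 second band.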
What's the trade-off of pre-computing feeds for inactive users?
Storage vs. cold-start latency. If we kept every account's feed warm, we'd 10× the cache bill for users who never read it. Instead we use LRU eviction — if a user hasn't fetched their feed in 30 days, their cache entry is dropped. When they return, Feed Generation rebuilds the feed on demand by querying FeedItem DB for each followed entity (a 3-5 second operation). The cold-start happens once per dormant session and is masked by an async "warm cache" job kicked off at login time. Net: 150TB cache instead of ~1.5PB, with <1% of users experiencing the slow first fetch.
How do you handle a feed cache miss?
Rebuild inline, then populate the cache. Feed Generation does ZREVRANGE user:42:feed against Redis. On miss (empty result), it falls back to: (1) query FeedItem DB for the last 200 posts from each entity user 42 follows; (2) merge, sort by timestamp, take top 500; (3) write the result back to Redis as the user's new feed cache; (4) hand to ranking and return. Latency: 3-5 seconds vs. 800ms warm. Acceptable because miss rate is <5% (only dormant users).
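Step (2) of the rebuild — merge, sort by timestamp, take the top 500 — is a k-way merge. A minimal sketch, assuming each followed entity's posts arrive already newest-first (which Cassandra's clustering order gives for free):

```python
import heapq
import itertools

def rebuild_feed(per_author_posts, limit=500):
    """Merge per-author (created_at, post_id) lists, newest first.

    Each input list must already be sorted newest-first; heapq.merge
    then does the k-way merge lazily without concatenating everything.
    """
    merged = heapq.merge(*per_author_posts, reverse=True)
    return [pid for _, pid in itertools.islice(merged, limit)]
```

Because the merge is lazy, we stop pulling from the per-author streams as soon as 500 ids are out — no need to sort all follows × 200 posts.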
Why Cassandra for FeedItem DB and Redis for the feed cache — why not one store?
Different access patterns, different trade-offs. FeedItem DB needs durable, petabyte-scale storage with multi-region replication and 3× replication for fault tolerance. Cassandra excels at this. The feed cache needs microsecond-latency, sorted-set semantics, in-memory speed — that's Redis territory. Forcing one store to do both means either Cassandra is too slow for the read path or Redis is too expensive (RAM-priced) for petabyte storage. Two stores, each optimized for its workload, beats one compromise store.
How would you handle a "trending" event — say, a viral post during the Super Bowl?
The cache + CDN absorb it. A viral post is pulled from FeedItem DB once per cache miss and then served from Redis to every subsequent reader. Media (the photo or video) is in S3 and CDN-cached at edge POPs — even if 50M people view it in an hour, the origin sees thousands of fetches, not millions. The Ranking Service may also boost the post's score given its engagement spike. The one risk is the fan-out write storm for the original post — if the viral author has 5M followers, fan-out takes longer; we either accept a 30-second freshness lag or pre-classify them above the celebrity threshold and switch to pull mode.
The one-line summary the interviewer remembers: "It's a hybrid push/pull model — fan-out-on-write to followers' Redis-cached feeds for normal users, fan-out-on-load for celebrities, merged and ML-ranked at read time. The cache makes reads O(1); the celebrity threshold makes writes bounded; the ranker makes the feed feel personal."