HLD Interview Methodology

The HLD Interview Framework

A 45-minute timeboxed playbook for attacking any system design question — from "design Twitter" to "design Bitly" — without panicking, wandering, or over-engineering.

Step 0

Why You Need a Framework

Picture this. Sarah sits down for her L5 system design interview. The interviewer takes a sip of coffee and says, "Design Twitter." Sarah nods, opens the whiteboard tool, draws a box labeled "Web Server", then a box labeled "Database", connects them with a line — and freezes. Twenty minutes later she has eight disconnected boxes, no requirements, no API, no numbers, and the interviewer is asking, "So… what about the feed?" Sarah panics, mumbles "Redis", and the loop is over before she's said anything substantive.

The problem isn't that Sarah doesn't know the components. She knows what Redis does, she knows what Kafka does, she's read about consistent hashing twice. The problem is she has no playbook for the 45 minutes — no order in which to think, no checklist for what must be on the board before time runs out. HLD interviews aren't a memory test; they're a structured-thinking test, and structure is what a framework gives you.

The framework is a recipe, not a cage. A recipe tells you: chop the onions first, then the garlic, then deglaze with wine. You don't have to make the same dish every time, but the order matters — if you add the onions after the wine, you get a soggy mess. The HLD framework works the same way: requirements before APIs, APIs before architecture, architecture before deep dives. Skip a step and the rest collapses.

This page lays out the playbook used by senior engineers at FAANG, distilled from the methodology hellointerview.com teaches, and shows you how to apply it minute-by-minute. By the end you should be able to walk into any HLD loop, hear the prompt, and know exactly what you're going to say in the next 45 minutes.

Scoring Rubric

The Four Evaluation Pillars — What Interviewers Score

Before learning how to deliver, understand what's actually being measured. Interviewers at most senior engineering ladders are scoring four dimensions on every HLD loop. Knowing them lets you spend your minutes on what matters.

① Problem Navigation

Can you take an ambiguous prompt — "design Twitter" — and break it into a prioritized set of concrete problems? Do you know which 3 features are the spine of the system and which 50 are noise? This is where you earn points for asking "who's the user, what's the top action, what scale are we at?" instead of jumping straight to boxes.

What it looks like in practice: the candidate restates the problem in their own words, lists 5 candidate features, and explicitly says "for this interview I'll focus on these 3 — does that match what you want to see?"

② Solution Design

Can you map core distributed-systems concepts (sharding, caching, replication, queues) onto the specific components your system needs? "We use Kafka because we have asynchronous fan-out from one writer to many consumers" is solution design. "We use Kafka because Kafka is good" is not.

What it looks like: every box on the diagram is justified by a requirement from Step 1. The candidate can answer "what would break without this?" for every component.

③ Technical Excellence

Do you know the current tooling and patterns? "I'd use a sharded Postgres with logical replication" is current. "I'd use Hadoop MapReduce for the feed" was current in 2010 — today it signals you haven't kept up. Specifically: knowing when DynamoDB beats Cassandra, when Kafka beats Kinesis, when CRDTs beat last-writer-wins, when gRPC beats REST.

What it looks like: the candidate names specific products, specific data structures inside those products, and explains the trade-off versus the obvious alternative.

④ Communication

Can you explain your reasoning out loud, respond to interviewer pushback without getting defensive, and revise the design when challenged with a new constraint? The interviewer is also scoring "would I want to be on a design review with this person?"

What it looks like: the candidate narrates as they draw, pauses to ask "is this the depth you want, or should I move on?", and when the interviewer says "what about hot keys?" they incorporate the concern instead of defending the original design.

The pillar weighting is roughly 25/35/15/25. Solution Design is the heaviest because it's the hardest to fake, but Communication is the silent killer — most candidates lose loops not because their design was wrong but because the interviewer couldn't follow their reasoning. The framework that follows is engineered to make all four pillars visible at once.
The Playbook

The 5-Step Framework — A 45-Minute Timeline

Here's the entire playbook on one page: five steps plus a Q&A buffer, ordered by what a senior engineer would actually do at a real design review. Each step has a hard timebox — the goal is not to finish early but to spend exactly the right amount of time on each step so you reach Deep Dives with the interviewer still engaged.

gantt
    title HLD Interview — 45 to 60 Minute Timeline
    dateFormat X
    axisFormat %s
    section Setup
    Step 1 — Requirements      :crit, s1, 0, 5m
    Step 2 — Core Entities     :s2, after s1, 2m
    Step 3 — API Design        :s3, after s2, 5m
    section Architecture
    Step 4 — High-Level Design :crit, s4, after s3, 13m
    section Depth
    Step 5 — Deep Dives        :crit, s5, after s4, 15m
    section Buffer
    Q and A and Wrap-up        :s6, after s5, 5m

Step 1 · ~5 min

Requirements. Functional + Non-functional. Top 3 features only. Quantify NFRs.

Step 2 · ~2 min

Core Entities. Bulleted list of data nouns. User, Tweet, Follow. Fast.

Step 3 · ~5 min

API Surface. 3-5 endpoints. REST defaults. Auth-derived user_id.

Step 4 · ~13 min

High-Level Design. Walk one API at a time. Add components as needed. Build sequentially.

Step 5 · ~15 min

Deep Dives. Address NFRs and bottlenecks. Hot keys, sharding, caching, queues. Let the interviewer probe.

Buffer · ~5 min

Q&A. Interviewer's choice. Reverse questions. Recap.

The framework as a jazz solo. Steps 1-3 are the chord changes — they're fixed, you play them the same way every time. Step 4 is the head melody — recognizable from the requirements but with room for taste. Step 5 is the improvisation — it changes every interview based on what the interviewer probes, but it's only good if the chords underneath are solid. Skip the chord changes and you're just noodling.
Step 1 · 5 min

Step 1 — Requirements

The single most undervalued step. Candidates who skip it end up designing a system the interviewer didn't ask for, and the interviewer's gentle "interesting, but what about X?" lands like a wrecking ball 25 minutes in. Five minutes spent here saves twenty minutes later.

Functional requirements — "Users should be able to..."

List the top 3 user-facing capabilities. Resist the urge to brainstorm a list of 20. The interviewer doesn't have time to design 20 features in 45 minutes — they want to see depth on the few that matter. Phrase each as a user action, not a system capability.

✅ Good — Twitter top 3

  • Users should be able to post a tweet with text and optional media
  • Users should be able to follow other users
  • Users should be able to view a feed of tweets from people they follow, ordered chronologically

Three actions, each one drives a clear architectural slice. Post → write path. Follow → graph storage. Feed → fan-out problem.

❌ Bad — kitchen sink

  • Post tweets, retweet, quote-tweet, reply, like, bookmark
  • Follow users, unfollow, mute, block, report
  • View home feed, mentions, DMs, lists, trends, search
  • Notifications, analytics, ads, verified badges, polls...

You'll spend 30 minutes just listing endpoints and have no time left for architecture. The interviewer will start cutting features by force.

Non-functional requirements — "The system should be..."

NFRs are where most candidates fumble. The temptation is to list adjectives — "scalable, available, performant" — none of which mean anything specific. The fix: quantify everything, and tie each to a CAP-style trade-off.

✅ Good NFRs

  • Latency: feed loads under 500ms p99
  • Availability: 99.99% on read path (53 min downtime/year). Reads can go stale but cannot fail.
  • Scale: 200M DAU, 100M tweets/day, 100:1 read:write
  • Durability: zero tweet loss after a 200 OK is returned
  • Consistency: eventual on the feed (a 30s delay between post and follower visibility is acceptable)

❌ Bad NFRs

  • "Highly available" — what's the SLO? Can it ever go down for maintenance?
  • "Fast" — fast for whom? p50 or p99? Page load or API call?
  • "Scalable" — to what? 10K users or 10B users?
  • "Secure" — secure against what threat model?
  • "Reliable" — same as available? Or about durability?

The standard NFR checklist to walk through out loud: CAP, latency, throughput, availability, durability, consistency, security, compliance, cost. You don't need all of them on every system, but mentioning each shows you've considered it.

Capacity estimation — only if it changes the design

Back-of-envelope math is a tool, not a checkbox. Do it only when the number influences an architectural choice: does the data fit on one box, or do we need sharding? Is QPS within one DB's limits, or do we need read replicas + cache? Is bandwidth at CDN-scale, or can a single LB handle it?

A useful rule: if you can't immediately follow the math with "…therefore we need X", skip it. "500M URLs / month at 500 bytes each = 250GB/month → 15TB over 5 years → must shard" is great. "500M URLs / month at 500 bytes each = 250GB/month, anyway moving on" is wasted air.
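The same arithmetic in code — a quick sketch using the illustrative numbers above, ending with the "therefore" that justifies doing the math at all:

# Back-of-envelope: does the URL store fit on one box? (illustrative numbers)
urls_per_month = 500_000_000
bytes_per_url = 500

monthly_gb = urls_per_month * bytes_per_url / 1e9   # 250 GB/month
five_year_tb = monthly_gb * 12 * 5 / 1e3            # 15 TB over 5 years

print(f"{monthly_gb:.0f} GB/month -> {five_year_tb:.0f} TB over 5 years")
# 15 TB won't fit on one commodity box -> therefore we shard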

Anti-patterns to avoid in Step 1

🚫 Listing 15 features

Forces the interviewer to cut for you. Pick the 3 hardest, most tightly coupled features — the interviewer can always ask for more.

🚫 Skipping NFRs

Without quantified NFRs you can't justify any architecture choice — every "we need a cache" answer is hand-waved.

🚫 Math for math's sake

Estimating bytes when you're not going to use the number. Burns 3 minutes that could've gone to deep dives.

Step 2 · 2 min

Step 2 — Core Entities

The shortest step in the framework, and the easiest to get right. List the data nouns the system manipulates. That's it. No fields yet, no relationships, no schema — just names. This step exists to establish vocabulary with the interviewer so when you say "Tweet" later, you both mean the same thing.

Twitter

  • User
  • Tweet
  • Follow (relationship)

Bitly

  • User
  • ShortLink
  • ClickEvent

Uber

  • Rider
  • Driver
  • Ride
  • LocationUpdate

Why bother? Three reasons:

🎯 Anchors for later

When you draw the database in Step 4, you'll point at "the Tweet table" — the interviewer already knows what that means.

📐 Forces data model

If you can't list the entities, you don't understand the problem. Two minutes here exposes confusion early.

⚡ Disambiguates terms

"Post" might mean a tweet, a blog post, or an HTTP verb. Naming the entity locks in the meaning.

Don't over-engineer. No fields, no PKs, no FKs in this step — those come in Step 4 when you actually need them. Two minutes, three to five entities, move on. If you're spending 5 minutes here, you're hiding from the harder steps.
Step 3 · 5 min

Step 3 — API / System Interface

Now you commit to the contract between the client and your system. APIs make the system concrete in a way that abstract architecture diagrams never can — they force you to answer "who calls what, with what payload, and what comes back?" Every box you draw in Step 4 will be in service of one of these endpoints.

REST is the default — but use the right verb

Unless the system is real-time streaming (live location, chat, video), default to REST. Plural resources, standard HTTP verbs, JSON payloads. Three to five endpoints — one per top functional requirement plus a couple of supporting reads.

Twitter — top 3 endpoints driving top 3 functional requirements
// Post a tweet — write path
POST /v1/tweets
Headers: { Authorization: "Bearer <jwt>" }
Body:    { "text": "hello world", "media_ids": ["m_42"] }
→ 201 { "tweet_id": "t_8910", "created_at": "2026-05-07T14:02:06Z" }

// Follow another user
POST /v1/follows
Headers: { Authorization: "Bearer <jwt>" }
Body:    { "followee_id": "u_777" }
→ 204 No Content

// Get the home feed — read path, the heavy lift
GET /v1/feed?cursor=<opaque>&limit=20
Headers: { Authorization: "Bearer <jwt>" }
→ 200 {
    "tweets":      [ { "tweet_id": "t_…", "author": "…", "text": "…", "created_at": "…" }, … ],
    "next_cursor": "<opaque>"
  }

The single most important rule — never trust the client for identity

The user_id MUST come from the auth token, never from the request body. A candidate who writes POST /v1/tweets { "user_id": "u_42", "text": "..." } just designed a vulnerability where Mallory can post tweets as Sarah. The user_id is implicit — derived server-side from the JWT/session — and it's worth saying this out loud as you write the endpoints. Interviewers love this catch.
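A minimal sketch of what "derive identity server-side" looks like in a handler, assuming Flask and the PyJWT library — the secret, route, and claim layout are illustrative, not a prescribed implementation:

import jwt                    # PyJWT
from flask import Flask, request, jsonify

app = Flask(__name__)
SECRET = "change-me"          # illustrative; use a real key-management story

def user_id_from_auth() -> str:
    # Identity comes from the verified token, never from the request body.
    token = request.headers["Authorization"].removeprefix("Bearer ").strip()
    claims = jwt.decode(token, SECRET, algorithms=["HS256"])
    return claims["sub"]      # subject claim = authenticated user id

@app.post("/v1/tweets")
def post_tweet():
    author_id = user_id_from_auth()     # server-derived, unforgeable
    text = request.json["text"]         # client supplies content only
    # ...insert (author_id, text) into the Tweets table...
    return jsonify({"tweet_id": "t_8910"}), 201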

When to swap REST for something else

🌊 Streaming → WebSocket / SSE

Live chat, live location (Uber), real-time notifications. Polling at 1Hz on REST would crush the server.

📦 High-throughput → gRPC

Internal service-to-service calls where the JSON overhead and HTTP/1.1 RTT matter. Less common for the public API surface.

🔄 Async work → message queue

Not really an "API" — but worth saying "the email-send is dropped on a Kafka topic, not exposed as an endpoint."

For data-flow systems — sketch the flow, not the API

If you're designing an analytics pipeline, search indexer, or event-stream processor, the "API" is really a data flow. Draw producer → topic → consumer instead of REST endpoints. Same effect: locks down the contract.

Anti-pattern: writing 20 endpoints that cover every CRUD operation on every entity. Stick to the 3-5 that drive your functional requirements. If the interviewer wants pagination details on GET /users/:id/followers, they'll ask — don't volunteer until they do.
Step 4 · 13 min

Step 4 — High-Level Design

This is the visible centerpiece of the interview — the diagram the interviewer will photograph at the end. But the diagram itself isn't the deliverable: the narration as you build it is. Done well, this section walks through one API endpoint at a time, adding only the components that endpoint needs, until every box on the board has earned its place.

The strategy — build incrementally, one API at a time

Don't pull a full distributed-systems architecture from memory and start drawing it. The interviewer can't follow your reasoning if you skip the building-up. Instead, take API #1 (the highest-volume or most-defining one), draw the simplest possible system that satisfies it, then move to API #2 and add only what's missing.

flowchart LR
    subgraph P1["Pass 1 — POST /tweets only"]
        C1["Client"] --> A1["App Server"] --> D1[("Tweets DB")]
    end
    subgraph P2["Pass 2 — adds GET /feed"]
        C2["Client"] --> A2["App Server"]
        A2 --> D2[("Tweets DB")]
        A2 --> F2[("Follows DB")]
    end
    subgraph P3["Pass 3 — adds caching and fan-out"]
        C3["Client"] --> LB3["LB"]
        LB3 --> A3["App Server"]
        A3 --> CACHE3["Feed Cache — Redis"]
        A3 --> D3[("Tweets DB")]
        A3 --> F3[("Follows DB")]
        A3 -.async.-> FAN3["Fan-out Worker"]
        FAN3 --> CACHE3
    end
    style C1 fill:#e8743b,stroke:#e8743b,color:#fff
    style C2 fill:#e8743b,stroke:#e8743b,color:#fff
    style C3 fill:#e8743b,stroke:#e8743b,color:#fff
    style CACHE3 fill:#171d27,stroke:#3cbfbf,color:#d4dae5
    style FAN3 fill:#171d27,stroke:#d4a838,color:#d4dae5

Notice how Pass 1 is almost embarrassingly simple — and that's the point. The interviewer sees you starting with the smallest thing that could possibly work, then layering complexity only where a specific requirement demands it. By Pass 3 you've explained why the cache exists ("feed reads dominate at 100:1") and why the fan-out worker exists ("we precompute feeds so reads don't pay the join cost").

Document state changes explicitly

When data flows through your system, say what gets written where. "On POST /tweets, we (1) insert into the Tweets table, (2) emit a tweet_created event to Kafka, (3) the fan-out worker picks it up and pushes the tweet_id into each follower's feed cache." That sentence is worth ten boxes — it shows the interviewer you understand the runtime, not just the static topology.
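That sentence as a sketch — assuming a psycopg3-style DB handle and the kafka-python producer; the schema, topic name, and payload shape are illustrative:

import json
from kafka import KafkaProducer   # kafka-python

producer = KafkaProducer(
    bootstrap_servers="kafka:9092",
    value_serializer=lambda v: json.dumps(v).encode(),
)

def create_tweet(db, author_id: str, text: str) -> str:
    # (1) Durable write first — the 201 must never outrun the insert.
    tweet_id = db.execute(
        "INSERT INTO tweets (author_id, text) VALUES (%s, %s) RETURNING tweet_id",
        (author_id, text),
    ).fetchone()[0]
    # (2) Emit the event; consumers pick it up asynchronously.
    producer.send("tweet_created", {"tweet_id": tweet_id, "author_id": author_id})
    # (3) happens in the fan-out worker: push tweet_id into each follower's feed cache.
    return tweet_id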

Schema fields — only when relevant

You don't need to enumerate every column in every table. Mention fields when they drive a design choice: "the Tweet table is keyed by tweet_id and indexed on (user_id, created_at) so we can efficiently fetch a user's recent tweets". The fact that there's a language column doesn't help the interviewer — skip it.

✅ Good HLD narration

"For POST /tweets, the request hits the load balancer, gets routed to a stateless write app server, which inserts into the Tweets DB sharded by user_id. Then it emits a fan-out event to Kafka so feeds get rebuilt async — that way the user gets their 201 in under 100ms even if their follower list is 10M people."

❌ Bad HLD narration

"OK so we have a load balancer, and behind that we have app servers, and the app servers talk to a database, and we'll need a cache, and there's also a queue, and a search service over here, and..." [draws 12 disconnected boxes with no explanation of why each exists]

The mantra: "Build the simplest design that meets functional requirements, then layer complexity to meet non-functional requirements." Functional first — does it work for one user? Then non-functional — does it work for a million users? Mixing them is how candidates end up with Cassandra in Pass 1 of a system that doesn't even need persistence yet.
Step 5 · 15 min

Step 5 — Deep Dives

This is where senior candidates separate from mid-level candidates. By minute 25 you have a working high-level design — congratulations, that's the table-stakes deliverable. The remaining 15 minutes are about iterating the design to satisfy the non-functional requirements: latency, scale, fault tolerance, hot keys, edge cases. The interviewer is also probing for the depth of your knowledge — they have specific things they want to test, so leave room for them to drive.

The mindset shift — bottlenecks, not features

In Step 4 you added components to support new endpoints. In Step 5 you add components (or modify existing ones) to fix weaknesses in the existing design. Walk through your NFRs from Step 1 and ask: "does the current design hit this? if not, what's the bottleneck?"

🔥 Bottleneck-style deep dives

  • DB write hot spot → shard differently, or batch writes
  • DB read hot spot → add cache, read replicas
  • Single-region latency → multi-region replicas, CDN
  • Synchronous slow path → push to a queue, return early
  • Cache cold-start → warm-up scripts, dual-tier cache

⚡ Edge-case-style deep dives

  • Twitter: celebrity user with 100M followers — fan-out at write-time would write 100M times per tweet
  • Uber: SXSW geographic hot spot — geo-shard rebalances
  • URL shortener: viral link → CDN absorbs the spike
  • Chat: phone-loses-signal → message ordering & replay
  • Payments: double-submit on flaky network → idempotency keys
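The last bullet is the cheapest to make concrete. A sketch of idempotency keys, assuming a Redis-backed dedup store — SET NX EX is a real Redis command; do_charge and the key scheme are illustrative:

import redis

r = redis.Redis()

def charge(idempotency_key: str, amount_cents: int) -> str:
    # First writer wins; a retry of the same key replays the stored outcome.
    if not r.set(f"idem:{idempotency_key}", "in_progress", nx=True, ex=86_400):
        return r.get(f"idem:{idempotency_key}").decode()
    charge_id = do_charge(amount_cents)   # assumed call to the payment rails
    r.set(f"idem:{idempotency_key}", charge_id, ex=86_400)
    return charge_id

(A concurrent retry can still observe "in_progress" — a production version would poll or return 409; the sketch shows only the dedup core.)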

Twitter celebrity problem — a worked example

The candidate has just drawn a fan-out-on-write architecture: when Sarah tweets, the worker pushes her tweet into all 200 of her followers' feeds. Then the interviewer asks: "What happens when Taylor Swift tweets?"

Bad answer: "Uh, we'd just fan out to all her followers."
Good answer: "Pure fan-out-on-write breaks for celebrity users — Taylor has 95M followers, so one tweet would be 95M writes. Two fixes: (1) hybrid model — for users with <100K followers we fan out on write; for users above that threshold we fan out on read (the reader's feed-build query unions the precomputed feed with on-the-fly fetches from the celebrity's tweet timeline). (2) fan-out budget — the worker still does write fan-out for active followers, and lazy fan-out for the long-tail inactive followers, so dormant accounts don't bloat the celeb's write cost."

Crucial — leave room for the interviewer to drive

The interviewer has a list of probes they want to test on every candidate. Maybe it's "ask about hot keys", "ask about consistency in the cache", "ask about how the schema handles deletions". If you spend 15 minutes monologuing your way through three deep dives they didn't pick, you fail their unstated checklist. The move: do one deep dive of your own choosing, then explicitly hand the steering wheel: "I could go deeper on the cache, or talk about partitioning, or address how we handle deletes — what's most useful?"

So what: Deep dives are a dialogue, not a lecture. Pick one to demonstrate breadth, hand the wheel to the interviewer, and follow their probes. The candidate who treats Step 5 as "I'll keep monologuing until time runs out" loses to the one who treats it as "now we co-design the hard parts."
Anti-patterns

Common Mistakes — The Red Flags Interviewers Watch For

Every interviewer has a mental list of behaviors that signal "this candidate isn't ready." None of them are about technical knowledge — they're all process and communication failures. Avoiding them is half the loop.

🚨 Diving to architecture before requirements

Candidate hears "design Twitter", immediately draws Web Server → DB → Cache. The interviewer hasn't even said what the user can do yet. Without requirements, every architectural choice is an assumption — and you'll be defending phantom decisions for the rest of the loop.

Fix: first words out of your mouth are "let me make sure I understand the problem — what are the top user actions you'd like me to focus on?"

🚨 Vague NFRs — "highly available, scalable, fast"

These are adjectives, not requirements. They don't constrain any decision because every system can claim them. The interviewer can't push back ("how scalable?") and so can't grade your reasoning.

Fix: attach a number or a CAP-side to each. "Available — 99.99% on reads, 99.9% on writes. Reads can serve stale data, writes must be durable."

🚨 Generic tech choices — "we'll use Redis"

Redis what? A string? A hash? A sorted set? With what eviction policy? Sharded how? "Use Redis" is a meaningless phrase — it's like saying "we'll use a computer." Every storage choice has a data structure and an access pattern; name them.

Fix: "Cache the feed in Redis as a sorted set keyed by user_id, score = tweet timestamp, capped at 800 entries per user, evicted via LRU at the cache-node level."

🚨 False SQL vs NoSQL choice

"NoSQL because it scales" is a 2014 answer that interviewers now actively flag. Modern Postgres scales to terabytes; modern DynamoDB has transactions. The choice depends on access pattern (key-value vs relational queries), consistency needs, and operational maturity — not on a one-line slogan.

Fix: "Key-value access pattern at billion-row scale with no joins → DynamoDB. If we needed multi-row transactions and complex reporting, I'd revisit Postgres."

🚨 Boxes that don't earn their place

Candidate draws Kafka because "you always need a queue." But there's no async work in the design — every endpoint is request/response. The Kafka box now needs explanation, takes board space, and signals you cargo-cult components without justifying them.

Fix: for every box, be ready to answer "what would break without this?" If the answer is "nothing", erase it.

🚨 Talking over the interviewer's probes

Interviewer says "what about hot keys?" and the candidate, mid-monologue, says "yeah, I'll get to that, but first let me explain the cache topology…" — and then never gets to it. Probes are gifts. They tell you exactly what the interviewer wants to hear.

Fix: when probed, stop, address the probe directly, then return to your thread. "Good question — let me handle hot keys now and come back to the topology."

The Senior Tell

Specifying Implementations — The Single Biggest Tell

Of all the things that separate a junior-level answer from a senior-level one, the single biggest is specificity. Junior candidates name a tool; senior candidates name the tool, the data structure inside it, the access pattern, the partitioning, the failure mode, and the fallback. Same five words, ten times the signal.

🟥 "I'll cache popular tweets"

This is a wish, not a design. Where? In what data structure? Keyed by what? Evicted how? Sharded across how many nodes? Read-through or write-through?

🟩 The senior version

"I'll store tweet_id → Tweet as a Redis HASH on a 12-node cluster sharded by tweet_id via consistent hashing. 256GB per node, LRU eviction, replicated 2× for fault tolerance. Read-through from the app layer with a 60-second negative cache for missing keys to avoid stampede on deleted tweets."

🟥 "Use a queue for async work"

Which queue? What's the throughput? What partitioning gives you ordering guarantees? What happens when a consumer crashes mid-message?

🟩 The senior version

"Kafka topic notifications.email, 32 partitions keyed by user_id so per-user notifications stay ordered. 3-replica fault tolerance with min.insync.replicas=2 for durability. 7-day retention so we can replay if a downstream consumer regresses. Consumer group with at-least-once semantics; the email-sender is idempotent on (user_id, notification_id)."

🟥 "Shard the database"

By what key? Range or hash? How many shards? What happens when you add a shard? Cross-shard queries?

🟩 The senior version

"Shard the Tweets table by user_id using consistent hashing on a 16-virtual-node ring across 8 physical shards, replicated 3× across AZs. user_id as the shard key co-locates all of one user's tweets, which makes the timeline-by-user query a single-shard read. Cross-shard 'global timeline' queries scatter-gather, which is acceptable because the home feed isn't built that way — it's precomputed from the fan-out path."

The pattern in one sentence: for every component you name, also name (1) the specific product, (2) the data structure or schema choice inside it, (3) the partitioning / sharding strategy, (4) the replication and fault model, and (5) what the system does when this component degrades. That's the senior signal — and it costs maybe 15 extra seconds per component.
Worked Example

The Walkthrough Script — Designing Bitly in 45 Minutes

Below is a minute-by-minute script of an idealized candidate applying the framework to "Design Bitly." Read it as the tempo and tone you should aim for — concise, structured, narrating the framework out loud as you go.

sequenceDiagram
    actor C as Candidate
    actor I as Interviewer
    Note over C,I: Minute 0 — Prompt
    I->>C: Design Bitly.
    Note over C,I: Minutes 0 to 5 — Step 1 Requirements
    C->>I: Top 3 features shorten, redirect, custom alias?
    I-->>C: Yes. Skip analytics for now.
    C->>I: NFRs 20K reads per sec, p99 under 100ms, 5 yr storage, never lose a mapping
    I-->>C: Sounds good.
    Note over C,I: Minutes 5 to 7 — Step 2 Entities
    C->>I: Three entities ShortLink, User, ClickEvent. ClickEvent dropped per your call.
    Note over C,I: Minutes 7 to 12 — Step 3 APIs
    C->>I: POST /v1/urls body has long_url and optional alias. GET /:hash returns 302.
    I-->>C: Why 302 not 301?
    C->>I: Caches at the browser. 301 kills click counts.
    Note over C,I: Minutes 12 to 25 — Step 4 HLD
    C->>I: Pass 1 one app, one DB. Breaks at 20K reads per sec and 15TB storage.
    C->>I: Pass 2 split read path from write path, add KGS for collision-free keys.
    C->>I: Pass 3 add Memcached for hot 20 percent, CDN for global edge, Cassandra sharded by hash.
    Note over C,I: Minutes 25 to 40 — Step 5 Deep Dives
    I->>C: What if a link goes viral with 100K reads per sec?
    C->>I: CDN absorbs at edge. Origin sees cache hit. Telemetry async via Kafka.
    I->>C: How do you generate keys without collisions?
    C->>I: KGS pre-generates 6-char base64 into an unused pool, atomic POP per write.
    I->>C: What if KGS dies?
    C->>I: Active-standby pair plus 100-key local cache per write app survives a 30s outage.
    Note over C,I: Minutes 40 to 45 — Wrap
    I->>C: Any questions for me?
    C->>I: How does this compare to your real Bitly architecture?

The same script, with timing annotations

00:00
Interviewer: "Design Bitly."
00:30
Candidate: "Got it. Before I draw anything, let me pin down requirements. The top 3 functional features I'd focus on: (1) given a long URL, generate a short one; (2) when someone hits the short URL, redirect them; (3) optional custom alias. Sound right?"
01:00
Interviewer: "Yes. Don't worry about analytics."
01:15
Candidate: "Non-functional: read-heavy at roughly 100:1, target 20K reads/sec, p99 under 100ms. Durability is critical — once we issue a short link we cannot lose the mapping. Available 99.99% on the read path; brief degradation on writes is OK. Storage 5-year horizon."
05:00
Candidate (Step 2): "Three core entities: ShortLink (hash, long_url, expires_at), User, ClickEvent. We're skipping ClickEvent since you said no analytics."
07:00
Candidate (Step 3): "API surface — POST /v1/urls with the long URL in the body, returns the short URL. GET /:hash returns a 302 redirect. user_id derived from auth token, never from the body."
09:00
Interviewer: "Why 302 not 301?"
09:15
Candidate: "301 lets the browser cache the redirect target permanently — every later click skips our server. 302 forces re-asks. We pay ~10ms per click for accurate metrics. Bitly does this."
12:00
Candidate (Step 4 — Pass 1): [draws Client → App → MySQL] "Simplest thing — one app server, one DB. Three failures: hash collisions cost retries, 20K reads/sec crushes one MySQL, and 15TB doesn't fit on one box."
16:00
Candidate (Step 4 — Pass 2): "Split write path from read path — write traffic is 200 req/sec, read is 20K. Different scaling shapes. Add a Key Generation Service that pre-generates unique 6-char base64 keys into an 'unused' pool, so write-time has zero collision retries."
21:00
Candidate (Step 4 — Pass 3): "Production shape — Client → CDN → LB → split into Write App or Read App. Read App has Memcached in front for the hot 20% (~170GB). URL DB is Cassandra sharded by hash via consistent hashing, 3× replicated across AZs."
25:00
Candidate (Step 5): "I have a few directions for deep dive — I could go into the KGS internals, the cache replication, or the multi-region story. What's most interesting?"
25:30
Interviewer: "What if a link goes viral and gets 100K reads/sec?"
25:45
Candidate: "Viral = hot. The CDN at the edge absorbs nearly all of it; the origin sees one cache hit per cache TTL per edge node. If the link bypasses CDN — say it's freshly created — Memcached covers it. The DB shard for that hash never sees it. The only choke is CDN egress bandwidth, which is what CDNs are designed for. Telemetry is async via Kafka so the click counter doesn't melt."
29:00
Interviewer: "What if KGS dies?"
29:15
Candidate: "Three layers. Active-standby KGS pair fails over in seconds. Each write app caches ~100 keys locally — survives 30s of total outage. Last-resort fallback is inline hashing with collision-retry — slower but doesn't drop writes."
40:00
Candidate (Wrap): "If we had more time I'd cover global multi-region replication, abuse-prevention rate limiting, and the URL-deletion lifecycle. Want me to dive into any of those?"
42:00
Interviewer: "Great — any questions for me?"
What this script demonstrates: the candidate spent exactly 5 min on requirements, 2 min on entities, 5 min on APIs, 13 min on architecture, and 15 min on deep dives — and when the interviewer probed, answered directly without losing the structure. Notice especially the move at minute 25: instead of monologuing, the candidate offered three deep-dive options and let the interviewer steer. That's the senior signal.
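The KGS moves at minutes 16:00 and 29:15 are worth making concrete. A minimal sketch, assuming a Redis set as the unused-key pool — SPOP with a count is a real atomic Redis command; the alphabet, batch sizes, and refill policy are illustrative, and a production version would also exclude already-issued keys:

import secrets
import redis

r = redis.Redis()
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_"
POOL = "kgs:unused"

def refill(n: int = 10_000):
    # Pre-generate 6-char keys off the write path (e.g. a cron keeps the pool full).
    batch = {"".join(secrets.choice(ALPHABET) for _ in range(6)) for _ in range(n)}
    r.sadd(POOL, *batch)

local_cache = []   # ~100 keys per write app: survives a short KGS/pool outage

def next_key() -> str:
    if not local_cache:
        local_cache.extend(r.spop(POOL, 100))   # atomic batch pop — no collisions
    return local_cache.pop().decode()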
Reference Map

Mapping This Site's HLDs to the Framework

Use this table as a study guide. Each existing HLD page on this site emphasizes different framework steps and showcases different patterns. When you're practicing, pick a page based on what step you want to drill.

HLD Page | Framework Step Emphasized | Patterns Showcased | Best For Practicing
URL Shortener | Step 4 (HLD) + Step 5 (KGS deep dive) | Cache 80/20, KGS pre-generation, consistent hashing, CDN edge cache | Read-heavy systems & key-generation problems
Dropbox | Step 4 (control vs data plane split) | Block-level dedup, metadata vs content split, sync protocol | Storage systems & the "two-plane" mental model
Distributed UUID | Step 4 + Step 5 (collision deep dive) | Snowflake IDs, time-bit packing, clock skew handling | ID generation & coordination-free design
LeetCode | Step 4 (sandbox isolation) + Step 5 (queueing) | Container sandboxing, judge queue, real-time results via WebSocket | Compute-heavy systems & isolation patterns
Median of Billions | Step 1 (problem decomposition) | Approximate algorithms, t-digest, sketch data structures | Algorithm-flavored HLDs & capacity reasoning

Recommended drill order

Week 1 — Foundations

Read the URL Shortener HLD end-to-end. Then close it and re-derive the architecture from just the requirements. Time yourself — aim for 45 minutes.

Week 2 — Patterns

Drill Dropbox (control/data plane split) and Distributed UUID (coordination-free design). Both teach mental models that transfer across many systems.

Week 3 — Edge cases

LeetCode and Median of Billions force you outside the standard CRUD pattern. Use them to practice Step 5 — handling unusual workloads under realistic constraints.

The meta-lesson: every HLD on this site was structured using the same 5-step framework you just learned. When you read them, watch for the framework underneath — Requirements first, then Entities, then APIs, then HLD, then Deep Dives. Once you see the skeleton, you can apply it to any system the interviewer throws at you.