
The 8 Common Patterns Every HLD Uses

A reference catalog of the eight patterns that show up in nearly every system design interview — when to use each, what problem it solves, and concrete examples from real systems on this site.

Step 1

Why Patterns? — Vocabulary Signals Seniority

Picture the interview room. A junior engineer is asked to design Twitter's timeline. She thinks for a moment, then starts drawing: "OK so when Sarah tweets, we need to put it in front of her followers. Maybe each follower has an inbox? And we copy the tweet into every follower's inbox at write time?" She is, on the spot, painstakingly reinventing fan-out-on-write — taking five minutes to derive what has been a named technique for fifteen years.

Across the table, a senior engineer nods. He's not nodding because the answer is brilliant — he's nodding because he's heard the same derivation a hundred times. If she had just said "I'd use fan-out-on-write for normal users and fan-out-on-read for celebrities", he would have moved on in thirty seconds.

That is what patterns are for. They are vocabulary that compresses a five-minute explanation into a thirty-second one. Naming a pattern lets you skip the derivation and spend interview time on the thing that actually matters: which pattern fits this problem, what the trade-offs are, and how you'd wire it into the rest of the system.

The unfair advantage of pattern fluency: if you know that pushing real-time updates is a named problem with three named answers (polling, SSE, WebSockets) and you can rattle off when each wins, the interviewer instantly puts you in the "senior" bucket. If you re-derive WebSockets from scratch, you're in the "smart but green" bucket. Same answer, different label, very different offer.

This page catalogs the eight patterns that come up in nearly every HLD interview. Each section gives you: the scenario where it applies, the problem it solves, the canonical architecture, the trade-offs, a decision tree, and links to two or three real systems on this site that use it. Treat it as a lookup table — when an interviewer says "users edit a doc together in real time", you should immediately think Pattern 1: Real-time Updates, and have three candidate architectures ready before they finish the sentence.

Step 2

The 8 Patterns at a Glance

A bird's-eye view first. Each row is one pattern; the rest of this page is a deep dive on each one. The mindmap below is the same content visually — useful for spotting which family a problem falls into.

mindmap
  root(("HLD Patterns"))
    Real-time
      WebSockets
      SSE
      Polling
    Long-Running
      Async jobs
      Worker queues
      202 + job_id
    Contention
      Pessimistic lock
      Optimistic CAS
      Distributed lock
    Scaling Reads
      Indexes
      Replicas
      Cache
      CDN
    Scaling Writes
      Sharding
      Partitioning
      Write queues
    Large Blobs
      Presigned URLs
      Object store
      Metadata DB
    Multi-Step
      Orchestration
      Choreography
      Saga compensations
    Proximity
      Geohash
      QuadTree
      Redis GEO

| # | Pattern | One-line description | Jump to |
|---|---------|----------------------|---------|
| 1 | Real-time Updates | Push state changes to clients within seconds without polling | §3 |
| 2 | Long-Running Tasks | Don't block the HTTP request for a 30-second job | §4 |
| 3 | Contention | Stop two users from grabbing the same seat / coupon / row | §5 |
| 4 | Scaling Reads | Survive a 100:1 read-to-write ratio without melting the DB | §6 |
| 5 | Scaling Writes | Spread write load past what one box can absorb | §7 |
| 6 | Large Blobs | Get gigabyte uploads off your app servers | §8 |
| 7 | Multi-Step / Sagas | Roll back a 4-step workflow when step 3 fails | §9 |
| 8 | Proximity | Answer "find nearby X" without scanning every entity | §10 |

Reading order tip: if you're new, read 1 → 8 in order — they're sequenced from most-frequent to most-specialized. If you're prepping for a specific interview, jump to the patterns that match your target system: ride-sharing leans on 1+5+8, video sites lean on 2+4+6, ticketing leans on 3+7.
Pattern 1

Pushing Real-time Updates

Picture two people. Sarah is editing a Google Doc on her laptop. Raj, sitting in another country, is reading the same doc. Sarah types a sentence. Raj's screen needs to show that sentence within a second or two — without Raj's browser hammering the server every 200ms asking "anything new yet?". That is the problem this pattern exists to solve.

When to use it

Chat apps (Messenger, Slack, WhatsApp), live notifications, real-time dashboards, collaborative editors (Docs, Figma), live driver tracking on a map, live sports scores, multiplayer game state, presence indicators. Any time you want a server-side change to appear on a client without the client asking.

The problem stated precisely

HTTP is request-response. The server can't talk to the client unless the client asks first. So the naive answer is "have the client ask every second" — polling. That works, technically, but it's spectacularly wasteful: with a million idle users polling every 5 seconds, you're handling 200K requests/sec for nothing 99% of the time. We need a way for the server to push a message to the client only when something actually happens.

The three canonical architectures

A. Polling / Long-Polling

Client repeatedly asks "anything new?" — either every N seconds (short poll) or holding the request open until the server has news (long poll).

Pros: dead-simple HTTP, works behind any firewall, no special protocol.

Cons: wasteful at idle, latency = poll interval, server holds many open connections in long-poll mode.

When it wins: updates are rare (once per minute), or the client is short-lived (a checkout flow that polls payment status for 30 seconds).

B. Server-Sent Events (SSE)

Client opens one HTTP connection; server streams text events down it indefinitely. One-way: server-to-client only.

Pros: standard HTTP, works through proxies, auto-reconnect built into the browser, dead-simple to implement.

Cons: one-way only — the client cannot send messages back on the same connection. Browsers also cap HTTP/1.1 connections at about six per origin, which limits concurrent SSE streams (HTTP/2 raises that limit).

When it wins: server pushes notifications, dashboards, live feeds — anywhere the client doesn't need to talk back in real time.

C. WebSockets

HTTP upgrades to a persistent full-duplex TCP connection. Either side can send a message at any time.

Pros: truly bidirectional, lowest latency, lightest per-message overhead.

Cons: not pure HTTP — some corporate proxies block it, sticky-session load balancing required, more operational complexity.

When it wins: chat, multiplayer games, collaborative editing — anywhere the client needs to send messages too.

Decision tree — which transport?

flowchart TD
  Q1{"Client needs to push to server in real time?"}
  Q2{"Updates frequent enough that polling wastes traffic?"}
  Q3{"Updates rarer than once per minute?"}
  POLL["Polling — short or long poll"]
  SSE["Server-Sent Events"]
  WS["WebSockets"]
  Q1 -- "yes" --> WS
  Q1 -- "no" --> Q2
  Q2 -- "yes" --> SSE
  Q2 -- "no" --> Q3
  Q3 -- "yes" --> POLL
  Q3 -- "no" --> SSE
  style WS fill:#171d27,stroke:#38b265,color:#d4dae5
  style SSE fill:#171d27,stroke:#4a90d9,color:#d4dae5
  style POLL fill:#171d27,stroke:#d4a838,color:#d4dae5

The fan-out backbone

Whichever transport you pick, behind it sits a pub/sub backbone that delivers messages to the right connection. When Sarah types, her server publishes to a topic like doc:42; every connection subscribed to doc:42 (Raj's WebSocket) receives the event. Common backbones: Redis Pub/Sub (simple, ephemeral), Kafka (durable, replay), or a managed service like Ably / Pusher / AWS API Gateway WebSockets.
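
To make the backbone concrete, here is a minimal sketch of one connection-server pod, assuming redis-py's asyncio client and the websockets library; the doc:* topic naming, the subscribe handshake, and the in-process connection registry are illustrative choices, not a prescribed API.

```python
import asyncio
import json
import redis.asyncio as redis
import websockets

r = redis.Redis()
connections: dict[str, set] = {}   # topic -> live WebSocket connections on this pod

async def handle_client(ws):
    # First frame from the client names the topic it wants, e.g. {"subscribe": "doc:42"}.
    topic = json.loads(await ws.recv())["subscribe"]
    connections.setdefault(topic, set()).add(ws)
    try:
        await ws.wait_closed()            # keep the connection registered until it drops
    finally:
        connections[topic].discard(ws)

async def relay():
    # One Redis subscription per pod; fan each published event out to local sockets.
    pubsub = r.pubsub()
    await pubsub.psubscribe("doc:*")
    async for msg in pubsub.listen():
        if msg["type"] != "pmessage":
            continue
        topic = msg["channel"].decode()
        for ws in list(connections.get(topic, ())):
            await ws.send(msg["data"].decode())

async def main():
    async with websockets.serve(handle_client, "0.0.0.0", 8080):
        await relay()

# Publisher side (wherever Sarah's edit lands):
#   await redis.Redis().publish("doc:42", json.dumps({"op": "insert", "text": "..."}))

if __name__ == "__main__":
    asyncio.run(main())
```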

Trade-offs & gotchas

⚖️ Connection state

WebSockets and SSE are stateful. The server knows "Raj's connection is on pod-7". If pod-7 crashes, Raj's connection drops. You need sticky-session load balancing or a connection registry (Redis with {user_id → pod}) so other pods can find Raj when they need to push to him.

⚖️ Backpressure

If you're publishing 100 events/sec to a slow client, your pub/sub queue grows unbounded. Solutions: drop old events (live driver location — only the latest matters), batch events, or close the connection.

Live examples on this site

💬 Facebook Messenger

WebSockets for bidirectional chat. Connection registry maps user_id → chat_server_pod. See the full design →

🚗 Uber

Driver location pushed to the rider's app every few seconds via WebSockets. The map dot moving smoothly is this pattern in action. See the full design →

🐦 Twitter timeline

New-tweet notifications pushed to active users via WebSockets — "5 new tweets" appears without a refresh. See the full design →

Pattern 2

Managing Long-Running Tasks

Picture this. Priya uploads a 4K wedding video to YouTube. The transcoding pipeline needs to produce 1080p, 720p, 480p, and 360p variants — about three minutes of work for a ten-minute video. Should her browser sit there spinning for three minutes waiting for the HTTP response? Of course not. The browser will time out, the load balancer will time out, and her experience will be terrible. We need a different shape.

When to use it

Video transcoding, large report generation, bulk imports/exports, ML model training, sending a million emails, image processing pipelines, data exports. Anything where the work to be done is far longer than a reasonable HTTP response.

The pattern

Three steps. First, the API accepts the work and immediately returns 202 Accepted with a job_id — a receipt. Second, the actual work goes onto a queue (Kafka, SQS, RabbitMQ) where a pool of workers picks it up and processes asynchronously. Third, the client either polls the job status endpoint (GET /jobs/{id}) or subscribes via Pattern 1 to a push notification when the job completes.

sequenceDiagram
  participant C as Client
  participant API as API Server
  participant Q as Queue — Kafka / SQS
  participant W as Worker Pool
  participant DB as Job Status DB
  C->>API: POST /jobs (long task payload)
  API->>DB: INSERT job_id status=PENDING
  API->>Q: enqueue job_id
  API-->>C: 202 Accepted job_id=abc
  Note over Q,W: Async — request is done
  W->>Q: dequeue job_id=abc
  W->>W: process (3 minutes)
  W->>DB: UPDATE job_id status=COMPLETED
  C->>API: GET /jobs/abc
  API->>DB: SELECT status
  API-->>C: status=COMPLETED, result_url=...

Trade-offs

✅ Synchronous request-response

When it wins: task completes in under 1-2 seconds. Simpler code, instant feedback, no queue infrastructure.

When it loses: task is longer than the HTTP timeout (typically 30-60s). Browser disconnects; user retries; you do the work twice.

✅ Asynchronous job pattern

When it wins: task is long, variable, or expensive. Workers scale independently of API tier. Spike absorption — 10K uploads in a minute land in the queue and drain over an hour.

When it loses: for short tasks, the polling overhead and status-tracking complexity isn't worth it.

The hidden requirement — idempotency

Workers crash. Queues retry. Your job will get processed twice sometimes. Every long-running task must be idempotent — running it twice produces the same result as running it once. Store the job_id on the output, check before doing work, and you're safe. Skipping this step is the most common production bug in this pattern.
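
A minimal sketch of that check, assuming a relational jobs table with a unique job_id and a status column (PENDING → RUNNING → COMPLETED); the schema, the conn parameter (any DB-API connection), and run_expensive_work are placeholders, not a prescribed design.

```python
def run_expensive_work(job_id: str) -> str:
    ...  # placeholder for the actual 3-minute task; returns a result URL

def process_job(conn, job_id: str) -> None:
    with conn.cursor() as cur:
        # Claim the job only if it is still PENDING. A duplicate delivery
        # (queue retry, second worker) matches zero rows and becomes a no-op.
        cur.execute(
            "UPDATE jobs SET status = 'RUNNING' WHERE job_id = %s AND status = 'PENDING'",
            (job_id,),
        )
        claimed = cur.rowcount == 1
    conn.commit()
    if not claimed:
        return                 # someone already did (or is doing) the work

    result_url = run_expensive_work(job_id)

    with conn.cursor() as cur:
        cur.execute(
            "UPDATE jobs SET status = 'COMPLETED', result_url = %s WHERE job_id = %s",
            (result_url, job_id),
        )
    conn.commit()
    # In production you would also requeue jobs stuck in RUNNING past a timeout.
```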

Status-delivery — poll or push?

Two ways for the client to learn the job is done. Polling the status endpoint is simple and works everywhere — fine if the job takes seconds to a few minutes. Push notification via Pattern 1 (WebSocket / SSE / webhook) is better for very long jobs (no wasted polls) or when the user has navigated away. Big systems offer both.

The interview move: when an interviewer hints at a long-running operation ("user generates a report", "user uploads a video", "user runs a backtest"), immediately propose the async-job pattern and draw the queue. They are watching to see if you reach for it without prompting.

Live examples on this site

📺 YouTube transcoding

Upload returns 202 with video_id. A pipeline of workers runs ffmpeg for each resolution. User sees a progress bar driven by status polling. See the full design →

🔔 Notification Service

Sending a campaign to 10M users — API enqueues, workers fan out and send via SES/Twilio. See the full design →

⏰ Job Scheduler

The canonical case — schedule and run jobs by their kind, retry on failure, observe progress. See the full design →

Pattern 3

Dealing with Contention

Picture Taylor Swift's Eras Tour going on sale. At 10:00:00 sharp, two million fans hit "Buy" at the same instant. There are 50,000 seats. Every seat must end up assigned to exactly one fan — not zero, not two. If two fans both walk away thinking they bought seat A-12, that's a Ticketmaster lawsuit. This is the contention problem in its purest form.

When to use it

Ticket booking, flash sales, auctions, inventory decrement, distributed counters that must not double-count, "claim a username", "redeem a coupon", any time multiple actors race for a finite resource.

The three canonical solutions

A. Pessimistic locking

SELECT seat_42 FOR UPDATE — the DB takes a row lock; no other transaction can lock or update that row until you commit (plain snapshot reads still see the old value). Whoever gets there first wins; everyone else waits in line.

Pros: simple, correct, no retry logic.

Cons: slow under heavy contention — every loser waits. Lock-holder crash = stuck row until timeout. Can deadlock.

B. Optimistic concurrency (CAS)

No lock taken upfront. Read row + version. Update with WHERE version = old_version. If 0 rows updated → someone else won → retry from scratch.

Pros: fast happy path, no lock contention, no deadlocks.

Cons: retry storms under heavy contention. If 1000 users race for one seat, 999 retry, then 998, then 997… O(N²).

C. Distributed locks / queues

Redis Redlock or ZooKeeper for cross-DB locks. Or serialize all writes for a resource through one queue partition — only one consumer at a time → no contention by construction.

Pros: works across services and DBs. Queue serialization is bulletproof.

Cons: queue caps throughput at one-writer speed. Distributed locks have correctness gotchas (clock skew, fencing tokens).

Decision tree

| Scenario | Pick | Why |
|----------|------|-----|
| Single relational DB, low-medium contention | Pessimistic SELECT FOR UPDATE | Simplest correct answer; DB does the heavy lifting |
| Single DB, high contention but rare collisions | Optimistic CAS | Happy path is lock-free; retries handle the rare conflict |
| Multi-DB / multi-service resource | Distributed lock (Redis Redlock) | Only way to coordinate across systems |
| Strict order matters (FIFO booking) | Queue with single consumer per partition | Order is guaranteed by the queue itself |
| Counter that must not double-count | Atomic increment (INCR in Redis) | Single op, race-free by definition |

The reservation pattern — most ticketing systems

Ticket booking specifically uses a two-stage variant: reserve the seat for 10 minutes (locking it from others), then confirm when payment succeeds, or release on payment failure or timeout. The reservation is itself a CAS operation; the timeout protects against abandoned carts holding seats hostage.
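
As a concrete illustration, here is a minimal sketch of the reserve step as a single CAS UPDATE, assuming a seats table with status, held_by, and hold_expires_at columns; the schema, the conn parameter, and the 10-minute hold are illustrative.

```python
from datetime import datetime, timedelta, timezone

HOLD_MINUTES = 10

def reserve_seat(conn, seat_id: str, user_id: str) -> bool:
    """Returns True if this user now holds the seat for the next 10 minutes."""
    expires = datetime.now(timezone.utc) + timedelta(minutes=HOLD_MINUTES)
    with conn.cursor() as cur:
        # Compare-and-set: the UPDATE only matches if the seat is still available
        # (or a previous hold has already expired). Exactly one racer can win.
        cur.execute(
            """
            UPDATE seats
               SET status = 'HELD', held_by = %s, hold_expires_at = %s
             WHERE seat_id = %s
               AND (status = 'AVAILABLE'
                    OR (status = 'HELD' AND hold_expires_at < now()))
            """,
            (user_id, expires, seat_id),
        )
        won = cur.rowcount == 1
    conn.commit()
    return won
```

The confirm step flips status to BOOKED with the same WHERE-guarded shape, and a periodic sweep (or the expiry check above) releases abandoned holds.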

The classic interview trap: "How do you ensure two users don't book the same seat?" Don't say "transactions" and stop there — that's the answer for one DB, but the interviewer wants to hear "and what if the seat lookup is in cache and the booking is in DB? what if payment is a third service?". The right answer mentions the reservation pattern, the timeout, and the saga (Pattern 7) for cross-service rollback.

Live examples on this site

🎟️ Ticketmaster

Reservation pattern with 10-minute holds, optimistic concurrency on the seat row, distributed lock for cross-region inventory. See the full design →

💳 Payment System

Idempotency keys + optimistic CAS on the wallet balance. Saga rollback on downstream failure. See the full design →

Pattern 4

Scaling Reads

Picture the Twitter timeline being loaded by 100 million daily users. Each user opens the app five times a day, and every open hits the timeline endpoint and pulls a page of tweets. That's 500 million timeline loads a day — peaking at maybe 30,000 requests per second at lunchtime in the US — and each load reads dozens of tweets. Now picture the writes — at most a few thousand new tweets per second. The ratio is brutal: about 100 reads for every write, often more. This shape is so common it's the default assumption — most systems are read-heavy.

When to use it

Pretty much always. If your read-to-write ratio is above 10:1, you need this pattern. Social feeds, e-commerce product pages, news, search results, anything user-facing. The only systems that don't need it are pure write-heavy ones like log ingestion or telemetry.

The five tools, in order of cost and impact

1 Indexes

The cheapest first move. A missing index turns a 1ms lookup into a 1-second table scan. Add an index on every column you filter or sort by — primary key, foreign keys, common WHERE clauses. Cost: writes get ~10% slower per index; storage grows. Impact: the difference between a working DB and a melted one.

2 Denormalization

Stop joining at read time — store the joined view directly. Instead of orders JOIN users JOIN products on every page load, pre-compute an order_summary row with all three flattened. Cost: writes get more expensive (must update multiple places); risk of drift. Impact: turns 5-table joins into single-row reads.

3 Read replicas

Keep one primary for writes; add 3-5 read-only replicas that replicate asynchronously. Route reads to replicas, writes to the primary. Cost: replication lag (replicas are seconds behind); read-after-write inconsistency must be handled. Impact: linear read scaling — 5 replicas = 5× read capacity.

4 Cache (Redis / Memcached)

An in-memory layer in front of the DB. The hot 20% of data lives in RAM and serves 80% of reads. Cost: invalidation is genuinely hard ("there are only two hard things…"); stale reads possible. Impact: 100× latency improvement on cache hits, frees the DB for the cold tail. (A minimal cache-aside sketch follows this list.)

5 CDN

Cache at the edge — POPs in 200+ cities. User in Tokyo gets a response from a Tokyo node, not your US data center. Best for static or semi-static content (images, videos, public profile pages). Cost: invalidation lag (purges take seconds-minutes); per-byte egress cost. Impact: kills round-trip latency globally.

+ Honorable mention — materialized views

Pre-compute expensive aggregations and store them as a table. "Top 100 trending products" is computed every 5 minutes, not on every request. Halfway between denormalization and a cache. Use: dashboards, leaderboards, anything aggregated.
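
To ground tool 4, here is a minimal cache-aside read path with a TTL, assuming redis-py and a placeholder fetch_from_db() loader; the key naming and the 60-second TTL are illustrative choices, not recommendations.

```python
import json
import redis

r = redis.Redis()
CACHE_TTL_SECONDS = 60

def fetch_from_db(product_id: str) -> dict:
    ...  # placeholder for the real DB query (primary or replica)

def get_product(product_id: str) -> dict:
    key = f"product:{product_id}"
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)           # hot path: served from RAM
    product = fetch_from_db(product_id)     # cold path: hit the database
    r.setex(key, CACHE_TTL_SECONDS, json.dumps(product))
    return product

def invalidate_product(product_id: str) -> None:
    # Explicit invalidation on write: delete the key, let the next read repopulate it.
    r.delete(f"product:{product_id}")
```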

The hard parts

🔥 Cache invalidation

Phil Karlton's joke: "There are only two hard things in CS — naming things, and cache invalidation." When the source of truth changes, the cache holds stale data. Strategies: TTL (eventual freshness, simple), write-through (consistent, slow writes), write-behind (fast, eventual), explicit invalidation (precise, requires plumbing).

⏱️ Replica lag

You wrote, then immediately read — but you read from a replica that hasn't caught up. The user updates their profile and refreshes — old name shows. Mitigations: read-your-writes flag (route the writer to primary for a few seconds), session stickiness, accept eventual consistency for non-user-facing reads.

🔥 Hot keys

One celebrity's profile gets 100K reads/sec. Even cached, one Redis node can't serve that. Solution: replicate hot keys across multiple cache nodes; or shard by request_id mod N for read distribution; or push hot content to CDN.
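
One common mitigation for the replica-lag item above is a short-lived "recent writer" flag: pin a user who just wrote to the primary for a few seconds. A minimal sketch, assuming redis-py; the 5-second window, key names, and round-robin replica pick are illustrative.

```python
import time
import redis

r = redis.Redis()
READ_YOUR_WRITES_WINDOW = 5   # seconds to pin a recent writer to the primary

def record_write(user_id: str) -> None:
    # Call this on every successful write by the user.
    r.setex(f"recent_write:{user_id}", READ_YOUR_WRITES_WINDOW, "1")

def choose_db(user_id: str, primary, replicas):
    # A user who just wrote reads from the primary; everyone else uses a replica.
    if r.exists(f"recent_write:{user_id}"):
        return primary
    return replicas[int(time.time()) % len(replicas)]   # crude round-robin
```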

Live examples on this site

🔗 URL Shortener

170GB Memcached fronts the URL DB; CDN caches viral redirects at the edge. 80% of reads never touch the DB. See the full design →

🐦 Twitter

Pre-computed timeline cache per user (denormalization at write time). Hot tweets served from in-memory cache. See the full design →

📷 Instagram

Multi-tier cache: PostgreSQL → Memcached → CDN for photos. Profile pages denormalized for one-row reads. See the full design →

Pattern 5

Scaling Writes

Picture WhatsApp. A billion users send messages constantly — at peak load, the system absorbs millions of writes per second. A single Postgres instance maxes out around 10,000 write transactions per second. Even the fattest box on AWS tops out around 50,000. So when your write throughput exceeds what one machine can absorb, you cannot fix it with more cache or more replicas — those scale reads, not writes. You have to spread the writes themselves.

When to use it

Message ingest, telemetry collection, write-heavy timelines (Twitter), high-volume order systems, IoT data, location updates, anything where new rows arrive faster than one machine can write them.

The four levers

🔪 Horizontal sharding (the main lever)

Split the data across N machines. Each row goes to exactly one shard, chosen by hashing a partition key. shard = hash(user_id) mod N. With 16 shards, write throughput is roughly 16× single-box. Sharding is the thing that turns "10K writes/sec" into "millions of writes/sec".

↕️ Vertical partitioning

Split by data type, not by row. Put user_profile on one DB, user_posts on another, user_messages on a third. Each can scale independently. Common as a stepping stone before horizontal sharding.

📥 Write queues

Put a queue — Kafka or Kinesis — between your API and your DB. Spike of 100K writes/sec? They land in the queue and drain to the DB at a sustainable rate. Acts as a shock absorber. Pairs with batching: dequeue 1000 writes, do one bulk INSERT.

📦 Batch writes

One transaction inserting 1000 rows is dramatically cheaper than 1000 transactions inserting one row each. Batch on the client, the queue, or the API. Also write-coalescing: if the same user posts 5 location updates in a second, write only the latest.

Choosing the partition key — the most important decision

Picking the wrong partition key cripples the system in ways that adding more shards cannot fix. Three rules:

1. Distribute uniformly

If 90% of writes have the same key, 90% of writes hit one shard. Shard by something with high cardinality and uniform distribution: user_id for user-scoped data, hash of entity_id for events.

2. Match the query pattern

You'll mostly query by partition key. If you shard messages by user_id but your most common query is "give me the latest 100 messages globally", you scan every shard. Pick the key your reads look like.

3. Avoid hot keys

Sharding Twitter by user_id means @elonmusk's shard handles 10× the load of others. Mitigations: sub-shard celebrity rows, write to a queue and fan out, or shard by (user_id, time_bucket).

The IDs trick — Snowflake

Twitter's Snowflake ID packs (timestamp + worker_id + sequence) into a 64-bit ID. This gives you three things at once: globally unique IDs generated independently on every server (no central counter); time-sortability (sort by ID = sort by creation time); and a natural sharding key (shard by ID, and because IDs are time-ordered you can still range-scan recent items within each shard). Most modern write-heavy systems use a Snowflake-style ID scheme.
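
A minimal Snowflake-style generator, to show how the three fields pack into 64 bits; the custom epoch and the class name are illustrative, and the 41/10/12 bit split follows the commonly cited layout rather than any specific implementation.

```python
import threading
import time

EPOCH_MS = 1700000000000   # custom epoch (assumption; any fixed past date works)

class SnowflakeGenerator:
    """64-bit IDs: 41 bits of ms-timestamp | 10 bits worker_id | 12 bits sequence."""

    def __init__(self, worker_id: int):
        assert 0 <= worker_id < 1024
        self.worker_id = worker_id
        self.sequence = 0
        self.last_ms = -1
        self.lock = threading.Lock()

    def next_id(self) -> int:
        with self.lock:
            now = int(time.time() * 1000)
            if now == self.last_ms:
                self.sequence = (self.sequence + 1) & 0xFFF   # 4096 ids per ms per worker
                if self.sequence == 0:
                    while now <= self.last_ms:                # exhausted: wait for next ms
                        now = int(time.time() * 1000)
            else:
                self.sequence = 0
            self.last_ms = now
            return ((now - EPOCH_MS) << 22) | (self.worker_id << 12) | self.sequence

# gen = SnowflakeGenerator(worker_id=7); tweet_id = gen.next_id()
```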

The trade-offs you must accept

⚠️ Cross-shard joins are gone

You cannot JOIN users (shard A) ON orders (shard B). Either denormalize (Pattern 4), do the join in the application layer, or pick a partition key that co-locates the data.

⚠️ Rebalancing is painful

With plain hash mod N, adding a shard changes N to N+1 and nearly every key maps to a different shard. Always use consistent hashing instead — adding a shard moves only about 1/(N+1) of the keys.
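
A minimal consistent-hash ring sketch to make that claim concrete; the virtual-node count and MD5 hash are illustrative choices.

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Map keys to shards so that adding a shard moves only ~1/(N+1) of keys."""

    def __init__(self, shards, vnodes: int = 100):
        self.ring = []           # sorted list of (hash, shard)
        self.vnodes = vnodes     # virtual nodes per shard smooth out the distribution
        for shard in shards:
            self.add_shard(shard)

    def _hash(self, key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add_shard(self, shard: str) -> None:
        for i in range(self.vnodes):
            self.ring.append((self._hash(f"{shard}#{i}"), shard))
        self.ring.sort()

    def get_shard(self, key: str) -> str:
        # Walk clockwise from the key's hash to the first virtual node.
        h = self._hash(key)
        idx = bisect.bisect_right(self.ring, (h, "")) % len(self.ring)
        return self.ring[idx][1]

# ring = ConsistentHashRing(["shard-0", "shard-1", "shard-2"])
# ring.get_shard("user:12345")   # stable answer until the shard set changes
```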

Live examples on this site

🐦 Twitter

Tweets sharded by Snowflake tweet_id; user-scoped data sharded by user_id. Celebrities get special-cased. See the full design →

🕷️ Web Crawler

URL frontier sharded by hostname so politeness rules — "max 1 RPS per domain" — stay co-located on one worker. See the full design →

💬 Messenger

Message store sharded by conversation_id; per-user inbox indexed separately. See the full design →

Pattern 6

Handling Large Blobs

Picture this. Mark uploads a 4GB raw video to YouTube. The naive design — POST it to your API server, which then writes it to storage — has a fatal flaw. Every byte of those 4GB has to flow through your app server. If 100 users upload at the same time, your app servers melt. Worse, every byte crosses the network twice — client → app, then app → S3 — instead of once, client → S3 directly. Your app servers, which should be doing CPU-light routing work, become file-transfer middleware.

When to use it

Photo uploads, video uploads, file sharing, document storage, anywhere users upload payloads bigger than a few MB. Anything Dropbox, YouTube, or Instagram-shaped.

The pattern — presigned URLs

The trick is to take the bytes out of the request path entirely. The client uploads directly to S3 (or GCS / Azure Blob), bypassing your app server. To make this safe — you don't want anonymous internet uploaders writing to your bucket — the app server issues a presigned URL: a temporary, signed URL that grants the client permission to PUT one object to one specific path for the next N minutes.

sequenceDiagram
  participant C as Client
  participant API as API Server
  participant S3 as Object Store — S3
  participant DB as Metadata DB
  C->>API: POST /uploads — request upload slot
  API->>DB: INSERT placeholder file_id status=PENDING
  API->>S3: generate presigned PUT URL valid for 15 min
  API-->>C: file_id + presigned_url
  C->>S3: PUT bytes — directly, app server uninvolved
  S3-->>C: 200 OK
  S3->>API: webhook — object created at path
  API->>DB: UPDATE file_id status=READY
  Note over C,DB: Subsequent reads — client GETs metadata, then signed download URL
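
Here is a minimal sketch of the first step (issuing the presigned PUT), assuming boto3; the bucket name, key layout, and 15-minute expiry are illustrative, and the metadata insert is shown only as a comment.

```python
import uuid
import boto3

s3 = boto3.client("s3")
BUCKET = "user-uploads"   # illustrative bucket name

def create_upload_slot(owner_id: str, filename: str) -> dict:
    file_id = str(uuid.uuid4())
    key = f"{owner_id}/{file_id}/{filename}"
    # insert_metadata_row(file_id, owner_id, filename, key, status="PENDING")  # your DB
    url = s3.generate_presigned_url(
        "put_object",
        Params={"Bucket": BUCKET, "Key": key},
        ExpiresIn=15 * 60,   # the URL stops working after 15 minutes
    )
    return {"file_id": file_id, "upload_url": url}

# The client then PUTs the bytes straight to `upload_url`; the app server never sees them.
```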

The two-store architecture — blobs + metadata

Object stores are great at storing bytes. They are terrible as your primary database — no efficient queries, no indexes, no relations. The pattern is always: object store for bytes + traditional DB (Postgres / DynamoDB) for metadata. The metadata row holds {file_id, owner, name, size, content_type, s3_path, status, uploaded_at}. Queries like "show me Sarah's photos from last week" run against the metadata DB; the actual bytes only get fetched when the user clicks a thumbnail.

The webhook problem — keeping metadata in sync

The client uploads to S3, and your DB doesn't automatically know it succeeded. Two options. S3 event notifications — S3 sends a webhook (via SNS / EventBridge / Lambda) to your API when the object lands; your API flips the status to READY. Client-callback pattern — after the PUT succeeds, the client itself calls POST /uploads/{file_id}/complete. The webhook approach is more reliable (the client can crash; S3 won't). The callback approach is simpler.

Trade-offs

✅ Wins

  • App servers stay tiny — no bytes flow through them
  • S3 multi-part upload handles resumable, parallel chunking
  • Bandwidth cost halved — bytes go client → S3 once, not client → app → S3
  • S3 handles the durability story (11 9s) for you

⚠️ Costs

  • CORS configuration on the bucket (browsers won't PUT cross-origin without it)
  • Eventual consistency between blob and metadata (there's a window where bytes exist but DB says PENDING)
  • Cleanup of abandoned uploads (user requested URL but never PUT) — TTL'd lifecycle policy on the bucket
Critical anti-pattern: never store the binary blob inside your relational DB as a BLOB column. It bloats the table, kills backup performance, and makes every read slow. Always: bytes in object store, pointer in DB.

Live examples on this site

📦 Dropbox

Block store (S3) + metadata DB (sharded Postgres). Files chunked client-side. See →

📷 Instagram

Photos in S3, metadata in Cassandra, CDN for delivery. Original + multiple variants. See →

📺 YouTube

Raw video presigned upload to S3, metadata in Vitess MySQL. Transcoding pipeline reads from S3. See →

📋 Pastebin

Paste body in S3, metadata in Postgres. Even small text uses object store for cost. See →

Pattern 7

Multi-Step Processes & Sagas

Picture an Amazon order. Click "Buy" and four things must happen — reserve inventory, charge the credit card, create the order, ship it. Steps 1-3 are fast services owned by your team; step 4 is a third-party shipping API. Now imagine step 3 succeeds, step 4 fails. The credit card is charged, inventory is reserved, an order exists in the DB — but no shipment will ever happen. You cannot just throw an error to the user. You have to undo everything: refund the card, release the inventory, mark the order cancelled. That undo is what this pattern is about.

When to use it

Order fulfillment, payment workflows, multi-service transactions, any business process spanning more than one service or DB where partial failure is unacceptable.

Why ACID transactions don't work here

A database transaction (BEGIN ... COMMIT) gives you all-or-nothing across multiple rows in one database. But our four steps live in four different services with four different databases — payment service, inventory service, order service, shipping API. There is no global ACID. The classic solution, distributed two-phase commit (2PC), has been quietly abandoned by most modern systems because it's slow, fragile, and doesn't work across third-party APIs. The alternative is the saga pattern.

The saga idea — every step has a compensating action

For each forward step, define an undo. If step 4 fails, the system runs the undo for step 3, then 2, then 1 — in reverse — until everything that was done is undone. The system is eventually consistent: it's never in a half-done state for long.

sequenceDiagram
  participant O as Orchestrator
  participant Inv as Inventory
  participant Pay as Payment
  participant Ord as Order
  participant Ship as Shipping
  O->>Inv: reserve item — step 1
  Inv-->>O: ok
  O->>Pay: charge card — step 2
  Pay-->>O: ok
  O->>Ord: create order — step 3
  Ord-->>O: ok
  O->>Ship: schedule shipment — step 4
  Ship-->>O: FAIL
  Note over O,Ship: Forward path failed — run compensations in reverse
  O->>Ord: cancel order — undo step 3
  O->>Pay: refund — undo step 2
  O->>Inv: release item — undo step 1
  Note over O: Eventually consistent again
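
A minimal orchestration sketch of the same flow: each forward step is paired with its compensation, and a failure unwinds whatever already completed, newest first. The service-call functions are placeholders, not a real client API.

```python
def reserve_inventory(order): ...   # placeholder service calls
def release_inventory(order): ...
def charge_card(order): ...
def refund_card(order): ...
def create_order(order): ...
def cancel_order(order): ...
def schedule_shipment(order): ...

def run_saga(order) -> None:
    steps = [
        (reserve_inventory, release_inventory),
        (charge_card,       refund_card),
        (create_order,      cancel_order),
        (schedule_shipment, None),           # last step: nothing after it to undo
    ]
    completed = []
    for action, compensation in steps:
        try:
            action(order)
            completed.append(compensation)
        except Exception:
            # Unwind everything that already succeeded, in reverse order.
            for undo in reversed(completed):
                if undo is not None:
                    undo(order)   # compensations must themselves be idempotent
            raise
```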

Two flavors — orchestration vs. choreography

A. Orchestration — central engine

One service (the orchestrator) drives the saga. It calls each step in sequence, tracks state, and triggers compensations on failure. Tools: Temporal, AWS Step Functions, Cadence, Netflix Conductor.

Pros: centralized state machine, easy to observe, easy to reason about, easy to debug. The whole flow is one place in the code.

Cons: orchestrator is a coordination hub — if it goes down, all sagas pause. Tighter coupling between services and the orchestrator.

B. Choreography — event-driven

No central engine. Each service publishes events; downstream services subscribe and react. OrderCreated → payment listens → PaymentSucceeded → shipping listens → and so on. Failures publish failure events that compensating subscribers handle.

Pros: loosely coupled, no single point of coordination, services don't know about each other.

Cons: hard to debug — the flow is implicit in event subscriptions, not visible in any one file. Watching a saga end-to-end requires distributed tracing.

When to pick which

Orchestration wins when the flow is critical, the steps are owned by your team, and observability matters (payment, ticket booking, account opening). Choreography wins when services are owned by different teams, the flow is loose, and you want each team to evolve independently (a marketing event triggers six downstream effects across the company).

The hard parts of compensations

🪙 Some actions can't be undone

You sent an email. You can't un-send. The undo here is a "sorry, ignore that email" follow-up — semantic, not literal. Design with this in mind: place irreversible steps at the end of the saga so they never need compensating.

🔁 Compensations must be idempotent

If the orchestrator crashes during compensation and retries, you'd refund the card twice. Every undo must be safely re-runnable: keep an idempotency key, no-op if already compensated.

👀 Observability is non-negotiable

A stuck saga is a money-losing bug. You need a dashboard showing every in-flight saga, its current step, and how long it's been there. This is why orchestrators have ops UIs out of the box.

Live examples on this site

💳 Payment System

Multi-step charge flow with refund as compensation, idempotency keys, ledger entries for audit. See the full design →

🎟️ Ticketmaster

Reserve → Pay → Confirm saga. If pay fails, release the reservation. Hold timeout as built-in compensation. See the full design →

Pattern 8

Proximity-Based Services

Picture Maya opening Uber in Bangalore. She taps "Where to?" and within a second her screen lights up with eight driver dots within 2 km of her — out of the 50,000 drivers active in the city. The naive answer is "scan all 50,000 drivers, compute distance to Maya, return the eight closest". That works at 50,000. It does not work when Uber has 5 million drivers active globally and 10,000 Mayas tapping every second. We need a way to answer "what's near here?" in O(log N) instead of O(N).

When to use it

Ride-sharing (find nearby drivers), food delivery (nearby restaurants), dating apps (nearby singles), Yelp (nearby businesses), AirTags (nearby devices), Pokemon Go (nearby spawns). Anywhere the query is "find entities within X meters of (lat, lon)".

The four canonical answers

A. Geohash

Encode (lat, lon) as a string where shared prefix = nearby. tdr1y and tdr1z are neighbors; tdr1y and 9q8yyk are far. Query is "find all rows whose geohash starts with my prefix". Backed by Redis GEO commands or a Postgres index.

Pros: simple, supported by Redis natively, easy to update (delete + re-insert).

Cons: uneven cell sizes near poles; "edge effect" — two points just across a cell boundary look distant by prefix. Needs neighbor-cell padding.

B. QuadTree

Recursively split space into four quadrants; subdivide any cell that holds too many entities. Sparse areas have one big leaf; dense areas (downtown) have many small leaves. Each leaf has roughly the same number of entities — uniform query cost.

Pros: uniform leaf density makes searches fast everywhere. Great for sparse-then-dense data.

Cons: updates are complex — a moving driver may cross cell boundaries, triggering tree rebalancing. Best for slowly-changing data.

C. R-Tree

Hierarchy of bounding rectangles. Query: "what rectangles overlap this query box?" then recurse. Best for shapes with extent (not just points) — restaurant delivery zones, geofences.

Pros: handles ranges and shapes natively. PostGIS and Elasticsearch use it under the hood.

Cons: heavier than geohash for plain "nearest point" queries.

D. In-memory hash + grid (Uber-style)

For frequently-moving entities — divide the world into fixed cells (e.g., S2 cells of ~100m); maintain {cell_id → set_of_drivers} in Redis. On driver location update: remove from old cell, add to new. On rider query: look up the rider's cell + 8 neighbors.

Pros: updates are O(1). Reads are O(neighbor cell count). Scales to millions of moving entities.

Cons: not durable on its own (pair with a backing store). Cell sizing matters — too small wastes lookups, too big returns too many candidates.
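
A minimal sketch of option D, assuming redis-py and fixed cells of about 0.01 degrees (roughly 1 km at the equator); the cell sizing and key names are illustrative.

```python
import redis

r = redis.Redis()
CELL_DEG = 0.01   # grid resolution in degrees (~1 km at the equator)

def cell_id(lat: float, lon: float) -> str:
    return f"{int(lat // CELL_DEG)}:{int(lon // CELL_DEG)}"

def update_driver(driver_id: str, old_cell: str | None, lat: float, lon: float) -> str:
    """O(1) location update: move the driver between cell sets."""
    new_cell = cell_id(lat, lon)
    if old_cell and old_cell != new_cell:
        r.srem(f"cell:{old_cell}", driver_id)   # left the old cell
    r.sadd(f"cell:{new_cell}", driver_id)
    return new_cell                              # caller remembers it for the next update

def nearby_drivers(lat: float, lon: float) -> set:
    """Read the rider's cell plus its 8 neighbours."""
    row, col = int(lat // CELL_DEG), int(lon // CELL_DEG)
    drivers = set()
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            drivers |= {d.decode() for d in r.smembers(f"cell:{row + dr}:{col + dc}")}
    return drivers
```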

QuadTree visualized — why it's so good for spatial search

flowchart TB
  R["Root cell — whole world"]
  NW["NW quadrant"]
  NE["NE quadrant"]
  SW["SW quadrant"]
  SE["SE quadrant — dense, splits further"]
  R --> NW
  R --> NE
  R --> SW
  R --> SE
  SE --> SE1["SE-NW"]
  SE --> SE2["SE-NE — downtown, splits again"]
  SE --> SE3["SE-SW"]
  SE --> SE4["SE-SE"]
  SE2 --> SE2a["leaf — 8 entities"]
  SE2 --> SE2b["leaf — 7 entities"]
  SE2 --> SE2c["leaf — 9 entities"]
  SE2 --> SE2d["leaf — 6 entities"]
  style R fill:#171d27,stroke:#e8743b,color:#d4dae5
  style SE fill:#171d27,stroke:#e05252,color:#d4dae5
  style SE2 fill:#171d27,stroke:#e05252,color:#d4dae5
  style NW fill:#171d27,stroke:#38b265,color:#d4dae5
  style NE fill:#171d27,stroke:#38b265,color:#d4dae5
  style SW fill:#171d27,stroke:#38b265,color:#d4dae5

The downtown SE quadrant has thousands of entities, so the tree splits it further. Sparse rural NW stays as one big cell. Every leaf holds roughly 8-10 entities — so a "find nearby" query always returns ~30 candidates regardless of where on Earth the user is. That's the magic.

Decision tree

| Scenario | Pick |
|----------|------|
| Under 100K entities, slow updates | Postgres + bounding-box index (PostGIS) |
| 100K – 100M entities, slow updates (restaurants, businesses) | QuadTree |
| Frequent updates (drivers, scooters, deliveries) | Geohash or grid + in-memory hash table |
| Range queries / shapes (geofences, delivery zones) | R-Tree (PostGIS, Elasticsearch geo_shape) |
| Pre-built solution preferred | Redis GEO commands — geohash-based, batteries included |

The Uber lesson: Uber's drivers move continuously — a QuadTree would re-balance constantly. So Uber uses a hash-grid approach (DriverLocationHT) with fixed-size cells in memory; updates are O(1). Yelp's businesses don't move — Yelp uses a QuadTree because the structure is built once and queried millions of times. The right answer depends entirely on your update frequency.

Live examples on this site

📍 Yelp

QuadTree for ~200M businesses globally. Built offline, queried at scale. See the full design →

🚗 Uber

Geohash + in-memory DriverLocationHT for the 5M moving drivers. Driver pings every 4s. See the full design →

Pattern Combinator

How Real Systems Stack Patterns

Real systems never use just one pattern. The interesting design work is in composing them. Here's how the systems on this site combine the eight — notice how the same patterns recur, but the combinations are unique to each problem domain.

| System | Pattern Stack | Where each fits |
|--------|---------------|-----------------|
| Twitter | P1 + P4 + P5 + P7 | P1 — push new tweets to active timelines. P4 — pre-computed timeline cache. P5 — sharded by Snowflake tweet_id. P7 — fan-out write as a multi-step async process. |
| Uber | P1 + P5 + P8 | P1 — driver location pushed to rider. P5 — DriverLocationHT sharded by region. P8 — geohash + in-memory grid for nearby search. |
| Dropbox | P4 + P6 + P7 | P4 — metadata cache to avoid hitting DB on every folder open. P6 — presigned upload to block store. P7 — sync as a multi-step process across devices. |
| Ticketmaster | P3 + P7 | P3 — reservation pattern with optimistic CAS on the seat row. P7 — Reserve → Pay → Confirm saga with timeout-driven compensation. |
| YouTube | P2 + P4 + P6 | P2 — async transcoding pipeline. P4 — CDN + Memcached for hot videos. P6 — presigned upload of raw video to S3. |
| WhatsApp / Messenger | P1 + P5 | P1 — WebSockets for real-time message delivery. P5 — message store sharded by conversation_id. |
| Instagram | P4 + P5 + P6 | P4 — CDN for photo delivery, Memcached for feeds. P5 — sharded user/post stores. P6 — direct uploads to S3. |
| Yelp | P4 + P8 | P4 — heavy read caching for popular businesses. P8 — QuadTree for "near me" queries. |

The interview move: when designing a new system, list the patterns that apply before you start drawing boxes. "This is read-heavy (P4), has uploads (P6), and needs real-time updates (P1)" gives you three pre-built mental models. Then your architecture is just composing those three with the specifics of the domain.
Anti-Patterns

What NOT to Do — Common Pattern Misuse

Patterns are tools. Like all tools, they hurt when misapplied. Here are the six reach-for-the-pattern reflexes that actually mark you as junior in an interview, not senior.

❌ "We'll just use Kafka"

Kafka is great when you have justified throughput, durability, and replay needs. Saying "Kafka" without specifying throughput, retention, partition count, or consumer-group strategy is a red flag. The interviewer will press: "Why Kafka over SQS? Why not just an in-memory queue? What's the retention?" If you can't answer, you reached for it as a buzzword. Fix: say "I'd use a queue here — Kafka if I need replay and high throughput, SQS if it's just job dispatch", and pick based on the requirement.

❌ Presigned URLs for 1KB JSON

Pattern 6 exists to keep big bytes off your app servers. Using it for tiny payloads adds latency (extra round trip to get the URL) and complexity (signed URL plumbing, S3 cost) for no gain. Fix: presigned URLs only for payloads above ~1MB. Below that, just POST directly.

❌ Sharding before proving the problem

"And now we shard by user_id" — said before establishing a single DB can't handle the load. Sharding is expensive in operational complexity, cross-shard joins, and rebalancing pain. A modern Postgres can do 50,000 writes/sec on one box. Fix: always start with "single box → read replicas → cache → vertical partition → horizontal shard", and shard only when the previous steps demonstrably aren't enough.

❌ Caching without invalidation strategy

"We'll add Redis in front of the DB" with no plan for staleness, eviction, or write coherency. The cache will silently serve old data, users will report bugs, and you'll spend a week debugging why prod and DB disagree. Fix: always specify TTL, eviction policy (LRU?), and invalidation trigger (write-through? explicit purge on update? event-driven?).

❌ Distributed lock for single-DB scenarios

Reaching for Redis Redlock when SELECT FOR UPDATE on one Postgres would have done it. Distributed locks have correctness gotchas (clock skew, fencing tokens, lock-holder GC pauses), and you don't need them if everything you're protecting lives in one DB. Fix: use the simplest lock that spans all the data you're protecting — a DB lock if it's one DB, a distributed lock only if you span DBs or services.

❌ Async everything

Pattern 2 exists for long jobs. Using async + queues for a 50ms operation just adds latency, complexity, and failure modes. Users now have to poll for results that could have been returned in the original request. Fix: sync by default; async only when the work exceeds ~1-2 seconds.

The unifying lesson: every pattern has a problem it solves. Naming the problem before the pattern is what separates senior from junior. "I have 10K writes/sec which exceeds single-box capacity, so I'll shard" is senior. "I'll shard" is junior. Make the cause-and-effect chain visible.
Cheat Sheet

Pattern Selection — Symptom-to-Pattern Lookup

The fastest reference. Match the symptom in column 1 to the pattern in column 2. Use this in the first 60 seconds of any interview when you're sketching the requirements — the patterns to apply should pop out immediately.

| Symptom / Requirement | Pattern | First-pass architecture |
|-----------------------|---------|-------------------------|
| Need to push updates from server to client | P1 — Real-time | WebSocket if bidirectional, SSE if one-way, polling if rare |
| Operation takes longer than a few seconds | P2 — Long-Running | API returns 202 + job_id, queue + worker pool |
| Concurrent edits to the same row / resource | P3 — Contention | Optimistic CAS for low contention, pessimistic for high, queue for FIFO |
| Read-to-write ratio above 10:1 | P4 — Scaling Reads | Indexes → replicas → cache → CDN, in that order |
| DB CPU pegged at 100% on writes | P5 — Scaling Writes | Vertical partition first, then shard horizontally with consistent hashing |
| Users upload files larger than 1 MB | P6 — Large Blobs | Presigned URL to object store, metadata in DB, webhook to flip status |
| Workflow spans multiple services with rollback | P7 — Sagas | Orchestrator (Temporal / Step Functions) with compensating actions per step |
| "Find nearby X" query at scale | P8 — Proximity | Geohash for moving entities, QuadTree for static, Redis GEO out of the box |
| Read-after-write consistency required | P3 + P4 | Route writer to primary for N seconds; sticky session |
| Hot-key problem (one row gets 80% of traffic) | P4 + P5 | Replicate hot keys across cache nodes; sub-shard the row |
| Idempotency required (charges, sends) | P2 + P7 | Idempotency key on every request; check-then-act in worker |
| Need durable, replayable event log | P2 + P7 | Kafka with retention; consumers can replay from offset |
| Spike absorption (10× traffic for an hour) | P2 + P5 | Queue acts as buffer; workers drain at sustainable rate |
| Global low-latency reads | P4 | CDN at edge + multi-region read replicas |

The 30-second interview opening

When the interviewer states the problem, mentally check off each row of the table above. Within 30 seconds you should have a list like: "OK, this has uploads (P6), it's read-heavy (P4), and there's a workflow (P7) — three patterns to wire in". Then the rest of the interview is composing them, picking the partition key, debating cache invalidation strategy. You've already won the framing battle.

The one-line summary the interviewer remembers: "Patterns are vocabulary. Eight of them cover the structural backbone of nearly every HLD interview. Naming them lets you skip derivation and spend interview time on trade-offs and composition — which is where seniority is judged."