From "redirect 20K times a second" to a sharded, cached, KGS-backed system — the architecture, the trade-offs, and the production shape that earns every box
This deep-dive applies the standard HLD interview framework. As you read, map each section to Requirements → Entities → APIs → High-Level Design → Deep Dives, and notice which of the 8 common patterns and key technologies are at play.
Imagine Sarah is tweeting a link to a long blog article. The original URL is https://www.educative.io/collection/page/5668639101419520/5649050225344512/5668600916475904/ — 100+ characters that eat up half of her tweet, look ugly, and are easy to mistype. She runs it through TinyURL and gets back http://tinyurl.com/jlg8zpc — 26 characters, clean, clickable. Behind that one tap of the "shorten" button is a system that has to: generate a unique key nobody else has ever used, store it forever, and redirect billions of people to the right place in milliseconds. That's the system we're designing.
Beyond just being shorter, URL shorteners are useful for: tracking clicks (campaign analytics), disguising affiliate links or ugly original URLs, device-optimized links, and QR codes that fit on a poster.
Before drawing a single box, pin down exactly what the system must do. In an interview, asking these questions shows you're not just regurgitating a memorized solution.
Numbers are not optional in HLD. They drive every architectural choice — sharding, caching, load balancing — so do them out loud, even if rough. Our system is read-heavy: assume a 100:1 read-to-write ratio (every URL gets created once but redirected about a hundred times).
Assume 500 million new URLs created per month. With a 100:1 read ratio, that's 50 billion redirects/month.
- New URLs: 500M / (30 × 86,400 sec) ≈ 200 URLs/sec
- Redirects: 100 × 200/sec ≈ 20K redirects/sec
- Write bandwidth: 200/sec × 500 bytes ≈ 100 KB/s
- Read bandwidth: 20K/sec × 500 bytes ≈ 10 MB/s
500M new URLs × 12 months × 5 years = 30 billion records. At ~500 bytes per record (URL + metadata): ~15 TB total. Provisioning so disks run at ~70% utilization: ~22 TB.
20% of URLs generate 80% of traffic. Daily read volume is 20K × 86400 = ~1.7B requests/day. To cache the hot 20% in memory: 0.2 × 1.7B × 500 bytes ≈ 170 GB cache. A modern server has 256GB RAM — this fits on one box, but we'll replicate for redundancy.
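To sanity-check the arithmetic before committing it to the summary table, here it is as a few lines of Python — the constants are exactly the assumptions stated above:

```python
# Back-of-envelope sanity check for the estimates above (pure arithmetic).
URLS_PER_MONTH = 500_000_000
READ_WRITE_RATIO = 100
RECORD_BYTES = 500
YEARS = 5

writes_per_sec = URLS_PER_MONTH / (30 * 86_400)           # ~193 -> round to 200
reads_per_sec = writes_per_sec * READ_WRITE_RATIO         # ~19.3K -> round to 20K
ingress = writes_per_sec * RECORD_BYTES                   # ~100 KB/s
egress = reads_per_sec * RECORD_BYTES                     # ~10 MB/s
total_records = URLS_PER_MONTH * 12 * YEARS               # 30 billion
storage_tb = total_records * RECORD_BYTES / 1e12          # ~15 TB
hot_cache_gb = 0.2 * reads_per_sec * 86_400 * RECORD_BYTES / 1e9  # ~170 GB

print(f"writes/s={writes_per_sec:,.0f}  reads/s={reads_per_sec:,.0f}")
print(f"storage={storage_tb:.0f} TB  hot cache={hot_cache_gb:.0f} GB")
```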
| Metric | Value | Why it matters |
|---|---|---|
| New URLs/sec | 200/s | Drives KGS throughput & write DB sizing |
| Redirects/sec | 20K/s | Drives cache size, LB tier, and read replicas |
| 5-yr storage | 15 TB | Forces sharding — too big for one box |
| Hot cache | 170 GB | Justifies a Memcached/Redis tier in front of DB |
| Egress | 10 MB/s | Standard — single LB tier handles it |
Two endpoints carry 99% of the traffic: create a short URL, and resolve a short URL (via HTTP redirect). A third endpoint handles deletes. Defining the APIs early locks down the contract before we draw any architecture.
REST API surface:

```
// Create — write path, low QPS
POST /api/v1/urls
{
  "api_dev_key": "abc123...",
  "original_url": "https://www.educative.io/...long-url...",
  "custom_alias": "my-blog-post",  // optional
  "user_name": "sarah",            // optional
  "expire_date": "2027-12-31"      // optional
}
→ 201 Created
{ "short_url": "https://tinyurl.com/jlg8zpc" }

// Resolve — read path, high QPS, served via HTTP redirect not JSON
GET /:url_key
→ 302 Found
Location: https://www.educative.io/...long-url...
// 302 (not 301) so we keep getting hits for analytics

// Delete
DELETE /api/v1/urls/:url_key
Headers: { "X-API-Dev-Key": "abc123..." }
→ 204 No Content
```
Why 302 Found and not 301 Moved Permanently? 301 lets the browser cache the redirect target and skip our server on the next click — great for latency, terrible for click-counting analytics. 302 makes the browser re-ask us each time, so every click flows through our telemetry. Trade-off: ~10ms per redirect we wouldn't otherwise spend, in exchange for accurate metrics.

Why an api_dev_key? A malicious user could exhaust our 68 billion 6-character keys by spamming creates. We tie every create to a developer key and rate-limit per key (e.g., 1000 URLs/hour). Anonymous users go through a captcha-gated UI flow.

What does our data actually look like? Four observations: (1) we'll store billions of records, (2) each record is tiny — under 1KB, (3) there are no relationships between records other than "user X created URL Y", and (4) the workload is read-heavy with extreme key-based access. This points loud and clear at a NoSQL key-value store — DynamoDB, Cassandra, or Riak. Joins are nonexistent; horizontal scaling is mandatory.
The URL table's primary key is the short hash itself (the 6-8 character alias). This is the most-queried field by orders of magnitude, so making it the PK gives us O(1) lookups. The original_url is just a payload — we never search by it.
This is the section that wins or loses the interview. We'll build the architecture in three passes: the simplest thing that could plausibly work, why it falls apart, and the production shape where every box justifies itself. Numbers from §3 drive every decision.
Sketch the simplest possible system: one app server talks to one MySQL database. To create a URL, the app generates a hash and inserts it. To redirect, the app looks up the hash and returns a 302.
Three concrete failures emerge the moment traffic shows up:
Two users submit the same long URL — they get the same short hash. Or worse, two different URLs hash to the same 6-character prefix. Each create now costs a DB read (to check collision) before the write. At 200 writes/sec each adding a round trip, write latency creeps up — and collision-retry storms can spike it 10×.
A single MySQL instance tops out around 5K-10K simple SELECTs/sec on commodity hardware. We need 20K, growing 20%/year. The DB CPU pegs at 100% and p99 latency balloons from 5ms to 500ms — every redirect on the internet now feels broken.
Five years of growth gives us 15TB of data. A single MySQL server with one disk can't hold it. Even if it could, backup/restore takes hours and a single disk failure loses every link we ever made — which is unacceptable for a service whose entire promise is "this link works forever".
The single most important insight in this design is that three workloads have wildly different shapes and must scale independently:
200 req/sec. Low volume, but every request must be durable — losing a write means a user got back a short link that points to nothing. Latency budget: 50-100ms. Needs strong consistency on the unique-key constraint.
20,000 req/sec. 100× the write volume. Latency budget: 50ms p99 — a slow redirect is a broken-feeling link. Stale reads are fine (URLs don't change). Cacheable up to the moon.
Constant background process. Manufacture unique 6-char keys offline so the write path never has to retry on collisions. Decoupled from request traffic entirely — pre-fills a pool that workers drain.
This three-way split lets us scale the read path with cheap stateless replicas, keep the write path simple, and bury the collision problem inside a dedicated Key Generation Service (KGS) that runs on its own clock.
Now the full picture. Every node in the diagram is numbered — match it to its card below, where each one answers: what is this, why is it here, and what would break without it?
Anything that hits a TinyURL — a browser following a tweet link, a mobile app deep-linking, an integration calling our API. The browser is the most common; it sends a plain HTTP GET on the short URL and expects a 302 redirect. From the client's view, the entire system is one URL.
Solves: nothing on its own — but every design choice flows backward from "what does the client experience?" Latency, availability, and the redirect protocol are all client-facing concerns.
A globally-distributed edge network that sits in front of the load balancer. For very hot URLs (think the top 1000 viral links), the CDN caches the 302 response at edge POPs around the world. A user in Tokyo following a hot link never reaches our origin servers — they get the redirect from a CDN node 20ms away.
Solves: global latency. Without a CDN, every redirect from Tokyo travels to our US-East data center and back — 200ms minimum on physics alone. A CDN turns that into 20ms for the head of the long-tail distribution, which is most of the traffic.
The traffic cop. Sits in front of both write and read app servers, distributes incoming HTTP, and yanks unhealthy backends out of rotation via health checks every few seconds. Round-robin is fine to start; switch to least-connections once we have enough nodes that load skew matters.
Solves: single-server bottleneck and single-server failure. Without an LB, one app crash takes down the whole service. With it, we lose 1/N of capacity for a few seconds until health checks fail the bad node out.
Stateless service that handles POST /api/v1/urls. The flow per request: validate input → ask KGS for an unused key → write (hash, original_url, ...) to the URL DB → return the short URL. Total budget: under 100ms. Scaling is easy because it's stateless — add pods behind the LB.
Solves: isolating the heavy lifting (key generation, DB write) behind a clean API surface. Without a dedicated write tier, write traffic would compete with the much larger read traffic for app server CPU — causing reads to slow down whenever a write spike hits.
A standalone service whose only job is to pre-generate random 6-character base64 keys and stockpile them in the Key DB. It runs on its own cadence — say batches of 10K keys every minute — and serves keys to write app servers from an in-memory queue. When a key is handed out, it's atomically moved from the "unused" pool to the "used" pool. With 68.7 billion possible keys and 200 writes/sec, the pool will outlast 5 years easily.
Solves: the core write-path problem. Without KGS, every write must do "generate hash → check DB for collision → retry if dup → insert", which is two round trips minimum and gets worse as the namespace fills up. With KGS, the write path is one round trip with zero collision risk by construction.
What if KGS dies? Single point of failure — solved by running an active-standby pair. Each write app server also caches a small batch of keys locally (~100 keys), so a 30-second KGS outage doesn't stop creates.
Two tables: unused_keys (the pool waiting to be claimed) and used_keys (already assigned). At 6 bytes per key × 68.7B keys this is ~412GB if we ever fully populate it — but in practice we only keep maybe 1M unused keys ahead of demand, so it's tiny. A simple key-value store works fine.
Solves: persisting the pool so KGS can restart without losing keys. If KGS held keys only in memory and crashed, we'd lose 10K unallocated keys per restart — acceptable, but persisting them lets us also support fancier schemes like "reserve a key for 5 minutes for a logged-in user".
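To make the KGS mechanics concrete, here's a minimal in-process sketch, assuming a 6-character base64url alphabet. It's a single-process toy: the real service persists both pools in the Key DB and makes the claim atomic across write servers (see the concurrency note in the key-generation deep dive below).

```python
import secrets
import string

# 64-character base64url alphabet: A-Z, a-z, 0-9, -, _
ALPHABET = string.ascii_letters + string.digits + "-_"

class KeyGenerationService:
    """Pre-generates random 6-char keys and hands them out without collisions."""

    def __init__(self, batch_size: int = 10_000):
        self.batch_size = batch_size
        self.unused: set[str] = set()   # the pool waiting to be claimed
        self.used: set[str] = set()     # already assigned, never reissued

    def refill(self) -> None:
        """Background cadence: top the pool back up to batch_size keys."""
        while len(self.unused) < self.batch_size:
            key = "".join(secrets.choice(ALPHABET) for _ in range(6))
            if key not in self.used:    # uniqueness by construction
                self.unused.add(key)

    def claim(self) -> str:
        """Write path: one O(1) claim, no collision check against the URL DB."""
        if not self.unused:
            self.refill()
        key = self.unused.pop()         # real service: atomic pop in the Key DB
        self.used.add(key)
        return key
```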
Stateless service that handles GET /:url_key. The flow per request: parse hash from URL → check cache → on hit, return 302; on miss, query URL DB, populate cache, return 302 → emit a click event to telemetry. Latency budget: under 50ms. Scaled aggressively because read traffic is 100× write traffic.
Solves: serving the redirect workload at scale without touching the write path. Splitting reads from writes lets each tier scale independently — 50 read pods and 5 write pods, not 55 of each.
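A hedged sketch of that read-path handler follows; `cache`, `db`, and `events` are hypothetical stand-ins for the cache tier, the URL DB, and the telemetry pipeline, and the lazy-expiry check matches the Cleanup Service behavior described below.

```python
import time

def handle_redirect(url_key: str, cache, db, events):
    """Read path: cache -> DB fallback -> async click event -> 302."""
    record = cache.get(url_key)
    if record is None:
        record = db.get(url_key)                  # miss: one DB read
        if record is None:
            return 404, None
        cache.set(url_key, record)                # populate for the next reader
    if record.get("expiration") and record["expiration"] < time.time():
        events.emit({"type": "expire", "key": url_key})  # lazy expiry, async delete
        return 404, None
    # Fire-and-forget: the 302 never waits on analytics.
    events.emit({"type": "click", "key": url_key, "ts": time.time()})
    return 302, record["original_url"]
```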
An in-memory key-value store holding the hottest 20% of URL mappings — about 170GB. Read flow: app server sends GET hash → cache returns the long URL in microseconds. Eviction policy: LRU — the least-recently-viewed mapping gets evicted first. Replicated for fault tolerance and to spread the load (no single cache node serves all 20K req/sec).
Solves: the 20K req/sec problem. Without a cache, every redirect hits the DB. With it, 80% of requests never touch the DB — letting the DB tier stay sized for the cold-tail 4K req/sec instead of the full 20K.
The source of truth — a partitioned, replicated NoSQL key-value store holding all 30 billion (hash → original_url) mappings. Sharded by hash via consistent hashing, with each shard replicated 3× across availability zones. Cassandra's quorum reads (R=2, W=2, N=3) give us strong-enough consistency without sacrificing availability.
Solves: durable storage that scales horizontally. A single MySQL box stays operationally healthy only up to roughly 1TB; we need 15TB. Cassandra spreads the data across dozens of nodes and handles node failures automatically.
A background worker that scans the URL DB for expired entries (every link has an expiration_date) and either soft-deletes them or fully removes them, returning the freed key to the KGS unused pool for recycling. Runs in low-traffic windows. Also lazily expires URLs on read — if a read app server fetches an expired URL, it deletes it and returns 404.
Solves: preventing the DB from growing unbounded with abandoned/expired links. Without cleanup, our 5-year storage estimate would be 5-year accumulation forever — eventually we'd be paying to store 100TB of dead links nobody clicks.
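A sketch of what that nightly job might look like, with `db` and `kgs` as hypothetical interfaces over the URL DB and the key pool:

```python
import time

def cleanup_expired(db, kgs, batch_size: int = 1000) -> int:
    """Nightly sweep: bulk-delete expired rows and recycle their keys."""
    removed = 0
    while True:
        batch = db.scan_expired(now=time.time(), limit=batch_size)
        if not batch:
            return removed
        for record in batch:
            db.delete(record["hash"])     # or soft-delete first
            kgs.recycle(record["hash"])   # return the key to the unused pool
        removed += len(batch)
```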
Every redirect emits a click event ({hash, timestamp, geo, referrer, user_agent}) to a Kafka topic. Downstream consumers fan out: real-time aggregator updates click counts in a Redis sorted set; batch ETL feeds a data warehouse for "show me the click history of tinyurl.com/abc". Crucially this is async — the read app server doesn't wait for telemetry before returning the 302.
Solves: click analytics without slowing down the redirect. If we tried to UPDATE click_count = click_count + 1 in the URL DB on every read, a viral link with 10K clicks/sec would melt that DB row. The async pipeline absorbs the spike and rolls up counts safely in the background.
Two real flows, mapped to the numbered components above:
Flow 1 — Sarah creates a short URL:
1. Sarah POSTs to tinyurl.com/api/v1/urls with her long URL.
2. The write app server asks KGS, which hands out jlg8zpc from its in-memory pool and atomically marks it used in Key DB ⑥.
3. The write app server inserts {hash: "jlg8zpc", original_url: "https://...", user_id: 42, expiration: 2030-12-31} into URL DB ⑨ — sharded by hash, replicated 3×.
4. Sarah gets back {short_url: "https://tinyurl.com/jlg8zpc"}. Total elapsed: 60ms.

Flow 2 — a reader follows the link:
1. A user in Tokyo clicks tinyurl.com/jlg8zpc. DNS routes him to the nearest CDN ② edge in Tokyo.
2. The CDN edge checks jlg8zpc → hit (the link is hot today) → returns 302 with the long URL. ~30ms.

How do we actually produce a 6-character key? Two approaches; the PDF reference walks through both. We'll explain why production systems pick KGS.
Compute a hash (MD5 or SHA-256) of the long URL, base64-encode it, take the first 6 characters. Why 6? Base64 has 64 characters (A-Z, a-z, 0-9, -, _), so 64⁶ = 68.7 billion possible keys — comfortably more than the 30 billion we need over 5 years.
Problems: truncating to 6 characters means two different URLs can land on the same prefix, forcing a collision check on every write. Encoding creates duplicates too: foo.com/page?id=design and foo.com/page%3Fid%3Ddesign are the same URL but hash differently — duplicate entries. Mitigations: append a sequence number or user_id to the URL before hashing (eliminates "same URL → same key"), but this still has the truncation-collision problem at write time.
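A few lines make the approach and its weakness tangible — the `salt` parameter here is the sequence-number/user_id mitigation just described:

```python
import base64
import hashlib

def short_key(long_url: str, salt: str = "") -> str:
    """Hash-and-encode: MD5 the URL, base64url-encode, keep the first 6 chars.
    Distinct URLs can still share a 6-char prefix, so every write would need
    a collision check against the DB — the weakness KGS removes entirely."""
    digest = hashlib.md5((long_url + salt).encode()).digest()
    return base64.urlsafe_b64encode(digest)[:6].decode()

print(short_key("https://foo.com/page?id=design"))
# Salting with a user_id makes the same URL yield different keys per user:
print(short_key("https://foo.com/page?id=design", salt="user_42"))
```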
Manufacture random 6-character keys offline, store them in a "unused" pool. Writes claim a key from the pool — guaranteed unique by construction, zero collision checks at request time.
The claim itself must be atomic so two write servers never grab the same key — an atomic POP from a Redis set, or a row-level SELECT FOR UPDATE on the unused table.

15TB doesn't fit on one box, and even if it did we couldn't survive its failure. We must shard (split) the URL DB across many boxes and replicate each shard 3× for fault tolerance.
"All URLs whose hash starts with A go to shard 1, B to shard 2…" Easy to implement, predictable. But: hashes aren't uniformly distributed across the alphabet, so some shards get hot. And custom aliases like my-blog-post all start with M — that shard melts.
Route each key to a shard by hashing it — distribution is uniform by construction. Use consistent hashing, not shard = hash(url_key) % N, so adding or removing a shard only relocates ~1/N of keys instead of all of them — no global rebalancing during scale-out.
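For illustration, a minimal consistent-hash ring with virtual nodes — the shard names and vnode count here are arbitrary:

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Each shard owns many virtual points on a ring; a key maps to the first
    point clockwise from its hash. Adding a shard only relocates the keys that
    now fall on its points (~1/(N+1) of the total)."""

    def __init__(self, shards, vnodes: int = 100):
        self.ring = sorted(
            (self._hash(f"{shard}#{i}"), shard)
            for shard in shards for i in range(vnodes)
        )
        self.points = [point for point, _ in self.ring]

    @staticmethod
    def _hash(value: str) -> int:
        return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")

    def shard_for(self, url_key: str) -> str:
        idx = bisect.bisect(self.points, self._hash(url_key)) % len(self.ring)
        return self.ring[idx][1]

ring = ConsistentHashRing([f"shard-{n}" for n in range(4)])
print(ring.shard_for("jlg8zpc"))   # the same key always lands on the same shard
```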
Each shard is replicated to 3 nodes across 3 availability zones. Cassandra's R=2, W=2, N=3 gives us: writes acknowledged by 2 of 3 replicas (durable even if 1 zone dies), reads from 2 replicas with conflict resolution (consistent within seconds even if a node lags). Because R + W = 4 > N = 3, every read quorum overlaps every write quorum in at least one replica, so a quorum read always sees the latest acknowledged write. Multi-region: extra replicas in EU and APAC for global low-latency reads, eventually consistent.
Why not simple modulo? With shard = hash % N, adding a shard means N→N+1, every key's home changes, and we have a multi-day data migration with the cluster degraded throughout. Consistent hashing makes adding a shard a few-hour reshuffle of 1/(N+1) of keys.

The single biggest performance lever in the system. Without a cache, every redirect hits the DB. With a cache holding the hot 20%, the DB sees only 20% of read traffic.
170GB to hold the hot 20% (calculated in §3). Modern servers ship with 256GB+ RAM, but we use 4-6 cache nodes for redundancy and to spread the 20K QPS load. Memcached or Redis both work; Redis wins if we want to also cache click counts in sorted sets.
LRU (least recently used). Hot URLs stay hot — a viral tweet's link lives in cache for days. Cold URLs naturally roll out as fresh content takes their slot. Implemented via a linked hash map / Redis LRU policy.
URLs are immutable (you can't edit where a short link points), so we only invalidate on delete/expire. The cleanup service explicitly invalidates the cache key when it expires a URL. No write-through complexity needed.
Each cache node is replicated 2× — if one dies, we lose 1/N of the hot set briefly until it gets re-fetched from DB on next read (a brief latency bump, not an outage). On startup, cache nodes don't pre-warm; they fill organically as requests come in. The first few minutes after a deploy see slightly higher DB load, then steady state.
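Redis and Memcached implement eviction for us, but the policy itself fits in a few lines — a toy LRU over an OrderedDict:

```python
from collections import OrderedDict

class LRUCache:
    """Fixed-capacity map: every read refreshes recency; inserts past capacity
    evict the least-recently-used entry — hot URLs stay, cold ones roll out."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.entries: OrderedDict[str, str] = OrderedDict()

    def get(self, url_key: str) -> str | None:
        if url_key not in self.entries:
            return None
        self.entries.move_to_end(url_key)        # mark as most recently used
        return self.entries[url_key]

    def set(self, url_key: str, original_url: str) -> None:
        self.entries[url_key] = original_url
        self.entries.move_to_end(url_key)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)     # evict least recently used

    def invalidate(self, url_key: str) -> None:  # called by the cleanup service
        self.entries.pop(url_key, None)
```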
LBs sit at three layers in our system, and each plays a slightly different role.
Public-facing LB (AWS ALB / nginx). Distributes incoming HTTPS across read and write app pods. Health-checks every 5s, evicts unhealthy pods. Terminates TLS so backend pods don't pay the crypto cost.
Client-side or sidecar LB. Uses consistent hashing on the URL hash so the same hash always goes to the same cache node — maximizing hit rate.
Cassandra/DynamoDB drivers do this themselves — the client knows the cluster topology and routes to the right shard's coordinator node.
Start with Round Robin — simple, no overhead, no state. Upgrade to Least Connections once you see one pod's CPU consistently higher than others (usually means it caught a few long-running requests and never recovered). Smart LBs also do per-pod load reporting and shift traffic away from struggling nodes proactively.
URLs have expiration_date — what happens when it passes?
Don't actively scan for expired entries — wait until a user actually requests one. On read, the app server checks the expiration; if expired, returns 404 to the user and asynchronously deletes the row. Trade-off: expired-but-unread entries linger in the DB forever, but storage is cheap and we save a giant background scan.
A lightweight Cleanup Service ⑩ runs nightly during low-traffic windows, scanning for entries where expiration_date < now() and bulk-deleting them. Returns the freed keys to the KGS unused pool for recycling.
Two-year default expiry for free-tier URLs, configurable per request, with paid-tier URLs being permanent. Storage is cheap enough that we can also choose "never expire" as the default — most production shorteners do, because a dead link breaks user trust permanently.
Every redirect emits an async event. Naïvely doing UPDATE click_count = click_count + 1 on the URL row would melt that row for any viral link (10K writes/sec on one row = lock contention hell). Instead:
Counts that we care about (this URL got 1.2M clicks) live in Redis sorted sets — atomic ZINCRBY handles 100K writes/sec on one node. Detailed per-event data (geo, referrer, user_agent) flows to a data warehouse for ad-hoc analytics queries.
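A sketch of the aggregator's consumer-side logic, assuming the redis-py client (the Kafka consumption loop and connection details are elided):

```python
import json
import redis  # redis-py client

r = redis.Redis()

def handle_click_event(raw_event: bytes) -> None:
    """Roll a click event up into a sorted set with one atomic increment."""
    event = json.loads(raw_event)            # e.g. {"key": "jlg8zpc", "ts": ...}
    r.zincrby("clicks", 1, event["key"])     # atomic; no row-lock contention

def top_links(n: int = 10):
    """The same sorted set doubles as a 'hottest links' leaderboard."""
    return r.zrevrange("clicks", 0, n - 1, withscores=True)
```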
Add a visibility column: PUBLIC (anyone can resolve) or PRIVATE (only the creator + whitelisted users). On read, the app server checks the auth token against an ACL table; on mismatch, return 401 instead of 302.
Submitted URLs go through a real-time Google Safe Browsing check before being shortened. Rejected URLs return 400. Reactive: a "report abuse" link on every redirect page; reported URLs go into a moderation queue.
Per-API-key quota — 1000 URLs/hour for free tier, higher for paid. Implemented at the LB or in a dedicated rate-limiter service (token bucket in Redis). Anonymous creates require captcha.
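An in-memory token-bucket sketch of that quota; in production the bucket state would live in Redis so every LB/app node shares it, as noted above:

```python
import time

class TokenBucket:
    """1000 creates/hour: capacity 1000, refilled at 1000/3600 tokens per second."""

    def __init__(self, capacity: int = 1000, refill_per_sec: float = 1000 / 3600):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False           # over quota: reject the create

buckets: dict[str, TokenBucket] = {}   # one bucket per api_dev_key

def allow_create(api_dev_key: str) -> bool:
    return buckets.setdefault(api_dev_key, TokenBucket()).allow()
```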
KGS-generated keys are random 6-character base64, not sequential. An attacker can't enumerate /aaaaaa, /aaaaab, ... to discover private URLs — the keyspace is 68.7B, sparsely populated, so brute force is infeasible.