High-Level Design

TinyURL / Short URL Service

From "redirect 20K times a second" to a sharded, cached, KGS-backed system — the architecture, the trade-offs, and the production shape that earns every box

Read this with the framework in mind

This deep-dive applies the HLD interview framework. As you read, map each section to Requirements → Entities → APIs → High-Level Design → Deep Dives, and notice which of the 8 common patterns and key technologies are at play.

Step 1

Why URL Shortening?

Imagine Sarah is tweeting a link to a long blog article. The original URL is https://www.educative.io/collection/page/5668639101419520/5649050225344512/5668600916475904/ — 100+ characters that eat up half of her tweet, look ugly, and are easy to mistype. She runs it through TinyURL and gets back http://tinyurl.com/jlg8zpc — 25 characters, clean, clickable. Behind that one tap of the "shorten" button is a system that has to: generate a unique key nobody else has ever used, store it forever, and redirect billions of people to the right place in milliseconds. That's the system we're designing.

Beyond just being shorter, URL shorteners are useful for: tracking clicks (campaign analytics), masking long affiliate or otherwise ugly original URLs, device-optimized links, and QR codes that fit on a poster.

The two questions that drive every design decision below: (1) How do we generate a short, unique key for each URL without collisions? (2) How do we serve 20,000 redirects per second with under-100ms latency, when most of those redirects are for the same handful of viral links?
Step 2

Requirements & Goals

Before drawing a single box, pin down exactly what the system must do. In an interview, asking these questions shows you're not just regurgitating a memorized solution.

✅ Functional Requirements

  • Given a long URL, generate a shorter unique alias (the "short link")
  • When users hit a short link, redirect them to the original URL
  • Users can optionally pick a custom alias for their URL
  • Links expire after a default timespan; users can override the expiry

⚙️ Non-Functional Requirements

  • Highly available — if redirection is down, every link on the internet pointing at us is broken
  • Real-time redirection — under 100ms latency p99
  • Unguessable short keys — should not be predictable (for privacy & abuse)

➕ Extended

  • Analytics — click counts, geo, referrer, browser
  • REST APIs for other services to integrate
The non-functional requirements are the harder ones. Generating a unique key is a 30-line problem. Serving 20K reads/sec with global low latency, and never losing a single mapping over 5 years, is the part that actually requires architecture.
Step 3

Capacity Estimation & Constraints

Numbers are not optional in HLD. They drive every architectural choice — sharding, caching, load balancing — so do them out loud, even if rough. Our system is read-heavy: assume a 100:1 read-to-write ratio (every URL gets created once but redirected hundreds of times).

Traffic estimates

Assume 500 million new URLs created per month. With a 100:1 read ratio, that's 50 billion redirects/month.

  • Writes: ~200 URLs/sec (500M / (30 × 86400))
  • Reads: ~20K redirects/sec (100 × 200/sec)
  • Ingress: ~100 KB/s (200 × 500 bytes)
  • Egress: ~10 MB/s (20K × 500 bytes)

Storage estimate (5 years)

500M new URLs × 12 months × 5 years = 30 billion records. At ~500 bytes per record (URL + metadata): ~15 TB total. Provisioning so the data never fills more than ~70% of capacity: ~22 TB.

Cache estimate (the 80/20 rule)

20% of URLs generate 80% of traffic. Daily read volume is 20K × 86400 = ~1.7B requests/day. To cache the hot 20% in memory: 0.2 × 1.7B × 500 bytes ≈ 170 GB cache. A modern server has 256GB RAM — this fits on one box, but we'll replicate for redundancy.

Metric / Value / Why it matters:

  • New URLs/sec: 200/s. Drives KGS throughput & write DB sizing.
  • Redirects/sec: 20K/s. Drives cache size, LB tier, and read replicas.
  • 5-yr storage: 15 TB. Forces sharding; too big for one box.
  • Hot cache: 170 GB. Justifies a Memcached/Redis tier in front of the DB.
  • Egress: 10 MB/s. Standard; a single LB tier handles it.
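These estimates are simple enough to sanity-check in code. A quick back-of-envelope script (every input is just an assumption stated above):

# Back-of-envelope check of the Step 3 estimates; all inputs are the stated assumptions.
NEW_URLS_PER_MONTH = 500_000_000
READ_WRITE_RATIO = 100
BYTES_PER_RECORD = 500                 # URL + metadata
YEARS = 5
SECONDS_PER_MONTH = 30 * 86_400

writes_per_sec = NEW_URLS_PER_MONTH / SECONDS_PER_MONTH            # ~193/s, rounded to 200/s
reads_per_sec = writes_per_sec * READ_WRITE_RATIO                   # ~19K/s, rounded to 20K/s
ingress = writes_per_sec * BYTES_PER_RECORD                         # ~100 KB/s
egress = reads_per_sec * BYTES_PER_RECORD                           # ~10 MB/s

total_records = NEW_URLS_PER_MONTH * 12 * YEARS                     # 30 billion
storage_bytes = total_records * BYTES_PER_RECORD                    # ~15 TB
hot_cache_bytes = 0.2 * reads_per_sec * 86_400 * BYTES_PER_RECORD   # hot 20% of a day's reads, ~170 GB

print(f"writes/s={writes_per_sec:,.0f}  reads/s={reads_per_sec:,.0f}")
print(f"storage={storage_bytes / 1e12:.1f} TB  hot cache={hot_cache_bytes / 1e9:.0f} GB")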
Step 4

System APIs

Two endpoints carry 99% of the traffic: create a short URL, and resolve a short URL (via HTTP redirect). A third handles deletion. Defining the APIs early locks down the contract before the architecture.

REST API surface
// Create — write path, low QPS
POST /api/v1/urls
{
  "api_dev_key":  "abc123...",
  "original_url": "https://www.educative.io/...long-url...",
  "custom_alias": "my-blog-post",   // optional
  "user_name":    "sarah",          // optional
  "expire_date":  "2027-12-31"      // optional
}
→ 201 Created  { "short_url": "https://tinyurl.com/jlg8zpc" }

// Resolve — read path, high QPS, served via HTTP redirect not JSON
GET /:url_key
→ 302 Found  Location: https://www.educative.io/...long-url...
   // 302 (not 301) so we keep getting hits for analytics

// Delete
DELETE /api/v1/urls/:url_key
Headers: { "X-API-Dev-Key": "abc123..." }
→ 204 No Content
Why 302 Found and not 301 Moved Permanently? 301 lets the browser cache the redirect target and skip our server on the next click — great for latency, terrible for click-counting analytics. 302 makes the browser re-ask us each time, so every click flows through our telemetry. Trade-off: ~10ms per redirect we wouldn't otherwise spend, in exchange for accurate metrics.
Abuse prevention via api_dev_key: a malicious user could exhaust our 68 billion 6-character keys by spamming creates. We tie every create to a developer key and rate-limit per key (e.g., 1000 URLs/hour). Anonymous users go through a captcha-gated UI flow.
Step 5

Database Design

What does our data actually look like? Four observations: (1) we'll store billions of records, (2) each record is tiny — under 1KB, (3) there are no relationships between records other than "user X created URL Y", and (4) the workload is read-heavy with extreme key-based access. This points loud and clear at a NoSQL key-value store — DynamoDB, Cassandra, or Riak. Joins are nonexistent; horizontal scaling is mandatory.

erDiagram
    URL {
        string hash PK
        string original_url
        timestamp creation_date
        timestamp expiration_date
        bigint user_id FK
    }
    USER {
        bigint id PK
        string name
        string email
        timestamp creation_date
        timestamp last_login
    }
    USER ||--o{ URL : "creates"

The URL table's primary key is the short hash itself (the 6-8 character alias). This is the most-queried field by orders of magnitude, so making it the PK gives us O(1) lookups. The original_url is just a payload — we never search by it.

Why NoSQL beats SQL here: our access pattern is "give me the URL for hash X" — the simplest possible key-value query, repeated 20K times/sec. SQL's joins, transactions, and schema rigidity buy us nothing; their cost (vertical scaling, harder sharding) hurts. A wide-column NoSQL store like Cassandra also gives us free multi-region replication, which we'll need for global low-latency redirects.
Step 6 · CORE

High-Level Architecture — From Naive to Production

This is the section that wins or loses the interview. We'll build the architecture in three passes: the simplest thing that could plausibly work, why it falls apart, and the production shape where every box justifies itself. Numbers from §3 drive every decision.

Pass 1 — The naive design (and why it breaks)

Sketch the simplest possible system: one app server talks to one MySQL database. To create a URL, the app generates a hash and inserts it. To redirect, the app looks up the hash and returns a 302.

flowchart LR
    C[Client] --> APP[App Server]
    APP --> DB[("MySQL")]

Three concrete failures emerge the moment traffic shows up:

💥 Hash collisions at write time

Two users submit the same long URL — they get the same short hash. Or worse, two different URLs hash to the same 6-character prefix. Each create now costs a DB read (to check collision) before the write. At 200 writes/sec each adding a round trip, write latency creeps up — and collision-retry storms can spike it 10×.

💥 20K reads/sec crushes one DB

A single MySQL instance tops out around 5K-10K simple SELECTs/sec on commodity hardware. We need 20K, growing 20%/year. The DB CPU pegs at 100% and p99 latency balloons from 5ms to 500ms — every redirect on the internet now feels broken.

💥 15TB doesn't fit on one box

Five years of growth gives us 15TB of data. A single MySQL server with one disk can't hold it. Even if it could, backup/restore takes hours and a single disk failure loses every link we ever made — which is unacceptable for a service whose entire promise is "this link works forever".

Pass 2 — The mental model: Write Path vs. Read Path vs. Key Generation

The single most important insight in this design is that three workloads have wildly different shapes and must scale independently:

✍️ Write Path

200 req/sec. Low volume, but every request must be durable — losing a write means a user got back a short link that points to nothing. Latency budget: 50-100ms. Needs strong consistency on the unique-key constraint.

📖 Read Path

20,000 req/sec. 100× the write volume. Latency budget: 50ms p99 — a slow redirect is a broken-feeling link. Stale reads are fine (URLs don't change). Cacheable up to the moon.

🔑 Key Generation

Constant background process. Manufacture unique 6-char keys offline so the write path never has to retry on collisions. Decoupled from request traffic entirely — pre-fills a pool that workers drain.

This three-way split lets us scale the read path with cheap stateless replicas, keep the write path simple, and bury the collision problem inside a dedicated Key Generation Service (KGS) that runs on its own clock.

Pass 3 — The production shape

Now the full picture. Every node is numbered — find its matching card below to see what it does and crucially what would break without it.

flowchart TB
    CL["① Client — Browser, App, SDK"]
    subgraph EDGE["Edge Tier"]
        CDN["② CDN — CloudFront / Akamai"]
        LB["③ Load Balancer"]
    end
    subgraph WRITE["Write Path"]
        WAPP["④ Write App Server"]
        KGS["⑤ Key Generation Service"]
        KDB[("⑥ Key DB — unused + used pools")]
    end
    subgraph READ["Read Path"]
        RAPP["⑦ Read App Server"]
        CACHE["⑧ Cache — Memcached / Redis"]
    end
    subgraph DATA["Data Tier"]
        DB[("⑨ URL DB — Cassandra / DynamoDB")]
    end
    subgraph OPS["Async / Ops"]
        CLEAN["⑩ Cleanup Service"]
        TEL["⑪ Telemetry Pipeline"]
    end
    CL --> CDN
    CDN --> LB
    LB -->|"POST create"| WAPP
    LB -->|"GET resolve"| RAPP
    WAPP --> KGS
    KGS --> KDB
    WAPP --> DB
    RAPP --> CACHE
    CACHE -.miss.-> DB
    RAPP -.events.-> TEL
    CLEAN -.scans.-> DB
    CLEAN -.recycles keys.-> KDB

Component-by-component — what each numbered box does

Use the numbers in the diagram above to find the matching card. Each one answers what is this, why is it here, and what would break without it.

① Client

Anything that hits a TinyURL — a browser following a tweet link, a mobile app deep-linking, an integration calling our API. The browser is the most common; it sends a plain HTTP GET on the short URL and expects a 302 redirect. From the client's view, the entire system is one URL.

Solves: nothing on its own — but every design choice flows backward from "what does the client experience?" Latency, availability, and the redirect protocol are all client-facing concerns.

② CDN (CloudFront / Akamai)

A globally-distributed edge network that sits in front of the load balancer. For very hot URLs (think the top 1000 viral links), the CDN caches the 302 response at edge POPs around the world. A user in Tokyo following a hot link never reaches our origin servers — they get the redirect from a CDN node 20ms away.

Solves: global latency. Without a CDN, every redirect from Tokyo travels to our US-East data center and back — 200ms minimum on physics alone. A CDN turns that into 20ms for the head of the long-tail distribution, which is most of the traffic.

③ Load Balancer

The traffic cop. Sits in front of both write and read app servers, distributes incoming HTTP, and yanks unhealthy backends out of rotation via health checks every few seconds. Round-robin is fine to start; switch to least-connections once we have enough nodes that load skew matters.

Solves: single-server bottleneck and single-server failure. Without an LB, one app crash takes down the whole service. With it, we lose 1/N of capacity for a few seconds until health checks fail the bad node out.

④ Write App Server

Stateless service that handles POST /api/v1/urls. The flow per request: validate input → ask KGS for an unused key → write (hash, original_url, ...) to the URL DB → return the short URL. Total budget: under 100ms. Scaling is easy because it's stateless — add pods behind the LB.

Solves: isolating the heavy lifting (key generation, DB write) behind a clean API surface. Without a dedicated write tier, write traffic would compete with the much larger read traffic for app server CPU — causing reads to slow down whenever a write spike hits.

⑤ Key Generation Service (KGS)

A standalone service whose only job is to pre-generate random 6-character base64 keys and stockpile them in the Key DB. It runs on its own cadence (say, a 10K-key batch whenever the unused pool dips below a threshold) and serves keys to write app servers from an in-memory queue. When a key is handed out, it's atomically moved from the "unused" pool to the "used" pool. With 68.7 billion possible keys and 200 writes/sec (~32 billion keys consumed over 5 years), the pool will outlast 5 years easily.

Solves: the core write-path problem. Without KGS, every write must do "generate hash → check DB for collision → retry if dup → insert", which is two round trips minimum and gets worse as the namespace fills up. With KGS, the write path is one round trip with zero collision risk by construction.

What if KGS dies? Single point of failure — solved by running an active-standby pair. Each write app server also caches a small batch of keys locally (~100 keys), so a 30-second KGS outage doesn't stop creates.

Key DB (KGS storage)

Two tables: unused_keys (the pool waiting to be claimed) and used_keys (already assigned). At 6 bytes per key × 68.7B keys this is ~412GB if we ever fully populate it — but in practice we only keep maybe 1M unused keys ahead of demand, so it's tiny. A simple key-value store works fine.

Solves: persisting the pool so KGS can restart without losing keys. If KGS held keys only in memory and crashed, we'd lose 10K unallocated keys per restart — acceptable, but persisting them lets us also support fancier schemes like "reserve a key for 5 minutes for a logged-in user".

Read App Server

Stateless service that handles GET /:url_key. The flow per request: parse hash from URL → check cache → on hit, return 302; on miss, query URL DB, populate cache, return 302 → emit a click event to telemetry. Latency budget: under 50ms. Scaled aggressively because read traffic is 100× write traffic.

Solves: serving the redirect workload at scale without touching the write path. Splitting reads from writes lets each tier scale independently — 50 read pods and 5 write pods, not 55 of each.
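A minimal sketch of that resolve flow in Python, assuming Redis for the cache; db_lookup and emit_click_event are hypothetical stand-ins for the URL-DB read and the telemetry producer:

import redis

r = redis.Redis(decode_responses=True)     # the cache tier; Memcached would work the same way

def db_lookup(url_key: str):
    """Hypothetical read against the URL DB (Cassandra/DynamoDB in the real system)."""
    return None

def emit_click_event(url_key: str) -> None:
    """Hypothetical fire-and-forget producer to the telemetry topic."""
    pass

def resolve(url_key: str):
    """GET /:url_key, cache-aside: hit -> 302; miss -> DB, populate cache, 302."""
    long_url = r.get(url_key)                       # 1. check the cache
    if long_url is None:                            # 2. miss: fall back to the source of truth
        long_url = db_lookup(url_key)
        if long_url is None:
            return 404, {}
        r.set(url_key, long_url, ex=24 * 3600)      # populate the cache (TTL is a tunable)
    emit_click_event(url_key)                       # 3. click event stays off the latency path
    return 302, {"Location": long_url}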

⑧ Cache (Memcached / Redis)

An in-memory key-value store holding the hottest 20% of URL mappings — about 170GB. Read flow: app server sends GET hash → cache returns the long URL in microseconds. Eviction policy: LRU — the least-recently-viewed mapping gets evicted first. Replicated for fault tolerance and to spread the load (no single cache node serves all 20K req/sec).

Solves: the 20K req/sec problem. Without a cache, every redirect hits the DB. With it, 80% of requests never touch the DB — letting the DB tier stay sized for the cold-tail 4K req/sec instead of the full 20K.

⑨ URL DB (Cassandra / DynamoDB)

The source of truth — a partitioned, replicated NoSQL key-value store holding all 30 billion (hash → original_url) mappings. Sharded by hash via consistent hashing, with each shard replicated 3× across availability zones. Cassandra's quorum reads (R=2, W=2, N=3) give us strong-enough consistency without sacrificing availability.

Solves: durable storage that scales horizontally. A single MySQL box gets operationally painful somewhere around 1TB of hot data; we need 15TB. Cassandra spreads it across dozens of nodes and handles node failures automatically.

⑩ Cleanup Service

A background worker that scans the URL DB for expired entries (every link has an expiration_date) and either soft-deletes them or fully removes them, returning the freed key to the KGS unused pool for recycling. Runs in low-traffic windows. Also lazily expires URLs on read — if a read app server fetches an expired URL, it deletes it and returns 404.

Solves: preventing the DB from growing unbounded with abandoned/expired links. Without cleanup, our 5-year storage estimate would be 5-year accumulation forever — eventually we'd be paying to store 100TB of dead links nobody clicks.

⑪ Telemetry Pipeline

Every redirect emits a click event ({hash, timestamp, geo, referrer, user_agent}) to a Kafka topic. Downstream consumers fan out: real-time aggregator updates click counts in a Redis sorted set; batch ETL feeds a data warehouse for "show me the click history of tinyurl.com/abc". Crucially this is async — the read app server doesn't wait for telemetry before returning the 302.

Solves: click analytics without slowing down the redirect. If we tried to UPDATE click_count = click_count + 1 in the URL DB on every read, a viral link with 10K clicks/sec would melt that DB row. The async pipeline absorbs the spike and rolls up counts safely in the background.

Concrete walkthrough — Sarah shortens, Raj clicks

Two real flows, mapped to the numbered components above:

✍️ Write flow — Sarah shortens her blog link at 14:02:06

  1. Sarah's browser ① POSTs to tinyurl.com/api/v1/urls with her long URL.
  2. The Load Balancer ③ routes it to a Write App Server ④. (CDN ② doesn't cache write requests.)
  3. The write app calls KGS ⑤: "give me a key" → KGS pops jlg8zpc from its in-memory pool and atomically marks it used in Key DB ⑥.
  4. Write app inserts {hash: "jlg8zpc", original_url: "https://...", user_id: 42, expiration: 2030-12-31} into URL DB ⑨ — sharded by hash, replicated 3×.
  5. Write app returns {short_url: "https://tinyurl.com/jlg8zpc"}. Total elapsed: 60ms.

📖 Read flow — Raj in Tokyo clicks the link 8 hours later

  1. Raj's browser ① GETs tinyurl.com/jlg8zpc. DNS routes him to the nearest CDN ② edge in Tokyo.
  2. CDN cache hit? If yes → 302 returned in 15ms. Done.
  3. If no → CDN forwards to our origin Load Balancer ③ → Read App Server ⑦.
  4. Read app checks Cache ⑧ for jlg8zpc → hit (the link is hot today) → returns 302 with the long URL. ~30ms.
  5. If cache miss → read app queries URL DB ⑨, populates the cache, returns 302. ~80ms.
  6. Read app fires off a click event to Telemetry ⑪ asynchronously. CDN may also cache this 302 for next time.
So what: the architecture is built around three insights — (1) writes and reads have different shapes so they get different tiers; (2) collisions are a write-path problem solved by pre-generation in KGS, not by retries; (3) the read path is mostly cacheable, so the DB is sized for the cold tail not the hot peak. Every box in the diagram earns its place by removing one of those failure modes from Pass 1.
Step 7

Encoding the URL vs. Key Generation Service

How do we actually produce a 6-character key? There are two approaches; we'll walk through both and explain why production systems pick KGS.

Approach A — Hash & encode the URL

Compute a hash (MD5 or SHA-256) of the long URL, base64-encode it, take the first 6 characters. Why 6? Base64 has 64 characters (A-Z, a-z, 0-9, -, _), so 64⁶ = 68.7 billion possible keys — comfortably more than the 30 billion we need over 5 years.
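A minimal sketch of Approach A (hash, base64url-encode, keep 6 characters); the optional salt parameter is the mitigation discussed below, not part of the baseline scheme:

import base64
import hashlib

def shorten_by_hash(long_url: str, salt: str = "") -> str:
    """Hash the URL, base64url-encode the digest, keep the first 6 characters."""
    digest = hashlib.md5((salt + long_url).encode()).digest()
    return base64.urlsafe_b64encode(digest).decode()[:6]

# Problem 1: the same URL always maps to the same key...
print(shorten_by_hash("https://www.educative.io/...long-url..."))
print(shorten_by_hash("https://www.educative.io/...long-url..."))
# ...unless we salt with a per-user or sequence value (the mitigation below).
print(shorten_by_hash("https://www.educative.io/...long-url...", salt="user-42"))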

⚠️ Three problems with this approach

  • Same URL → same key. Two users shortening the same URL get the same short link. Acceptable? Sometimes, but you lose per-user analytics and can't expire one without expiring the other.
  • URL-encoding collisions. foo.com/page?id=design and foo.com/page%3Fid%3Ddesign are the same URL but hash differently — duplicate entries.
  • Truncation collisions. Two different URLs may hash to different full SHA-256s but identical first-6-character prefixes. Now you must do "compute → check DB for collision → if dup, take the next 6 chars or rehash with salt → check again". Each write is now 1-3 DB round-trips, and the collision rate gets worse as the namespace fills.

Mitigations: append a sequence number or user_id to the URL before hashing (eliminates "same URL → same key"), but this still has the truncation-collision problem at write time.

Approach B — Key Generation Service (KGS) ← production winner

Manufacture random 6-character keys offline, store them in a "unused" pool. Writes claim a key from the pool — guaranteed unique by construction, zero collision checks at request time.

sequenceDiagram
    participant W as Write App
    participant K as KGS
    participant KDB as Key DB
    participant U as URL DB
    Note over K,KDB: KGS pre-generates batches of keys in background
    K->>KDB: Generate 10K random keys, INSERT into unused
    KDB-->>K: OK
    W->>K: Give me a key
    K->>KDB: Atomically move 1 key from unused to used
    KDB-->>K: jlg8zpc
    K-->>W: jlg8zpc
    W->>U: INSERT (jlg8zpc, original_url, ...)
    U-->>W: OK
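A minimal sketch of the pre-fill and claim steps from the diagram, assuming the pools live in Redis sets (the key names and batch size are illustrative); a small Lua script keeps the pop-and-mark-used step atomic:

import secrets
import string
from typing import Optional

import redis

ALPHABET = string.ascii_letters + string.digits + "-_"      # 64 characters (base64url-style)
r = redis.Redis(decode_responses=True)

# Atomic claim: pop one key from the unused pool and record it as used, in one server-side step.
CLAIM_LUA = """
local k = redis.call('SPOP', KEYS[1])
if k then redis.call('SADD', KEYS[2], k) end
return k
"""
claim_script = r.register_script(CLAIM_LUA)

def prefill(batch_size: int = 10_000) -> None:
    """Background job: stockpile random 6-char keys; SADD silently drops the rare duplicate."""
    batch = ["".join(secrets.choice(ALPHABET) for _ in range(6)) for _ in range(batch_size)]
    r.sadd("keys:unused", *batch)

def claim_key() -> Optional[str]:
    """Write path: one round trip, zero collision checks by construction."""
    return claim_script(keys=["keys:unused", "keys:used"])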

✅ Why KGS wins

  • Zero collision risk at write time — uniqueness is guaranteed by the unused pool
  • One round-trip per write (vs. 1-3 with hashing)
  • Same long URL submitted twice → two different short keys → independent analytics & expiry
  • Can pre-warm the pool, app servers cache 100 keys locally for burst tolerance
  • Recycle expired keys back into the pool — namespace never permanently fills

⚠️ KGS's own challenges

  • Single point of failure — solved with an active-standby replica that takes over via failover
  • Concurrency — two write apps must not get the same key. Solved by an atomic POP from a Redis set, or a row-level SELECT FOR UPDATE on the unused table
  • Lost keys on crash — if KGS hands a key to a write app that then crashes before INSERT, the key is gone. Acceptable: 68.7B keys, losing a few thousand to crashes is rounding error
The interview move: volunteer Approach A first ("simplest thing that could work"), then walk through its three failure modes, then introduce KGS as the production answer. This shows you understand why KGS exists, not just that it exists.
Step 8

Data Partitioning & Replication

15TB doesn't fit on one box, and even if it did we couldn't survive its failure. We must shard (split) the URL DB across many boxes and replicate each shard 3× for fault tolerance.

❌ Range-based sharding by first letter

"All URLs whose hash starts with A go to shard 1, B to shard 2…" Easy to implement, predictable. But: hashes aren't uniformly distributed across the alphabet, so some shards get hot. And custom aliases like my-blog-post all start with M — that shard melts.

✅ Hash-based sharding (with consistent hashing)

Place each url_hash on a consistent-hash ring and route it to the shard that owns that segment of the ring. Distribution is uniform by construction. Use consistent hashing (not hash % N) so adding or removing a shard only relocates ~1/N of the keys instead of all of them; no global rebalancing during scale-out.

Replication strategy

Each shard is replicated to 3 nodes across 3 availability zones. Cassandra's R=2, W=2, N=3 gives us: writes acknowledged by 2 of 3 replicas (durable even if 1 zone dies), and reads from 2 replicas whose quorum must overlap the write quorum (R + W > N), so a read sees the latest acknowledged write even if one replica lags. Multi-region: extra replicas in EU and APAC for global low-latency reads, eventually consistent.

Why consistent hashing matters specifically here: our shard count will grow over time (5 shards today, 50 in 5 years). With plain hash % N, adding a shard means N→N+1, every key's home changes, and we have a multi-day data migration with the cluster degraded throughout. Consistent hashing makes adding a shard a few-hour reshuffle of 1/(N+1) of keys.
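A toy consistent-hash ring makes the claim concrete: adding a sixth shard relocates roughly 1/6 of keys, not all of them (node names and virtual-node count are illustrative):

import bisect
import hashlib

def _h(s: str) -> int:
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, nodes, vnodes: int = 100):
        # Each shard owns many points ("virtual nodes") on the ring for even spread.
        self._points = sorted((_h(f"{n}#{i}"), n) for n in nodes for i in range(vnodes))
        self._hashes = [h for h, _ in self._points]

    def node_for(self, key: str) -> str:
        i = bisect.bisect(self._hashes, _h(key)) % len(self._points)
        return self._points[i][1]

keys = [f"key{i}" for i in range(10_000)]
before = Ring([f"shard{i}" for i in range(5)])
after = Ring([f"shard{i}" for i in range(6)])               # scale out by one shard
moved = sum(before.node_for(k) != after.node_for(k) for k in keys)
print(f"{moved / len(keys):.0%} of keys changed shards")    # roughly 1/6, not 100%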
Step 9

Cache

The single biggest performance lever in the system. Without a cache, every redirect hits the DB. With a cache holding the hot 20%, the DB sees only 20% of read traffic.

📦 Sizing

170GB to hold the hot 20% (calculated in §3). Modern servers ship with 256GB+ RAM, but we use 4-6 cache nodes for redundancy and to spread the 20K QPS load. Memcached or Redis both work; Redis wins if we want to also cache click counts in sorted sets.

♻️ Eviction

LRU (least recently used). Hot URLs stay hot — a viral tweet's link lives in cache for days. Cold URLs naturally roll out as fresh content takes their slot. Implemented via a linked hash map / Redis LRU policy.
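The real tier leans on Redis's allkeys-lru policy (or Memcached's built-in LRU) rather than hand-rolled code, but the mechanism is easy to sketch with an ordered hash map:

from collections import OrderedDict

class LRUCache:
    """Toy LRU: recently read mappings survive, the coldest mapping is evicted first."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._data: "OrderedDict[str, str]" = OrderedDict()

    def get(self, key: str):
        if key not in self._data:
            return None
        self._data.move_to_end(key)            # mark as most recently used
        return self._data[key]

    def put(self, key: str, value: str) -> None:
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)      # evict the least recently used entry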

🔄 Invalidation

URLs are immutable (you can't edit where a short link points), so we only invalidate on delete/expire. The cleanup service explicitly invalidates the cache key when it expires a URL. No write-through complexity needed.

Cache replication & warm-up

Each cache node is replicated 2× — if one dies, we lose 1/N of the hot set briefly until it gets re-fetched from DB on next read (a brief latency bump, not an outage). On startup, cache nodes don't pre-warm; they fill organically as requests come in. The first few minutes after a deploy see slightly higher DB load, then steady state.

Step 10

Load Balancer

LBs sit at three layers in our system, and each plays a slightly different role.

① Client → App tier

Public-facing LB (AWS ALB / nginx). Distributes incoming HTTPS across read and write app pods. Health-checks every 5s, evicts unhealthy pods. Terminates TLS so backend pods don't pay the crypto cost.

② App → Cache

Client-side or sidecar LB. Uses consistent hashing on the URL hash so the same hash always goes to the same cache node — maximizing hit rate.

③ App → DB

Cassandra/DynamoDB drivers do this themselves — the client knows the cluster topology and routes to the right shard's coordinator node.

Algorithm choice

Start with Round Robin — simple, no overhead, no state. Upgrade to Least Connections once you see one pod's CPU consistently higher than others (usually means it caught a few long-running requests and never recovered). Smart LBs also do per-pod load reporting and shift traffic away from struggling nodes proactively.
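A toy illustration of the two policies (backend names and the connection-count bookkeeping are stand-ins for what the LB tracks internally):

import itertools

backends = {"pod-a": 0, "pod-b": 0, "pod-c": 0}      # pod name -> current active connections
_rr = itertools.cycle(backends)                       # fixed order, ignores load entirely

def pick_round_robin() -> str:
    return next(_rr)

def pick_least_connections() -> str:
    # Prefers the pod with the fewest in-flight requests, so a pod stuck on
    # long-running requests stops receiving new traffic until it drains.
    return min(backends, key=backends.get)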

The LB is itself a single point of failure — solved with an active-passive pair using virtual IP failover, or natively in cloud LBs (ALB is multi-AZ by default).
Step 11

Purging & DB Cleanup

URLs have expiration_date — what happens when it passes?

🐢 Lazy cleanup (preferred)

Don't actively scan for expired entries — wait until a user actually requests one. On read, the app server checks the expiration; if expired, returns 404 to the user and asynchronously deletes the row. Trade-off: expired-but-unread entries linger in the DB forever, but storage is cheap and we save a giant background scan.
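A minimal sketch of the lazy check on the read path; delete_async is a hypothetical helper that enqueues the row and cache-key deletion:

from datetime import datetime, timezone

def delete_async(url_key: str) -> None:
    """Hypothetical: enqueue deletion of the DB row and invalidation of the cache entry."""
    pass

def resolve_with_lazy_expiry(record: dict):
    """record is the URL-DB row, e.g. {"hash": ..., "original_url": ..., "expiration_date": datetime or None}."""
    exp = record.get("expiration_date")
    if exp is not None and exp < datetime.now(timezone.utc):
        delete_async(record["hash"])              # expired: clean up off the request path
        return 404, {}
    return 302, {"Location": record["original_url"]}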

🧹 Active cleanup (supplement)

A lightweight Cleanup Service ⑩ runs nightly during low-traffic windows, scanning for entries where expiration_date < now() and bulk-deleting them. Returns the freed keys to the KGS unused pool for recycling.

Default expiry policy

Two-year default expiry for free-tier URLs, configurable per request, with paid-tier URLs being permanent. Storage is cheap enough that we can also choose "never expire" as the default — most production shorteners do, because a dead link breaks user trust permanently.

Step 12

Telemetry & Security

Telemetry — answering "how many clicks did this link get?"

Every redirect emits an async event. Naïvely doing UPDATE click_count = click_count + 1 on the URL row would melt that row for any viral link (10K writes/sec on one row = lock contention hell). Instead:

flowchart LR
    RAPP[Read App Server] -.click event.-> KAFKA[Kafka topic]
    KAFKA --> AGG[Real-time aggregator]
    KAFKA --> ETL[Batch ETL]
    AGG --> RDS[("Redis sorted set: per-URL counts")]
    ETL --> WH[("Data warehouse: geo, browser, history")]

Counts that we care about (this URL got 1.2M clicks) live in Redis sorted sets — atomic ZINCRBY handles 100K writes/sec on one node. Detailed per-event data (geo, referrer, user_agent) flows to a data warehouse for ad-hoc analytics queries.
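A minimal sketch of the real-time aggregator side, assuming the kafka-python client and a Redis sorted set named clicks:by_url (topic and key names are illustrative):

import json

import redis
from kafka import KafkaConsumer   # kafka-python

r = redis.Redis(decode_responses=True)
consumer = KafkaConsumer(
    "click-events",
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda m: json.loads(m.decode()),
)

for msg in consumer:                                     # each event: {"hash": "jlg8zpc", "ts": ..., "geo": ...}
    r.zincrby("clicks:by_url", 1, msg.value["hash"])     # atomic per-URL increment in the sorted set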

Security & Permissions

🔐 Private URLs

Add a visibility column: PUBLIC (anyone can resolve) or PRIVATE (only the creator + whitelisted users). On read, the app server checks the auth token against an ACL table; on mismatch, return 401 instead of 302.

🛡️ Abuse / phishing

Submitted URLs go through a real-time Google Safe Browsing check before being shortened. Rejected URLs return 400. Reactive: a "report abuse" link on every redirect page; reported URLs go into a moderation queue.

🚫 Rate limiting

Per-API-key quota — 1000 URLs/hour for free tier, higher for paid. Implemented at the LB or in a dedicated rate-limiter service (token bucket in Redis). Anonymous creates require captcha.
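A minimal fixed-window sketch of that per-key quota in Redis; a production limiter would more likely be a token bucket implemented in a Lua script, but the shape is the same:

import redis

r = redis.Redis(decode_responses=True)
LIMIT_PER_HOUR = 1000

def allow_create(api_dev_key: str) -> bool:
    """Fixed-window counter: count this hour's creates per key, reject once over quota."""
    bucket = f"quota:{api_dev_key}"
    count = r.incr(bucket)
    if count == 1:
        r.expire(bucket, 3600)       # the first request of the window starts the 1-hour clock
    return count <= LIMIT_PER_HOUR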

🔒 Unguessable keys

KGS-generated keys are random 6-character base64, not sequential. An attacker can't enumerate /aaaaaa, /aaaaab, ... to discover private URLs — the keyspace is 68.7B, sparsely populated, so brute force is infeasible.

Step 13

Interview Q&A

Why 302 redirects instead of 301?
Analytics. 301 lets the browser cache the redirect target permanently — every subsequent click bypasses our server, killing click counts. 302 forces the browser to ask us each time. Trade-off: ~10ms extra latency per click for accurate metrics. Bitly, TinyURL, and most shorteners pick 302.
Why NoSQL over SQL for the URL store?
Access pattern is pure key-value at huge scale. Our only query is "give me the URL for hash X" — no joins, no transactions, no complex filters. SQL's strengths (joins, ACID) buy us nothing; its weaknesses (vertical scaling ceiling, hard sharding) hurt. NoSQL like Cassandra/DynamoDB gives us automatic sharding, multi-region replication, and predictable single-digit-ms reads at any scale.
What if KGS goes down?
Three layers of defense. (1) Active-standby KGS pair with failover in seconds. (2) Each write app server caches ~100 unallocated keys locally — a 30-second KGS outage means writes still succeed from local cache. (3) If both fail, write apps degrade to inline hashing as fallback (slower, occasional collisions, but doesn't drop writes).
How do you handle a viral link getting 100K req/sec?
The cache + CDN absorb it entirely. A viral link is the definition of "hot" — it's in every CDN edge POP and every Memcached node. Origin DB sees zero traffic from it. The only choke point is bandwidth out of CDN, which is what CDNs are explicitly built to handle. Click telemetry is async via Kafka, which buffers the spike without dropping events.
How do you prevent collisions in the KGS unused pool?
KGS generates random keys (not sequential), then INSERTs them into the unused table with a UNIQUE constraint, so duplicates simply fail, get skipped, and are regenerated. The collision rate isn't constant: it's roughly (keys already issued) / 68.7B, so it's negligible at launch and climbs toward ~40% of candidates by the time 30B keys have been issued. That only costs the offline generation job some extra retries; because the pool is prefilled in the background, the write path never sees any of it.
How do you scale globally for low-latency reads?
Two layers. (1) CDN caches the 302 response at edge POPs in 200+ cities — a Tokyo user hits a Tokyo edge node, not US-East. (2) Multi-region DB replicas for cache misses — Cassandra/DynamoDB Global Tables replicate writes to EU and APAC within seconds. Read app servers in each region hit their local replica. Trade-off: writes are still single-region with global replication lag (~1-5s), so a URL just created in US-East might 404 in Tokyo for a few seconds. Acceptable for our use case.
What's the storage cost trade-off of "never expire" vs. "2-year default"?
15TB at 5 years vs. effectively unbounded growth. Even at S3 Standard-class pricing (~$0.023/GB/month), 15TB costs roughly $345/month, which is trivial. The reputational cost of expiring a link someone bookmarked 3 years ago is much higher. Most production shorteners default to permanent and only expire on user request or for spam.
How do you migrate from MySQL to Cassandra without downtime?
Dual-write pattern. Phase 1: read app servers read from MySQL only, write apps write to BOTH MySQL and Cassandra. Phase 2: backfill historical MySQL data into Cassandra via an ETL job. Phase 3: shadow-read from Cassandra and compare with MySQL for a week to verify correctness. Phase 4: flip read traffic to Cassandra, keep MySQL as fallback for a month. Phase 5: decommission MySQL. Total: ~6 weeks of planned migration with zero user-visible downtime.
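Phase 1 and the shadow-read check in sketch form; the store handles here are toy stand-ins for the real MySQL and Cassandra clients:

# Toy stand-ins for the MySQL and Cassandra clients used during the migration.
class Store(dict):
    def insert(self, key, value):
        self[key] = value

mysql, cassandra = Store(), Store()
mismatches = []

def create_url(url_key: str, original_url: str) -> None:
    mysql.insert(url_key, original_url)          # phase 1: MySQL stays the source of truth
    cassandra.insert(url_key, original_url)      # ...but every new write also lands in Cassandra

def resolve(url_key: str) -> str:
    primary = mysql.get(url_key)
    if cassandra.get(url_key) != primary:        # phase 3: shadow-read and compare
        mismatches.append(url_key)               # feeds the correctness report before cutover
    return primary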
The one-line summary the interviewer remembers: "It's a NoSQL key-value store partitioned with consistent hashing, fronted by a Memcached cache and a CDN, with a separate Key Generation Service that pre-generates unique 6-char keys offline so the write path never collides — three independent tiers each scaled to its own workload."