From "drop a stack trace into a database BLOB" to a two-plane system that keeps small metadata fast and large blobs cheap — the architecture, the trade-offs, and the production shape that earns every box
This deep-dive applies the five-step HLD interview framework. As you read, map each section to Requirements → Entities → APIs → High-Level Design → Deep Dives, and notice which of the 8 common patterns and key technologies are at play.
It's 14:02. Raj is debugging a flaky deploy on his laptop and a 50KB Java stack trace explodes across his terminal. He needs to send it to Sarah on the SRE team. He could paste it into Slack — but Slack truncates after a few hundred lines and the indentation gets murdered. Instead Raj opens pastebin.com, pastes the trace into a text box, hits "Create" and gets back https://pastebin.com/aZx9k2. He drops the URL in chat. Sarah clicks it three minutes later and sees the same stack trace, formatted exactly as Raj pasted it. That round-trip is what we're designing.
Pastebin (and clones like Hastebin, GitHub Gist, Pastie) lets users save a chunk of plain text — code, config, log output, JSON dumps, error messages — and get back a unique, shareable URL. The text lives at that URL until it expires or is deleted. The defining trait: the URL returns the text content itself, not a redirect to somewhere else. That single difference, more than anything, drives the architecture away from a URL-shortener shape and into a two-plane storage system.
Before drawing a single box, pin down what the system must do. In an interview, asking these questions out loud signals you know the difference between a 30-line CRUD app and a real distributed system.
Numbers drive every architectural choice. Pastebin is read-heavy but only modestly so: assume a 5:1 read-to-write ratio. People paste once and a small group of teammates each click the link a few times. (Compare to a URL shortener at 100:1, where one viral link gets clicked millions of times — Pastebin links are usually shared with 2-10 specific people.)
Assume 1 million new pastes per day. With a 5:1 read ratio, that's 5 million reads/day.
- Write QPS: 1M / 86,400 ≈ 12 pastes/sec
- Read QPS: 5M / 86,400 ≈ 58 reads/sec
- Write bandwidth: 12/s × 10 KB avg ≈ 120 KB/s
- Read bandwidth: 58/s × 10 KB avg ≈ 0.6 MB/s
Average paste size: 10 KB (a typical stack trace, a few hundred lines of YAML, a short snippet). Per day: 1M × 10KB = ~10 GB/day. Over 10 years: 10 GB × 365 × 10 = ~36 TB. Provisioning to stay below 70% utilization: 36 TB / 0.7 ≈ 51.4 TB.
Roughly 20% of pastes draw 80% of reads — they're the ones still being shared in active threads. Daily read volume is 58 × 86400 = ~5M requests/day. To cache the hot 20% in memory: 0.2 × 5M × 10KB ≈ ~10 GB cache. Comfortably fits on a single Memcached/Redis box; we'll still replicate for fault tolerance.
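The same arithmetic as a runnable sanity check. This is a minimal sketch in which every input is an assumption stated above, so you can swap in your own numbers and rerun:

```python
# Back-of-envelope sizing, decimal units to match the prose above.
PASTES_PER_DAY = 1_000_000
READ_RATIO = 5            # 5 reads per write (assumed above)
AVG_PASTE_KB = 10
SEC_PER_DAY = 86_400

write_qps = PASTES_PER_DAY / SEC_PER_DAY                    # ~12 pastes/sec
read_qps = write_qps * READ_RATIO                           # ~58 reads/sec
egress_mb_s = read_qps * AVG_PASTE_KB / 1_000               # ~0.6 MB/s

daily_gb = PASTES_PER_DAY * AVG_PASTE_KB / 1_000_000        # ~10 GB/day
ten_year_tb = daily_gb * 365 * 10 / 1_000                   # ~36 TB
provisioned_tb = ten_year_tb / 0.7                          # ~52 TB at 70% utilization

# Hot cache: 20% of daily read volume at 10 KB per paste
hot_cache_gb = 0.2 * read_qps * SEC_PER_DAY * AVG_PASTE_KB / 1_000_000  # ~10 GB

print(f"{write_qps:.0f} w/s, {read_qps:.0f} r/s, "
      f"{ten_year_tb:.0f} TB/10yr, cache {hot_cache_gb:.0f} GB")
```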
| Metric | Value | Why it matters |
|---|---|---|
| New pastes/sec | 12/s | Drives KGS throughput & metadata write rate |
| Reads/sec | 58/s | Drives blob cache size and read replicas |
| 10-yr storage | 36 TB | Forces blob-store split — too big for one DB |
| Hot cache | 10 GB | Justifies a Memcached/Redis tier in front of S3 |
| Egress | 0.6 MB/s | Trivial — single LB tier handles it easily |
| Max paste size | 10 MB | Hard cap to prevent abuse and streaming complexity |
Three endpoints — create, fetch, delete. Defining APIs early locks down the contract before architecture. Note the key difference from a URL shortener: GET returns text content directly, not a 302 redirect.
REST API surface:

```
// Create — write path, low QPS, payload up to 10MB
POST /api/v1/pastes
{
  "api_dev_key": "abc123...",
  "paste_data": "Exception in thread \"main\" java.lang.NullPointerException...",
  "paste_name": "deploy-failure-2026-05-07",   // optional
  "custom_url": "deploy-bug",                  // optional
  "expire_date": "2026-05-14T00:00:00Z"        // optional
}
→ 201 Created
{ "paste_url": "https://pastebin.com/aZx9k2" }

// Fetch — read path, returns raw text content (not a redirect)
GET /:paste_key
→ 200 OK
Content-Type: text/plain; charset=utf-8
Body: <the original paste content>

// Delete — owner only
DELETE /:paste_key
Headers: { "X-API-Dev-Key": "abc123..." }
→ 204 No Content
```
Why api_dev_key? A malicious script could spam a million 10MB pastes and burn through our storage budget in a single afternoon. Every create is tied to a developer key that's rate-limited (e.g., 100 pastes/hour per free key, 10MB cap each). Anonymous users go through a captcha-gated UI flow and a per-IP quota.

Look at the data and ask: what shape is it? Two very different shapes hide inside one paste. The metadata — paste_id, owner, expiry, creation time — is small (tens of bytes), structured, queryable, and tightly indexed. The content — the actual pasted text — is large (KB to MB), unstructured, and only ever fetched whole by primary key. Forcing both shapes into the same store is the central mistake we'll fix in the next section. Here we just acknowledge the split in the schema.
The PASTE row is small (~80 bytes) and lives in a relational or wide-column metadata DB (MySQL/Cassandra). The content_key column is a pointer into object storage (Amazon S3 or equivalent), where the actual paste text lives as a single object. When Sarah opens aZx9k2, the app fetches the metadata row by url_hash, reads the content_key, then fetches the blob from S3.
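As a concrete sketch of that split, here's the metadata row as a plain record type. This is illustrative only; any field not named in the prose above is an assumption:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass(frozen=True)
class PasteMeta:
    """One ~80-byte row in the metadata DB; the bytes themselves live in S3."""
    url_hash: str                   # "aZx9k2", the KGS-issued user-facing key
    content_key: str                # opaque pointer to the S3 object
    user_id: Optional[int]          # None for anonymous pastes
    expiration: Optional[datetime]  # None = never expires
    size_bytes: int
    visibility: str                 # "PUBLIC" | "UNLISTED" | "PRIVATE"
```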
Holds the small, queryable rows: who owns this paste, when does it expire, what's its content_key, is it public or private. Workload: indexed lookups by url_hash, occasional range scans for cleanup. Either MySQL (sharded) or Cassandra works — Cassandra wins at the multi-billion-row scale we projected because sharding is automatic.
Holds the large, opaque content blobs keyed by content_key. Workload: single-key GET / PUT, never queried by content. S3 is purpose-built for this — 11 nines of durability across multiple zones, roughly $0.023/GB/month for Standard storage, and first-byte latency in the tens of milliseconds for objects under a few MB.
Why not keep the blobs in the DB? A 10MB paste in a TEXT column turns every SELECT * into a memory hog. Worse, you'd be paying premium DB storage ($0.10+/GB/month) to hold blobs that a flat object store handles for a quarter of the price with better durability. The split lets each tier do what it's good at: the DB indexes & queries, S3 just serves bytes.

This is the section that wins or loses the interview. We'll build the architecture in three passes: the simplest thing that could plausibly work, why it falls apart, and the production shape where every box justifies itself. Numbers from §3 drive every decision.
Sketch the simplest possible system: one app server talks to one MySQL database. Each paste is a row with a TEXT column holding the full content. To create, INSERT. To fetch, SELECT.
Three concrete failures emerge the moment real traffic shows up:
A 10MB paste sitting in a TEXT column means every SELECT on that row has to deserialize 10MB. Replication streams every change byte-for-byte — a single 10MB write produces 30MB of replication traffic across 3 replicas. Index pages get fragmented around the giant rows, slowing down every neighboring query.
At 36TB of pastes over 10 years, a full mysqldump takes days. Point-in-time recovery requires shipping every byte of every blob through the binlog. A storage layer optimized for tiny indexed rows is being asked to behave like a file server — the operational cost compounds every quarter.
Each read pulls a multi-KB to multi-MB blob through the DB connection. The average is a manageable ~0.6 MB/s at 58 reads/sec, but a burst of near-10MB pastes pushes that to hundreds of MB/s through a system that's tuned for thousands of small queries per second. The DB CPU is fine; the I/O bandwidth and connection memory are what melt.
The single most important insight in this design is that the small structured metadata and the large unstructured content have completely different shapes, and trying to store them together makes both worse. Split them into two planes that scale independently.
Tens of bytes per row, billions of rows. Workload: indexed lookups, joins to user/ACL tables, range scans for cleanup. Latency budget: under 10ms. Needs queries, indexes, transactions. What lives here: (url_hash → content_key, owner, expiry, visibility). Storage: MySQL or Cassandra.
Think of it as the library catalog — small index cards, tightly indexed by Dewey decimal, fast to flip through.
KB to MB per object, billions of objects. Workload: pure single-key GET / PUT, never queried by content. Latency budget: under 50ms. Needs cheap durable storage, no querying. What lives here: the actual paste text bytes, keyed by content_key. Storage: Amazon S3 (or any object store).
Think of it as the actual library shelves — fat books, sit on the shelf, fetched whole by call number.
What travels where, and why they scale differently: a single create generates one tiny metadata row (~80 bytes) and one big blob (up to 10MB). The metadata DB grows linearly in row count; the blob store grows linearly in bytes. They scale on totally different axes — sharding strategies, backup cadences, replication factors, even the cloud bills are different. Treating them as one thing is the original sin Pass 1 commits.
Now the full picture. Every node is numbered — find its matching card below to see what it does and crucially what would break without it.
Use the numbers in the diagram above to find the matching card. Each one answers what is this, why is it here, and what would break without it.
Anything that creates or fetches a paste — a browser hitting pastebin.com, a CLI tool like pbcopy | paste, an IDE plugin that uploads error output, a CI pipeline saving its build log. The client either POSTs text and gets back a URL, or GETs a URL and gets back text. From the client's view, the entire system is one URL.
Solves: nothing on its own — but every design choice flows backward from "what does the client experience?" Latency, durability, and the URL contract are all client-facing concerns.
The traffic cop. Sits in front of the app servers, distributes incoming HTTPS, terminates TLS, and yanks unhealthy backends out of rotation via health checks every few seconds. Round-robin is fine to start; switch to least-connections once a single 10MB upload can hog one pod's connection pool while sibling pods sit idle.
Solves: single-server bottleneck and single-server failure. Without an LB, one app crash takes down the whole service. With it, we lose 1/N of capacity for a few seconds until health checks fail the bad node out.
Stateless service that handles both write and read paths. Write flow: validate input → ask KGS for an unused key → PUT blob to S3 with that key → INSERT metadata row pointing to the blob → return the paste URL. Read flow: parse hash from URL → check metadata cache then DB → check block cache then S3 → return the text body. We start with a single combined tier; if write/read shapes diverge under load (writes hold 10MB request bodies in memory, reads stream out), we split into separate write/read tiers later.
Solves: the orchestration layer between the two planes. Without it the client would have to do "get a key, PUT the blob, write the metadata" itself — and any half-finished sequence (blob saved, metadata not) would orphan storage. The app server makes the create flow look atomic to the client.
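A minimal sketch of that write-path orchestration. The kgs, s3, and db handles and their method names are placeholders, not a real SDK; only the S3 call mirrors boto3's put_object:

```python
import uuid

def create_paste(paste_data: bytes, user_id: int, expiration, kgs, s3, db) -> str:
    """Create flow: claim key -> PUT blob -> INSERT metadata -> return URL."""
    url_hash = kgs.claim_key()               # ④ pre-generated, collision-free
    content_key = f"blob-{uuid.uuid4()}"     # internal S3 key, decoupled from url_hash
    # Durable bytes first: a crash here orphans a blob (reclaimable garbage),
    # never a metadata row that points at nothing.
    s3.put_object(Bucket="pastes", Key=content_key, Body=paste_data)   # ⑦
    db.insert_paste_meta(url_hash=url_hash, content_key=content_key,   # ⑥
                         user_id=user_id, expiration=expiration,
                         size_bytes=len(paste_data), visibility="PUBLIC")
    return f"https://pastebin.com/{url_hash}"
```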
A standalone service whose only job is to pre-generate random 6-character base64 keys and stockpile them in the Key DB. With 64 characters and 6 positions, the namespace is 64⁶ ≈ 68.7 billion keys — comfortably more than the ~3.65 billion pastes we'd create in 10 years. Write app servers ask KGS for a key, then immediately use it. Same exact mechanism as the URL shortener (see §7); we lift it wholesale.
Solves: the same write-path problem the URL shortener has. Without KGS, every create must do "generate hash → check DB for collision → retry if dup → insert", which gets expensive as the namespace fills. With KGS, the create path is one round-trip with zero collision risk by construction.
What if KGS dies? Single point of failure — solved by running an active-standby pair plus each app server caches ~100 keys locally. A 30-second KGS outage doesn't block creates.
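One way to make key hand-out atomic, sketched under the assumption that the unused/used pools live in Redis sets; SPOP is atomic, so two app servers can never receive the same key. The names (unused_keys, claim_keys) are illustrative:

```python
import redis

r = redis.Redis()

def claim_keys(batch: int = 100) -> list[str]:
    """App-server side: grab a local batch so a brief KGS outage doesn't block creates."""
    keys = r.spop("unused_keys", batch) or []   # atomic pop: no double hand-out
    if keys:
        r.sadd("used_keys", *keys)              # mirror the used_keys table
    return [k.decode() for k in keys]
```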
Two tables: unused_keys (the pool waiting to be claimed) and used_keys (already assigned). When a paste expires and the cleanup service deletes it, its key returns to the unused pool for recycling. Storage is tiny — even 10M unused keys ahead of demand is well under a gigabyte. A simple key-value store works fine.
Solves: persisting the pool so KGS can restart without losing keys. If KGS held keys only in memory and crashed, every restart would lose 10K unallocated keys — acceptable, but persistence also lets us reserve a key for a specific user during multi-step flows ("draft a paste, publish later").
Source of truth for the small structured rows: (url_hash, content_key, owner, expiration, visibility, size_bytes). At ~80 bytes per row times ~3.65 billion rows over 10 years that's ~290 GB — easily handled by a sharded MySQL or a Cassandra cluster of 5-10 nodes. Sharded by url_hash via consistent hashing so reads go straight to the right shard. Each shard replicated 3× across availability zones for durability.
Solves: indexed lookups by url_hash in single-digit ms. Without it, an app server would have to know which S3 object belongs to a paste — but S3 has no concept of "paste expired" or "paste owner". The metadata DB carries every fact about the paste except its bytes.
The blob plane. Each paste's text is written as a single S3 object, keyed by content_key (a UUID, not the user-facing url_hash — that decoupling lets us re-key blobs during a future migration without touching user URLs). S3 gives us 11 nines of durability across 3 AZs, first-byte latency in the tens of milliseconds for sub-MB objects, and storage costs of roughly $0.023/GB/month — about a quarter of what equivalent DB storage would cost.
Solves: cheap durable storage of variable-sized blobs at any scale. Without it, the metadata DB would carry the bytes too — and we'd be back at Pass 1's failures: bloated rows, slow backups, replication storms.
An in-memory key-value store holding the hottest metadata rows. Tiny (~1GB even at high hit rates because rows are tiny), but eliminates 80% of the DB hits for popular pastes. Read flow: app server sends GET url_hash → cache returns the row in microseconds → app then knows where to fetch the blob. Eviction: LRU.
Solves: the metadata-DB read load. Without it every paste-fetch hits the DB just to find out where the blob lives — and a viral paste sees 1000s of metadata reads/sec all returning the same 80-byte row, which is wasted work.
The bigger sibling of the metadata cache — caches the actual paste content bytes. Sized at ~10GB to hold the hot 20% (per §3). For public pastes this can also be a CDN — a viral 100KB paste cached at edge POPs gets fetched by users in 20ms instead of the 80ms it takes to round-trip to S3 from origin. Eviction: LRU. Invalidation: on delete/expire only (pastes are immutable).
Solves: the S3 cost-and-latency problem for hot blobs. S3 charges per request; a viral paste at 1000 reads/sec costs real money in S3 GETs over a day. Caching it eliminates ~95% of those requests, slashing both the bill and the p50 latency.
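Putting both caches together, the read path is classic cache-aside. A minimal sketch; the meta_cache, block_cache, and db handles are placeholders, and the expiry check / lazy cleanup is elided (covered in the expiration section below):

```python
from typing import Optional

def fetch_paste(url_hash: str, meta_cache, db, block_cache, s3) -> Optional[bytes]:
    """Read flow: metadata cache ⑧ -> DB ⑥, then block cache ⑨ -> S3 ⑦."""
    meta = meta_cache.get(url_hash)                  # hot 80-byte row, microseconds
    if meta is None:
        meta = db.get_paste_meta(url_hash)           # indexed lookup, single-digit ms
        if meta is None:
            return None                              # unknown hash -> 404
        meta_cache.set(url_hash, meta)
    body = block_cache.get(meta.content_key)
    if body is None:                                 # cold blob: one S3 GET, then cached
        body = s3.get_object(Bucket="pastes", Key=meta.content_key)["Body"].read()
        block_cache.set(meta.content_key, body)
    return body
```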
A background worker that scans the metadata DB for expired entries (every paste has an expiration_date), deletes the matching S3 object, deletes the metadata row, and returns the freed key to the KGS unused pool. Runs in low-traffic windows. Also does lazy expiry on read — if an app server fetches an expired paste, it deletes both blob and row inline before returning 404.
Solves: preventing storage from growing unbounded. Without cleanup, our 10-year storage estimate would be 10-year accumulation forever — and S3, while cheap, is not free at 100TB+ of dead pastes nobody clicks.
Two real flows, mapped to the numbered components above:
Write flow — Raj creates the paste:
1. Client POSTs {paste_data: "...50KB stack trace..."} to pastebin.com/api/v1/pastes.
2. App server ③ asks KGS ④ for a key; KGS hands out aZx9k2 from its in-memory pool and atomically marks it used in Key DB ⑤.
3. App server generates content_key = "blob-7f3a...-9x" and PUTs the 50KB blob to S3 ⑦ at that key. S3 acknowledges only after replicating across 3 AZs.
4. App server INSERTs the metadata row ⑥: {url_hash: "aZx9k2", content_key: "blob-7f3a...-9x", user_id: 42, expiration: 2026-05-14, size: 51200, visibility: "PUBLIC"}.
5. App server returns {paste_url: "https://pastebin.com/aZx9k2"}. Total elapsed: ~80ms — most of it the S3 PUT.

Read flow — Sarah opens the link:
1. Sarah GETs pastebin.com/aZx9k2.
2. App server ③ parses aZx9k2, checks the metadata cache ⑧, falls back to the metadata DB ⑥ on a miss, and reads the row's content_key.
3. App server checks expiration_date — not expired ✓.
4. App server checks the block cache ⑨ for blob-7f3a...-9x; on a miss it GETs the blob from S3 ⑦, populates the cache, and returns the text with Content-Type: text/plain.
KGS in Pastebin is mechanically identical to KGS in the URL shortener. Rather than re-explain it from scratch, here's the short version with a pointer to the deeper discussion.
- Namespace: 64⁶ ≈ 68.7B keys, vs ~3.65B pastes over 10 years
- Keys pre-generated offline and stockpiled in the unused_keys table

The URL shortener page covers the encoding-vs-KGS trade-off, the collision math, the concurrency primitives (Redis atomic POP vs row-level SELECT FOR UPDATE), and the failure modes (what happens to a key handed out to an app that crashes before insert) in depth.
The two planes partition differently. The metadata plane we partition ourselves; the blob plane S3 partitions for us.
Place shards and keys on a hash ring and route each row by consistent_hash(url_hash). Distribution is uniform by construction — no hot shards even if a paste goes viral, because viral reads hit the cache, not the DB. Consistent hashing (not hash % N) means adding or removing a shard only relocates 1/N of keys instead of nearly all of them — no global rebalancing during scale-out.
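A toy ring to make the 1/N claim concrete. This is a sketch, not a production router; real drivers (Cassandra's token ring, for instance) do this internally:

```python
import bisect
import hashlib

class Ring:
    """Consistent-hash ring with virtual nodes for even key spread."""
    def __init__(self, shards: list[str], vnodes: int = 100):
        self._points = sorted(
            (self._h(f"{s}#{v}"), s) for s in shards for v in range(vnodes)
        )

    @staticmethod
    def _h(s: str) -> int:
        return int(hashlib.md5(s.encode()).hexdigest(), 16)

    def shard_for(self, url_hash: str) -> str:
        """First point clockwise from the key's position owns the key."""
        i = bisect.bisect(self._points, (self._h(url_hash), ""))
        return self._points[i % len(self._points)][1]

ring = Ring(["shard-a", "shard-b", "shard-c"])
print(ring.shard_for("aZx9k2"))  # deterministic; adding shard-d moves only ~1/4 of keys
```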
S3 partitions objects by key prefix internally and rebalances automatically. Our only job is to choose content_key values that don't share long prefixes (UUID v4 keys are perfect — random first characters spread load uniformly across S3's internal partitions). We give S3 the keys; S3 worries about where to put them.
Metadata DB: each shard replicated to 3 nodes across 3 availability zones. Cassandra with N=3, W=2, R=2 gives writes acknowledged by 2 of 3 replicas (durable even if 1 zone dies) and reads from 2 replicas with conflict resolution; since W + R > N, every read quorum overlaps the latest write quorum, so reads always see acknowledged writes. S3: built-in 11 nines of durability via cross-AZ erasure coding — we get this for free, no configuration.
Two caches, one per plane. Each has a different size and, surprisingly, a different optimal backing store.
Size: ~1 GB. Holds (url_hash → metadata row) for the active hot set. Eviction: LRU. Backing store: Memcached — purpose-built for tiny key-value lookups, microsecond latency, dead simple to operate. Hit rate target: 80%+.
What it solves: a viral paste with 1000 concurrent viewers would otherwise hammer the metadata DB with 1000 reads/sec returning the same 80-byte row. The cache absorbs all of them.
Size: ~10 GB (hot 20% of blob bytes). Holds (content_key → text body). Eviction: LRU. Backing store: Redis for private/auth-required pastes, or a CDN (CloudFront) for public pastes — the CDN gives us global edge caching for free.
What it solves: S3 GETs cost money per request and add 50-100ms of latency. For a viral paste, eliminating 95% of S3 GETs cuts both the bill and the p50 read latency dramatically.
Pastes are immutable by design (no edit feature) — the only invalidation event is delete or expire. The cleanup service explicitly evicts both caches when it removes a paste. No write-through dance required, which makes the cache layer a lot simpler than in a system with mutable content.
Each cache cluster is replicated 2× — losing one node means a few seconds of degraded hit rate as misses repopulate from DB/S3. Cold start: caches fill organically as requests come in; the first few minutes after a deploy see slightly higher backend load, then steady state.
Three LB tiers, each playing a slightly different role.
Public-facing LB (AWS ALB / nginx). Distributes incoming HTTPS across app pods. Health-checks every 5s, evicts unhealthy pods. Terminates TLS so backend pods don't pay the crypto cost. Handles WAF rules and per-IP rate limits.
Client-side or sidecar LB. Uses consistent hashing on the url_hash so the same paste always lands on the same cache node — maximizing hit rate. A simple round-robin would scatter the same paste across all cache nodes, killing the cache.
Cassandra/MySQL drivers route to the right shard themselves via the cluster topology. S3 handles its own request distribution — we just call the SDK. No LB to operate here.
Start with Round Robin — simple, no overhead, no state. Upgrade to Least Connections once you see one pod's CPU consistently higher than others (often because it caught a 10MB upload that's still streaming and never freed its connection slot). Smart LBs also do per-pod load reporting and shift traffic away from struggling nodes proactively.
Pastes have expiration_date — what happens when it passes? Two strategies, used together.
App server fetches a paste, sees expiration_date < now(), returns 404 to the user, and asynchronously: (a) DELETE the S3 object, (b) DELETE the metadata row, (c) return the key to the KGS unused pool. Pro: no scanning required, dead pastes get cleaned naturally as they get accessed. Con: expired-but-never-read pastes linger forever in storage.
The Cleanup Service ⑩ runs in a low-traffic window, scans the metadata DB for expiration_date < now(), and bulk-deletes the matching rows + S3 objects + recycles keys. Catches the long-tail of expired-but-unread pastes that lazy cleanup missed.
Always delete the S3 blob first, then the metadata row. If we delete the metadata first and crash before the S3 delete, we orphan a blob with no metadata — invisible garbage that never gets reclaimed. Deleting the blob first means a crash leaves a dangling metadata row pointing at a missing blob, which the next read will detect (404) and clean up. Garbage that announces itself is much better than garbage that doesn't.
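That ordering, as a sketch. The handle names (delete_paste_meta, recycle_key) are placeholders; only delete_object mirrors boto3:

```python
def expire_paste(meta, s3, db, kgs) -> None:
    """Blob first, then row, then key. A crash between steps leaves a dangling
    metadata row that the next read detects (404) and cleans up, never an
    invisible orphaned blob."""
    s3.delete_object(Bucket="pastes", Key=meta.content_key)  # 1. bytes gone
    db.delete_paste_meta(meta.url_hash)                      # 2. row gone
    kgs.recycle_key(meta.url_hash)                           # 3. key back to unused pool
```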
Unlike URLs (which usually default to "never"), pastes default to expiring — most pastes are throwaway debug content that nobody needs after a week. Sensible defaults: 1 day for anonymous pastes, 1 month for logged-in users, "never" as an opt-in for paid tiers. Users can override per-paste up to a max of 1 year for free-tier accounts.
Pastes can carry sensitive content — leaked credentials, internal logs, half-finished code. Three layers of defense.
The metadata row carries a visibility column: PUBLIC (anyone with the URL can read), UNLISTED (only those who have the URL — same as PUBLIC but excluded from search/discovery), or PRIVATE (only the owner + a whitelist via an ACL table). On read, the app server checks the auth token against the ACL; mismatch returns 401, not 404, so a curious viewer knows the paste exists but can't see it.
KGS-generated keys are random 6-character base64, not sequential. An attacker can't enumerate /aaaaaa, /aaaaab, ... to discover private pastes — the keyspace is 68.7 billion and sparsely populated, so a scraper needs roughly 20 random guesses on average just to hit one valid paste. Combined with per-IP rate limiting on misses, scraping at scale is impractical.
Per-API-key quotas — say 100 pastes/hour for free tier, 10MB cap each, 1000 reads/hour. Token bucket implemented in Redis. Anonymous creates require captcha. Without this, a single script could exhaust 10TB of S3 storage in an afternoon.
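A sketch of that token bucket, assuming per-key state in a Redis hash. This check-then-write version has a read-modify-write race under heavy concurrency; a production version would wrap the whole thing in a Lua script to make it atomic:

```python
import time
import redis

r = redis.Redis()

def allow_create(api_key: str, rate_per_hour: int = 100, burst: int = 10) -> bool:
    """Token bucket per API key: refills continuously, allows short bursts,
    caps sustained traffic at rate_per_hour."""
    now = time.time()
    key = f"bucket:{api_key}"
    tokens, last = r.hmget(key, "tokens", "ts")
    tokens = float(tokens) if tokens else burst
    last = float(last) if last else now
    tokens = min(burst, tokens + (now - last) * rate_per_hour / 3600)  # refill
    if tokens < 1:
        return False                                 # over quota -> 429
    r.hset(key, mapping={"tokens": tokens - 1, "ts": now})
    r.expire(key, 7200)                              # GC idle buckets
    return True
```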
Submitted pastes go through async scanners — secret-detection (regex for AWS keys, GitHub tokens, etc.), malware signatures in attached scripts. Flagged pastes go into a moderation queue or are auto-redacted. Reactive: a "report abuse" link on every paste view; reported pastes get rate-limited and manually reviewed.
The S3 bucket is private by default, so even a leaked content_key alone can't reach the bytes — only the app server has signed-URL access.

Syntax highlighting is rendered client-side from a language hint stored with the paste (e.g., language: "java"). Doing it client-side keeps the server stateless and lets us add new language support without touching the backend. For the API, an optional ?format=html query param could return server-rendered HTML, with the rendered output cached at the CDN layer keyed on (paste_id, format).