High-Level Design

Pastebin — Share Text by URL

From "drop a stack trace into a database BLOB" to a two-plane system that keeps small metadata fast and large blobs cheap — the architecture, the trade-offs, and the production shape that earns every box

Read this with the framework in mind

This deep-dive applies the 4-step HLD interview framework. As you read, map each section to Requirements → Entities → APIs → High-Level Design → Deep Dives, and notice which of the 8 common patterns and key technologies are at play.

Step 1

What is Pastebin?

It's 14:02. Raj is debugging a flaky deploy on his laptop and a 50KB Java stack trace explodes across his terminal. He needs to send it to Sarah on the SRE team. He could paste it into Slack — but Slack truncates after a few hundred lines and the indentation gets murdered. Instead Raj opens pastebin.com, pastes the trace into a text box, hits "Create" and gets back https://pastebin.com/aZx9k2. He drops the URL in chat. Sarah clicks it three minutes later and sees the same stack trace, formatted exactly as Raj pasted it. That round-trip is what we're designing.

Pastebin (and clones like Hastebin, GitHub Gist, Pastie) lets users save a chunk of plain text — code, config, log output, JSON dumps, error messages — and get back a unique, shareable URL. The text lives at that URL until it expires or is deleted. The defining trait: the URL returns the text content itself, not a redirect to somewhere else. That single difference, more than anything, drives the architecture away from a URL-shortener shape and into a two-plane storage system.

The two questions that drive every design decision below: (1) How do we generate a short, unique URL key for each paste without collisions? (2) How do we cheaply store billions of variable-sized blobs (1KB to 10MB each) and still serve any one of them in under 100ms?
Step 2

Requirements & Goals

Before drawing a single box, pin down what the system must do. In an interview, asking these questions out loud signals you know the difference between a 30-line CRUD app and a real distributed system.

✅ Functional Requirements

  • Users can upload (paste) plain text and get back a unique URL
  • Hitting that URL returns the original text, exactly as pasted
  • Pastes are text-only — no images, video, or binary uploads
  • Pastes expire after a default window; users can override the expiry
  • Users can optionally pick a custom alias for their paste URL
  • Users can delete a paste they own before expiry

⚙️ Non-Functional Requirements

  • Highly reliable — once a paste is saved, it must never silently disappear
  • Highly available — paste URLs are shared in chat, tickets, runbooks; broken means embarrassing
  • Real-time access with low latency — under 100ms p99 to fetch a paste
  • Unguessable links — random keys, not sequential, so attackers can't crawl the namespace

➕ Extended

  • Analytics — view counts per paste, geo, referrer
  • REST API for CLI tools, IDE plugins, CI pipelines to integrate

🚫 Out of Scope

  • Rich media (images, video, binary blobs)
  • Comments, reactions, threading
  • Versioning / paste history (treat each paste as immutable)
Reliability is the headline non-functional requirement. Unlike a URL shortener where the worst case is "this link redirects to nothing", Pastebin's worst case is "the actual content the user uploaded is gone forever". That makes durability — replicating blobs to multiple zones, never letting an ack precede a successful write — non-negotiable.
Step 3

Capacity Estimation & Constraints

Numbers drive every architectural choice. Pastebin is read-heavy but only modestly so: assume a 5:1 read-to-write ratio. People paste once and a small group of teammates each click the link a few times. (Compare to a URL shortener at 100:1, where one viral link gets clicked millions of times — Pastebin links are usually shared with 2-10 specific people.)

Traffic estimates

Assume 1 million new pastes per day. With a 5:1 read ratio, that's 5 million reads/day.

  • Writes: ~12 pastes/sec (1M / 86400)
  • Reads: ~58 reads/sec (5M / 86400)
  • Ingress: ~120 KB/s (12 × 10KB avg)
  • Egress: ~0.6 MB/s (58 × 10KB avg)

Storage estimate (10 years)

Average paste size: 10 KB (a typical stack trace, a few hundred lines of YAML, a short snippet). Per day: 1M × 10KB = ~10 GB/day. Over 10 years: 10 GB × 365 × 10 ≈ 36 TB. Provisioning so the data fills at most 70% of capacity: 36 TB / 0.7 ≈ 51.4 TB.

Cache estimate (the 80/20 rule)

Roughly 20% of pastes draw 80% of reads — they're the ones still being shared in active threads. Daily read volume is 58 × 86400 = ~5M requests/day. To cache the hot 20% in memory: 0.2 × 5M × 10KB ≈ ~10 GB cache. Comfortably fits on a single Memcached/Redis box; we'll still replicate for fault tolerance.

Metric | Value | Why it matters
New pastes/sec | 12/s | Drives KGS throughput & metadata write rate
Reads/sec | 58/s | Drives blob cache size and read replicas
10-yr storage | 36 TB | Forces blob-store split — too big for one DB
Hot cache | 10 GB | Justifies a Memcached/Redis tier in front of S3
Egress | 0.6 MB/s | Trivial — single LB tier handles it easily
Max paste size | 10 MB | Hard cap to prevent abuse and streaming complexity
Why cap individual pastes at 10MB? Without a limit, one user could upload a 1GB log file and force every read of that paste to stream a gigabyte through our app servers. 10MB covers 99.9% of real use cases (real stack traces are rarely over 200KB), keeps per-request memory bounded, and pushes the abuse case toward a real file-sharing service like Dropbox.
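
The arithmetic above fits in a few lines. A quick back-of-envelope script (same assumptions as this section: 1M pastes/day, 5:1 reads, 10 KB average, 10-year horizon) makes it easy to re-run the estimate with different inputs:

# Back-of-envelope capacity math, same assumptions as this section.
PASTES_PER_DAY  = 1_000_000
READ_RATIO      = 5
AVG_PASTE_KB    = 10
SECONDS_PER_DAY = 86_400
YEARS           = 10
HOT_FRACTION    = 0.20   # 80/20 rule: the hot pastes worth caching
CAPACITY_FILL   = 0.70   # provision so data fills at most 70% of disks

writes_per_sec = PASTES_PER_DAY / SECONDS_PER_DAY                    # ~12/s
reads_per_sec  = writes_per_sec * READ_RATIO                         # ~58/s
ingress_kb_s   = writes_per_sec * AVG_PASTE_KB                       # ~116 KB/s (the table rounds writes up to 12/s)
egress_mb_s    = reads_per_sec * AVG_PASTE_KB / 1000                 # ~0.6 MB/s
storage_tb     = PASTES_PER_DAY * AVG_PASTE_KB * 365 * YEARS / 1e9   # ~36.5 TB
provisioned_tb = storage_tb / CAPACITY_FILL                          # ~52 TB
cache_gb       = HOT_FRACTION * reads_per_sec * SECONDS_PER_DAY * AVG_PASTE_KB / 1e6   # ~10 GB

print(f"{writes_per_sec:.0f} writes/s, {reads_per_sec:.0f} reads/s")
print(f"ingress {ingress_kb_s:.0f} KB/s, egress {egress_mb_s:.1f} MB/s")
print(f"storage {storage_tb:.1f} TB raw, {provisioned_tb:.0f} TB provisioned, hot cache {cache_gb:.0f} GB")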
Step 4

System APIs

Three endpoints — create, fetch, delete. Defining APIs early locks down the contract before architecture. Note the key difference from a URL shortener: GET returns text content directly, not a 302 redirect.

REST API surface
// Create — write path, low QPS, payload up to 10MB
POST /api/v1/pastes
{
  "api_dev_key":  "abc123...",
  "paste_data":   "Exception in thread \"main\" java.lang.NullPointerException...",
  "paste_name":   "deploy-failure-2026-05-07",   // optional
  "custom_url":   "deploy-bug",                  // optional
  "expire_date":  "2026-05-14T00:00:00Z"         // optional
}
→ 201 Created  { "paste_url": "https://pastebin.com/aZx9k2" }

// Fetch — read path, returns raw text content (not a redirect)
GET /:paste_key
→ 200 OK
   Content-Type: text/plain; charset=utf-8
   Body: <the original paste content>

// Delete — owner only
DELETE /:paste_key
Headers: { "X-API-Dev-Key": "abc123..." }
→ 204 No Content
Why GET returns content directly, not a redirect: a URL shortener's job is to point you elsewhere; Pastebin's job is to be the destination. A 302 to S3 would leak our internal storage URL, lose us analytics on every read after the first, and bypass any future ACL checks. Returning the body lets the app server enforce auth, log the view, and apply rate-limits before a single byte leaves the building.
Abuse prevention via api_dev_key: a malicious script could spam a million 10MB pastes and burn through our storage budget in a single afternoon. Every create is tied to a developer key that's rate-limited (e.g., 100 pastes/hour per free key, 10MB cap each). Anonymous users go through a captcha-gated UI flow and a per-IP quota.
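
For the extended requirement of CLI and CI integration, a client is only a few lines. A minimal sketch against the contract above, assuming the requests library; the base URL and dev key are placeholders:

import sys
import requests   # third-party HTTP client: pip install requests

API_BASE = "https://pastebin.com/api/v1"   # placeholder base URL from the examples above

def create_paste(text: str, dev_key: str, name: str | None = None,
                 expire: str | None = None) -> str:
    """POST a paste and return the shareable URL, per the contract above."""
    payload = {"api_dev_key": dev_key, "paste_data": text}
    if name:
        payload["paste_name"] = name
    if expire:
        payload["expire_date"] = expire
    resp = requests.post(f"{API_BASE}/pastes", json=payload, timeout=10)
    resp.raise_for_status()   # over-quota or >10MB payloads surface as 4xx here
    return resp.json()["paste_url"]

if __name__ == "__main__":
    # Typical CLI/CI usage: pipe a build log in, get a URL out.
    #   some-failing-command 2>&1 | python paste.py
    print(create_paste(sys.stdin.read(), dev_key="YOUR_DEV_KEY"))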
Step 5

Database Design

Look at the data and ask: what shape is it? Two very different shapes hide inside one paste. The metadata — paste_id, owner, expiry, creation time — is small (tens of bytes), structured, queryable, and tightly indexed. The content — the actual pasted text — is large (KB to MB), unstructured, and only ever fetched whole by primary key. Forcing both shapes into the same store is the central mistake we'll fix in the next section. Here we just acknowledge the split in the schema.

erDiagram
    USER ||--o{ PASTE : "creates"
    PASTE {
        string url_hash PK
        string content_key
        timestamp expiration_date
        timestamp creation_date
        bigint user_id FK
        int size_bytes
        string visibility
    }
    USER {
        bigint id PK
        string name
        string email
        timestamp creation_date
        timestamp last_login
    }

Two-tier storage — the key design choice

The PASTE row is small (~80 bytes) and lives in a relational or wide-column metadata DB (MySQL/Cassandra). The content_key column is a pointer into object storage (Amazon S3 or equivalent), where the actual paste text lives as a single object. When Sarah opens aZx9k2, the app fetches the metadata row by url_hash, reads the content_key, then fetches the blob from S3.
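
The same split expressed as a schema. A minimal sketch of the metadata-plane tables from the ER diagram, using SQLite only as a local stand-in for the sharded MySQL/Cassandra store:

import sqlite3

# The metadata-plane schema from the ER diagram above. SQLite is a local
# stand-in; production would be sharded MySQL or Cassandra. The blob itself
# is NOT a column here: content_key points at an object in S3.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user (
    id            INTEGER PRIMARY KEY,
    name          TEXT,
    email         TEXT,
    creation_date TEXT,
    last_login    TEXT
);
CREATE TABLE paste (
    url_hash        TEXT PRIMARY KEY,             -- 6-char key in the public URL
    content_key     TEXT NOT NULL,                -- pointer to the S3 object
    expiration_date TEXT,
    creation_date   TEXT,
    user_id         INTEGER REFERENCES user(id),
    size_bytes      INTEGER,
    visibility      TEXT DEFAULT 'PUBLIC'         -- PUBLIC / UNLISTED / PRIVATE
);
-- The cleanup service's nightly sweep scans by expiry, so index it.
CREATE INDEX idx_paste_expiry ON paste (expiration_date);
""")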

📋 Metadata DB

Holds the small, queryable rows: who owns this paste, when does it expire, what's its content_key, is it public or private. Workload: indexed lookups by url_hash, occasional range scans for cleanup. Either MySQL (sharded) or Cassandra works — Cassandra wins at the multi-million-row scale because of automatic sharding.

🪣 Blob Store (S3)

Holds the large, opaque content blobs keyed by content_key. Workload: single-key GET / PUT, never queried by content. S3 is purpose-built for this — 11 nines of durability across multiple zones, costs roughly $0.023/GB/month for Standard, and first-byte latency in the tens of milliseconds for objects under a few MB.

Why split the storage? A single MySQL row has practical limits — TEXT columns above a few MB cause replication lag, clog backups, and turn every SELECT * into a memory hog. Worse, you'd be paying premium DB storage ($0.10+/GB/month) to hold blobs that a flat object store handles for a quarter of the price with better durability. The split lets each tier do what it's good at: the DB indexes & queries, S3 just serves bytes.
Step 6 · CORE

High-Level Architecture — From Naive to Production

This is the section that wins or loses the interview. We'll build the architecture in three passes: the simplest thing that could plausibly work, why it falls apart, and the production shape where every box justifies itself. Numbers from §3 drive every decision.

Pass 1 — The naive design (and why it breaks)

Sketch the simplest possible system: one app server talks to one MySQL database. Each paste is a row with a TEXT column holding the full content. To create, INSERT. To fetch, SELECT.

flowchart LR
    C["Client"] --> APP["App Server"]
    APP --> DB[("MySQL — metadata + blob")]

Three concrete failures emerge the moment real traffic shows up:

💥 Big rows wreck the database

A 10MB paste sitting in a TEXT column means every SELECT on that row has to deserialize 10MB. Replication streams every change byte-for-byte — a single 10MB write produces 30MB of replication traffic across 3 replicas. Index pages get fragmented around the giant rows, slowing down every neighboring query.

💥 Backups become a nightmare

At 36TB of pastes over 10 years, a full mysqldump takes days. Point-in-time recovery requires shipping every byte of every blob through the binlog. A storage layer optimized for tiny indexed rows is being asked to behave like a file server — the operational cost compounds every quarter.

💥 Read traffic saturates the DB

Each read pulls a multi-KB to multi-MB blob through the DB connection. Even at a modest 58 reads/sec, that's hundreds of MB/sec of data flowing through a system that's tuned for thousands of small queries per second. The DB CPU is fine; the I/O bandwidth and connection memory are what melt.

Pass 2 — The mental model: Metadata Plane vs. Blob Plane

The single most important insight in this design is that the small structured metadata and the large unstructured content have completely different shapes, and trying to store them together makes both worse. Split them into two planes that scale independently.

📋 Metadata Plane

Tens of bytes per row, billions of rows. Workload: indexed lookups, joins to user/ACL tables, range scans for cleanup. Latency budget: under 10ms. Needs queries, indexes, transactions. What lives here: (url_hash → content_key, owner, expiry, visibility). Storage: MySQL or Cassandra.

Think of it as the library catalog — small index cards, tightly indexed by Dewey decimal, fast to flip through.

🪣 Blob Plane

KB to MB per object, billions of objects. Workload: pure single-key GET / PUT, never queried by content. Latency budget: under 50ms. Needs cheap durable storage, no querying. What lives here: the actual paste text bytes, keyed by content_key. Storage: Amazon S3 (or any object store).

Think of it as the actual library shelves — fat books, sit on the shelf, fetched whole by call number.

What travels where, and why they scale differently: a single create generates one tiny metadata row (~80 bytes) and one big blob (up to 10MB). The metadata DB grows linearly in row count; the blob store grows linearly in bytes. They scale on totally different axes — sharding strategies, backup cadences, replication factors, even the cloud bills are different. Treating them as one thing is the original sin Pass 1 commits.

So what changes: the central architectural idea is the two-plane split. Everything else — KGS, caches, load balancers — is the supporting cast that makes the two planes work together.

Pass 3 — The production shape

Now the full picture. Every node is numbered — find its matching card below to see what it does and crucially what would break without it.

flowchart TB
    CL["① Client — Web · CLI · SDK"]
    subgraph EDGE["Edge Tier"]
        LB["② Load Balancer"]
    end
    subgraph APPTIER["Application Tier"]
        APP["③ Application Server"]
    end
    subgraph KEYTIER["Key Generation"]
        KGS["④ Key Generation Service"]
        KDB[("⑤ Key DB — unused + used pools")]
    end
    subgraph META["Metadata Plane"]
        MDB[("⑥ Metadata DB — MySQL or Cassandra")]
        MCACHE["⑧ Metadata Cache — Memcached"]
    end
    subgraph BLOB["Blob Plane"]
        OBJ[("⑦ Object Storage — Amazon S3")]
        BCACHE["⑨ Block Cache — Redis or CDN"]
    end
    subgraph OPS["Async Ops"]
        CLEAN["⑩ Cleanup Service"]
    end
    CL --> LB
    LB --> APP
    APP --> KGS
    KGS --> KDB
    APP --> MCACHE
    MCACHE -.miss.-> MDB
    APP --> BCACHE
    BCACHE -.miss.-> OBJ
    APP --> MDB
    APP --> OBJ
    CLEAN -.scans.-> MDB
    CLEAN -.deletes.-> OBJ
    CLEAN -.recycles keys.-> KDB
    style CL fill:#e8743b,stroke:#e8743b,color:#fff
    style LB fill:#171d27,stroke:#9b72cf,color:#d4dae5
    style APP fill:#171d27,stroke:#e8743b,color:#d4dae5
    style KGS fill:#171d27,stroke:#9b72cf,color:#d4dae5
    style KDB fill:#171d27,stroke:#9b72cf,color:#d4dae5
    style MDB fill:#171d27,stroke:#4a90d9,color:#d4dae5
    style OBJ fill:#171d27,stroke:#38b265,color:#d4dae5
    style MCACHE fill:#171d27,stroke:#3cbfbf,color:#d4dae5
    style BCACHE fill:#171d27,stroke:#3cbfbf,color:#d4dae5
    style CLEAN fill:#171d27,stroke:#d4a838,color:#d4dae5

Component-by-component — what each numbered box does

Use the numbers in the diagram above to find the matching card. Each one answers what is this, why is it here, and what would break without it.

Client

Anything that creates or fetches a paste — a browser hitting pastebin.com, a CLI tool that pipes a command's output straight to the API, an IDE plugin that uploads error output, a CI pipeline saving its build log. The client either POSTs text and gets back a URL, or GETs a URL and gets back text. From the client's view, the entire system is one URL.

Solves: nothing on its own — but every design choice flows backward from "what does the client experience?" Latency, durability, and the URL contract are all client-facing concerns.

Load Balancer

The traffic cop. Sits in front of the app servers, distributes incoming HTTPS, terminates TLS, and yanks unhealthy backends out of rotation via health checks every few seconds. Round-robin is fine to start; switch to least-connections once a single 10MB upload can hog one pod's connection pool while sibling pods sit idle.

Solves: single-server bottleneck and single-server failure. Without an LB, one app crash takes down the whole service. With it, we lose 1/N of capacity for a few seconds until health checks fail the bad node out.

Application Server

Stateless service that handles both write and read paths. Write flow: validate input → ask KGS for an unused key → PUT blob to S3 with that key → INSERT metadata row pointing to the blob → return the paste URL. Read flow: parse hash from URL → check metadata cache then DB → check block cache then S3 → return the text body. We start with a single combined tier; if write/read shapes diverge under load (writes hold 10MB request bodies in memory, reads stream out), we split into separate write/read tiers later.

Solves: the orchestration layer between the two planes. Without it the client would have to do "get a key, PUT the blob, write the metadata" itself — and any half-finished sequence (blob saved, metadata not) would orphan storage. The app server makes the create flow look atomic to the client.

Key Generation Service (KGS)

A standalone service whose only job is to pre-generate random 6-character base64 keys and stockpile them in the Key DB. With 64 characters and 6 positions, the namespace is 64⁶ ≈ 68.7 billion keys — comfortably more than the ~3.65 billion pastes we'd create in 10 years. Write app servers ask KGS for a key, then immediately use it. Same exact mechanism as the URL shortener (see §7); we lift it wholesale.

Solves: the same write-path problem the URL shortener has. Without KGS, every create must do "generate hash → check DB for collision → retry if dup → insert", which gets expensive as the namespace fills. With KGS, the create path is one round-trip with zero collision risk by construction.

What if KGS dies? Single point of failure — solved by running an active-standby pair plus each app server caches ~100 keys locally. A 30-second KGS outage doesn't block creates.

Key DB (KGS storage)

Two tables: unused_keys (the pool waiting to be claimed) and used_keys (already assigned). When a paste expires and the cleanup service deletes it, its key returns to the unused pool for recycling. Storage is tiny — even 10M unused keys ahead of demand is well under a gigabyte. A simple key-value store works fine.

Solves: persisting the pool so KGS can restart without losing keys. If KGS held keys only in memory and crashed, every restart would lose 10K unallocated keys — acceptable, but persistence also lets us reserve a key for a specific user during multi-step flows ("draft a paste, publish later").

Metadata DB (MySQL / Cassandra)

Source of truth for the small structured rows: (url_hash, content_key, owner, expiration, visibility, size_bytes). At ~80 bytes per row times ~3.65 billion rows over 10 years that's ~290 GB — easily handled by a sharded MySQL or a Cassandra cluster of 5-10 nodes. Sharded by url_hash via consistent hashing so reads go straight to the right shard. Each shard replicated 3× across availability zones for durability.

Solves: indexed lookups by url_hash in single-digit ms. Without it, an app server would have to know which S3 object belongs to a paste — but S3 has no concept of "paste expired" or "paste owner". The metadata DB carries every fact about the paste except its bytes.

Object Storage (Amazon S3)

The blob plane. Each paste's text is written as a single S3 object, keyed by content_key (a UUID, not the user-facing url_hash — that decoupling lets us re-key blobs during a future migration without touching user URLs). S3 gives us 11 nines of durability across 3 AZs, first-byte latency in the tens of milliseconds for sub-MB objects, and storage costs roughly $0.023/GB/month — about a quarter of what equivalent DB storage would cost.

Solves: cheap durable storage of variable-sized blobs at any scale. Without it, the metadata DB would carry the bytes too — and we'd be back at Pass 1's failures: bloated rows, slow backups, replication storms.

Metadata Cache (Memcached)

An in-memory key-value store holding the hottest metadata rows. Tiny (~1GB even at high hit rates because rows are tiny), but eliminates 80% of the DB hits for popular pastes. Read flow: app server sends GET url_hash → cache returns the row in microseconds → app then knows where to fetch the blob. Eviction: LRU.

Solves: the metadata-DB read load. Without it every paste-fetch hits the DB just to find out where the blob lives — and a viral paste sees 1000s of metadata reads/sec all returning the same 80-byte row, which is wasted work.

Block Cache (Redis or CDN)

The bigger sibling of the metadata cache — caches the actual paste content bytes. Sized at ~10GB to hold the hot 20% (per §3). For public pastes this can also be a CDN — a viral 100KB paste cached at edge POPs gets fetched by users in 20ms instead of the 80ms it takes to round-trip to S3 from origin. Eviction: LRU. Invalidation: on delete/expire only (pastes are immutable).

Solves: the S3 cost-and-latency problem for hot blobs. S3 charges per request; a viral paste at 1000 reads/sec costs real money in S3 GETs over a day. Caching it eliminates ~95% of those requests, slashing both the bill and the p50 latency.

Cleanup Service

A background worker that scans the metadata DB for expired entries (every paste has an expiration_date), deletes the matching S3 object, deletes the metadata row, and returns the freed key to the KGS unused pool. Runs in low-traffic windows. Also does lazy expiry on read — if an app server fetches an expired paste, it deletes both blob and row inline before returning 404.

Solves: preventing storage from growing unbounded. Without cleanup, our 10-year storage estimate would be 10-year accumulation forever — and S3, while cheap, is not free at 100TB+ of dead pastes nobody clicks.

Concrete walkthrough — Raj pastes, Sarah opens

Two real flows, mapped to the numbered components above:

✍️ Write flow — Raj uploads a 50KB stack trace at 14:02

  1. Raj's browser ① POSTs {paste_data: "...50KB stack trace..."} to pastebin.com/api/v1/pastes.
  2. The Load Balancer ② routes it to an Application Server ③.
  3. App server validates: payload < 10MB ✓, api_dev_key under quota ✓.
  4. App server calls KGS ④: "give me a key" → KGS pops aZx9k2 from its in-memory pool and atomically marks it used in Key DB ⑤.
  5. App server generates a UUID content_key = "blob-7f3a...-9x" and PUTs the 50KB blob to S3 ⑦ at that key. S3 acknowledges only after replicating across 3 AZs.
  6. App server INSERTs into Metadata DB ⑥: {url_hash: "aZx9k2", content_key: "blob-7f3a...-9x", user_id: 42, expiration: 2026-05-14, size: 51200, visibility: "PUBLIC"}.
  7. App server returns {paste_url: "https://pastebin.com/aZx9k2"}. Total elapsed: 80ms — most of it the S3 PUT.
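
The same write flow, compressed into one function. A hedged sketch: the kgs, s3, and metadata_db handles stand in for components ④, ⑦, and ⑥, and their method names are illustrative rather than a real SDK:

import uuid
from datetime import datetime, timedelta, timezone

MAX_PASTE_BYTES = 10 * 1024 * 1024   # the 10MB hard cap from the capacity section

def create_paste(paste_data: str, user_id: int, kgs, s3, metadata_db,
                 expire_days: int = 7) -> str:
    """Write path (③ → ④ → ⑦ → ⑥): reserve a key, store the blob, then record metadata."""
    body = paste_data.encode("utf-8")
    if len(body) > MAX_PASTE_BYTES:
        raise ValueError("paste exceeds the 10MB cap")

    url_hash = kgs.take_key()               # ④ pre-minted, collision-free key
    content_key = f"blob-{uuid.uuid4()}"    # decoupled from url_hash on purpose

    # ⑦ Durable blob first: S3 acks only after replicating across AZs.
    s3.put_object(key=content_key, body=body)

    # ⑥ Metadata last. If this INSERT fails we orphan a blob (reclaimable garbage),
    # never a metadata row that points at bytes which don't exist.
    metadata_db.insert_paste(
        url_hash=url_hash,
        content_key=content_key,
        user_id=user_id,
        size_bytes=len(body),
        expiration=datetime.now(timezone.utc) + timedelta(days=expire_days),
        visibility="PUBLIC",
    )
    return f"https://pastebin.com/{url_hash}"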

📖 Read flow — Sarah opens the URL 3 hours later

  1. Sarah's browser ① GETs pastebin.com/aZx9k2.
  2. Load Balancer ② → Application Server ③ (any pod, stateless).
  3. App server checks Metadata Cache ⑧ for aZx9k2:
    • Hit (Sarah is the 4th viewer in 3 hours, paste is hot): cache returns the row → app server reads content_key.
    • Miss: app server queries Metadata DB ⑥, populates cache, reads content_key.
  4. App server checks expiration_date — not expired ✓.
  5. App server checks Block Cache ⑨ for blob-7f3a...-9x:
    • Hit: returns the 50KB body in ~25ms total. Done.
    • Miss: app server fetches the blob from S3 ⑦, populates the block cache, returns the body in ~80ms total.
  6. App server emits an async view event for analytics. Browser renders the text.
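
And the read path as cache-aside code, with the same caveat: the cache, DB, and S3 handles and their methods are illustrative placeholders for components ⑥ through ⑨:

from datetime import datetime, timezone

def get_paste(url_hash: str, meta_cache, metadata_db, block_cache, s3) -> bytes:
    """Read path (③ → ⑧/⑥ → ⑨/⑦): metadata first, then the blob, caches in front."""
    # 1. Metadata: cache-aside against Memcached ⑧, fall back to the DB ⑥.
    meta = meta_cache.get(url_hash)
    if meta is None:
        meta = metadata_db.get_paste(url_hash)        # indexed lookup by primary key
        if meta is None:
            raise KeyError("404: no such paste")
        meta_cache.set(url_hash, meta)

    # 2. Expiry check (lazy cleanup would also fire here, see the purging section).
    if meta["expiration"] < datetime.now(timezone.utc):
        raise KeyError("404: paste expired")

    # 3. Blob: cache-aside against the block cache ⑨, fall back to S3 ⑦.
    body = block_cache.get(meta["content_key"])
    if body is None:
        body = s3.get_object(key=meta["content_key"])
        block_cache.set(meta["content_key"], body)
    return body                                        # served to the client as text/plain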
So what: Pastebin stays fast and cheap by never letting the metadata and blob workloads compete. Tiny indexed rows live in the metadata plane (DB + Memcached), where queries fly. Large opaque bytes live in the blob plane (S3 + a block cache), where storage is cheap and durable. Every other component — KGS, LB, cleanup — exists to make the two planes work together as if they were one. That's the architecture in one breath.
Step 7

Key Generation Service — Brief Recap

KGS in Pastebin is mechanically identical to KGS in the URL shortener. Rather than re-explain it from scratch, here's the short version with a pointer to the deeper discussion.

sequenceDiagram
    participant W as App Server
    participant K as KGS
    participant KDB as Key DB
    participant U as Metadata DB
    Note over K,KDB: KGS pre-generates batches of keys in background
    K->>KDB: Generate 10K random keys, INSERT into unused
    KDB-->>K: OK
    W->>K: Give me a key
    K->>KDB: Atomically move 1 key from unused to used
    KDB-->>K: aZx9k2
    K-->>W: aZx9k2
    W->>U: INSERT (aZx9k2, content_key, ...)
    U-->>W: OK
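
The atomic hand-off in the diagram, sketched below. The db handle and its methods are illustrative; a Redis SPOP from a set of unused keys achieves the same atomicity:

import secrets
import string

ALPHABET = string.ascii_letters + string.digits + "-_"   # 64 URL-safe characters
KEY_LEN = 6                                               # 64^6 ≈ 68.7B keyspace

def generate_batch(db, n: int = 10_000) -> None:
    """Background job: pre-mint random keys into the unused pool."""
    keys = {"".join(secrets.choice(ALPHABET) for _ in range(KEY_LEN)) for _ in range(n)}
    db.insert_ignore("unused_keys", keys)   # collisions with existing rows are simply dropped

def take_key(db) -> str:
    """Atomically move one key from unused to used, so two app servers can never share it."""
    with db.transaction():                  # e.g. SELECT ... FOR UPDATE, or a Redis SPOP
        key = db.pop_one("unused_keys")
        db.insert("used_keys", key)
    return key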

📝 The KGS recipe

  • 6-char base64 keys: 64⁶ ≈ 68.7B namespace, vs ~3.65B pastes over 10 years
  • KGS generates random keys in the background and stores them in the unused_keys table
  • App servers cache batches of ~100 keys locally for burst tolerance
  • Atomic POP from the pool — concurrent app servers can never get the same key
  • Active-standby KGS pair eliminates the single point of failure
  • Expired-paste cleanup recycles keys back into the unused pool

📚 Want the full discussion?

The URL shortener page covers the encoding-vs-KGS trade-off, the collision math, the concurrency primitives (Redis atomic POP vs row-level SELECT FOR UPDATE), and the failure modes (what happens to a key handed out to an app that crashes before insert) in depth.

→ Read the full KGS section in the URL shortener HLD

The reuse insight: any system that needs short, unguessable, unique keys at scale benefits from the same KGS pattern — Pastebin, URL shorteners, share-link generators, password reset tokens. Build it once as a service; reuse the contract everywhere.
Step 8

Data Partitioning & Replication

The two planes partition differently. The metadata plane we partition ourselves; the blob plane S3 partitions for us.

📋 Metadata DB — hash-based with consistent hashing

Compute shard = consistent_hash(url_hash). Distribution is uniform by construction — no hot shards even if a paste goes viral, because viral reads hit the cache, not the DB. Use consistent hashing rather than a plain hash(url_hash) % N, so adding or removing a shard relocates only ~1/N of the keys instead of nearly all of them — no global rebalancing during scale-out.

🪣 Object Storage — S3 handles it for us

S3 partitions objects by key prefix internally and rebalances automatically. Our only job is to choose content_key values that don't share long prefixes (UUID v4 keys are perfect — random first characters spread load uniformly across S3's internal partitions). We give S3 the keys; S3 worries about where to put them.
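
For the metadata plane's half of the job, a minimal consistent-hash ring sketch follows; virtual nodes smooth out the key distribution, and the shard names are placeholders:

import bisect
import hashlib

class HashRing:
    """Consistent hashing: adding or removing a shard only moves ~1/N of the keys."""
    def __init__(self, shards, vnodes=100):
        self.ring = sorted(
            (self._hash(f"{shard}#{v}"), shard)
            for shard in shards for v in range(vnodes)
        )
        self.points = [p for p, _ in self.ring]

    @staticmethod
    def _hash(s: str) -> int:
        return int.from_bytes(hashlib.md5(s.encode()).digest()[:8], "big")

    def shard_for(self, url_hash: str) -> str:
        # Walk clockwise from the key's position to the first virtual node.
        i = bisect.bisect(self.points, self._hash(url_hash)) % len(self.ring)
        return self.ring[i][1]

ring = HashRing(["meta-shard-1", "meta-shard-2", "meta-shard-3"])
print(ring.shard_for("aZx9k2"))   # the shard that owns this paste's metadata row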

Replication strategy

Metadata DB: each shard replicated to 3 nodes across 3 availability zones. Cassandra's R=2, W=2, N=3 gives writes acknowledged by 2 of 3 replicas (durable even if 1 zone dies) and reads from 2 replicas with conflict resolution. S3: built-in 11 nines of durability via cross-AZ erasure coding — we get this for free, no configuration.

Why this two-tier partitioning is the win: we get the queryability of a sharded DB for the small data and the operational simplicity of a managed object store for the huge data. The DB cluster stays small (a dozen nodes for 290GB of metadata is wildly underutilized) and S3 absorbs the petabyte-scale future without any planning on our part.
Step 9

Cache

Two caches, one per plane. Each has a different size and, surprisingly, a different optimal backing store.

📋 Metadata Cache — small, hot rows

Size: ~1 GB. Holds (url_hash → metadata row) for the active hot set. Eviction: LRU. Backing store: Memcached — purpose-built for tiny key-value lookups, microsecond latency, dead simple to operate. Hit rate target: 80%+.

What it solves: a viral paste with 1000 concurrent viewers would otherwise hammer the metadata DB with 1000 reads/sec returning the same 80-byte row. The cache absorbs all of them.

🪣 Block Cache — bigger, hot blobs

Size: ~10 GB (hot 20% of blob bytes). Holds (content_key → text body). Eviction: LRU. Backing store: Redis for private/auth-required pastes, or a CDN (CloudFront) for public pastes — the CDN gives us global edge caching for free.

What it solves: S3 GETs cost money per request and add 50-100ms of latency. For a viral paste, eliminating 95% of S3 GETs cuts both the bill and the p50 read latency dramatically.

Invalidation

Pastes are immutable by design (no edit feature) — the only invalidation event is delete or expire. The cleanup service explicitly evicts both caches when it removes a paste. No write-through dance required, which makes the cache layer a lot simpler than in a system with mutable content.

Replication & warm-up

Each cache cluster is replicated 2× — losing one node means a few seconds of degraded hit rate as misses repopulate from DB/S3. Cold start: caches fill organically as requests come in; the first few minutes after a deploy see slightly higher backend load, then steady state.

Step 10

Load Balancer

Three LB tiers, each playing a slightly different role.

① Client → App tier

Public-facing LB (AWS ALB / nginx). Distributes incoming HTTPS across app pods. Health-checks every 5s, evicts unhealthy pods. Terminates TLS so backend pods don't pay the crypto cost. Handles WAF rules and per-IP rate limits.

② App → Cache

Client-side or sidecar LB. Uses consistent hashing on the url_hash so the same paste always lands on the same cache node — maximizing hit rate. A simple round-robin would scatter the same paste's entry across every cache node, duplicating memory use and gutting the hit rate.

③ App → DB / S3

Cassandra/MySQL drivers route to the right shard themselves via the cluster topology. S3 handles its own request distribution — we just call the SDK. No LB to operate here.

Algorithm choice

Start with Round Robin — simple, no overhead, no state. Upgrade to Least Connections once you see one pod's CPU consistently higher than others (often because it caught a 10MB upload that's still streaming and never freed its connection slot). Smart LBs also do per-pod load reporting and shift traffic away from struggling nodes proactively.

The LB is itself a single point of failure — solved with an active-passive pair using virtual IP failover, or natively in cloud LBs (AWS ALB is multi-AZ by default).
Step 11

Purging & Cleanup

Pastes have expiration_date — what happens when it passes? Two strategies, used together.

🐢 Lazy cleanup — delete on first read after expiry

App server fetches a paste, sees expiration_date < now(), returns 404 to the user, and asynchronously: (a) DELETE the S3 object, (b) DELETE the metadata row, (c) return the key to the KGS unused pool. Pro: no scanning required, dead pastes get cleaned naturally as they get accessed. Con: expired-but-never-read pastes linger forever in storage.

🧹 Active cleanup — nightly batch sweep

The Cleanup Service ⑩ runs in a low-traffic window, scans the metadata DB for expiration_date < now(), and bulk-deletes the matching rows + S3 objects + recycles keys. Catches the long-tail of expired-but-unread pastes that lazy cleanup missed.

Two-step delete is critical

Always delete the S3 blob first, then the metadata row. If we delete the metadata first and crash before the S3 delete, we orphan a blob with no metadata — invisible garbage that never gets reclaimed. Deleting the blob first means a crash leaves a dangling metadata row pointing at a missing blob, which the next read will detect (404) and clean up. Garbage that announces itself is much better than garbage that doesn't.

Default expiry policy

Unlike URLs (which usually default to "never"), pastes default to expiring — most pastes are throwaway debug content that nobody needs after a week. Sensible defaults: 1 day for anonymous pastes, 1 month for logged-in users, "never" as an opt-in for paid tiers. Users can override per-paste up to a max of 1 year for free-tier accounts.

Step 12

Security & Permissions

Pastes can carry sensitive content — leaked credentials, internal logs, half-finished code. Three layers of defense.

🔐 Public vs Private pastes

The metadata row carries a visibility column: PUBLIC (anyone with the URL can read), UNLISTED (only those who have the URL — same as PUBLIC but excluded from search/discovery), or PRIVATE (only the owner + a whitelist via an ACL table). On read, the app server checks the auth token against the ACL; mismatch returns 401, not 404, so a curious viewer knows the paste exists but can't see it.

🔒 Unguessable URLs

KGS-generated keys are random 6-character base64, not sequential. An attacker can't enumerate /aaaaaa, /aaaaab, ... to discover private pastes — the keyspace is 68.7 billion and only ~5% of it will ever hold a paste, so guessing the key of a specific paste is hopeless, and even blind scraping burns roughly 19 misses for every hit. Combined with per-IP rate limiting on misses, scraping is impractical.

🚫 Rate limiting per api_dev_key

Per-API-key quotas — say 100 pastes/hour for free tier, 10MB cap each, 1000 reads/hour. Token bucket implemented in Redis. Anonymous creates require captcha. Without this, a single script could exhaust 10TB of S3 storage in an afternoon.
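
A sketch of that token bucket, assuming the redis-py client; the Lua script keeps the read-refill-take sequence atomic on the Redis side, and the key names and quotas are illustrative:

import time
import redis   # third-party client: pip install redis

# Token bucket per api_dev_key, refilled continuously. The Lua script runs
# atomically inside Redis, so concurrent app servers can't double-spend tokens.
TOKEN_BUCKET_LUA = """
local key      = KEYS[1]
local capacity = tonumber(ARGV[1])
local refill   = tonumber(ARGV[2])   -- tokens per second
local now      = tonumber(ARGV[3])
local bucket   = redis.call('HMGET', key, 'tokens', 'ts')
local tokens   = tonumber(bucket[1]) or capacity
local ts       = tonumber(bucket[2]) or now
tokens = math.min(capacity, tokens + (now - ts) * refill)
local allowed = 0
if tokens >= 1 then
  tokens = tokens - 1
  allowed = 1
end
redis.call('HSET', key, 'tokens', tokens, 'ts', now)
redis.call('EXPIRE', key, 7200)
return allowed
"""

r = redis.Redis()
bucket_check = r.register_script(TOKEN_BUCKET_LUA)

def allow_create(api_dev_key: str, per_hour: int = 100) -> bool:
    """True if this key may create another paste right now (100/hour for the free tier)."""
    allowed = bucket_check(keys=[f"ratelimit:create:{api_dev_key}"],
                           args=[per_hour, per_hour / 3600, time.time()])
    return bool(allowed)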

🛡️ Abuse / phishing

Submitted pastes go through async scanners — secret-detection (regex for AWS keys, GitHub tokens, etc.), malware signatures in attached scripts. Flagged pastes go into a moderation queue or are auto-redacted. Reactive: a "report abuse" link on every paste view; reported pastes get rate-limited and manually reviewed.

Why visibility lives on the metadata row, not the S3 object: S3's per-object ACLs are slow to update and clunky to manage at billion-object scale. Far simpler to enforce permissions in the app server on every read — the app already has to fetch metadata before the blob, so the ACL check is one extra in-memory comparison. Defense in depth: S3 buckets are still private by default, so even a leaked content_key alone can't reach the bytes — only the app server has signed-URL access.
Step 13

Interview Q&A

Why split metadata and blob storage instead of one DB?
Different shapes, different stores. Metadata rows are tens of bytes, indexed, queried by url_hash. Blobs are KB to MB, opaque, fetched whole by content_key. Forcing both into one MySQL table means giant TEXT columns wreck replication and backups, while the small structured rows underuse the DB's indexing power. Splitting lets each tier do what it's good at — the DB indexes, S3 just serves bytes — and the cost is a single extra round-trip per read (which the metadata cache eliminates 80% of the time).
Why S3 (or any object store) instead of a filesystem on the app servers?
Durability, scale, and statelessness. A filesystem on the app server makes the app server stateful — kill the box and you lose the pastes that were on its disk. Replicating the filesystem yourself means re-implementing what S3 already does: cross-AZ replication, erasure coding, automatic rebalancing. S3 gives 11 nines of durability out of the box for $0.023/GB/month. The app servers stay stateless and become trivially horizontally scalable.
How do you handle 10MB pastes vs 1KB pastes — same code path?
Same metadata path, different blob path. The metadata row is the same size (~80 bytes) regardless of paste size. For S3, sub-5MB blobs use a single PUT; multipart upload kicks in above 5MB to allow resumable uploads and parallel chunk transfer. On read, sub-1MB blobs return as one body; larger blobs are streamed back to the client to keep app-server memory bounded. The 10MB hard cap exists specifically so we never have to design for 1GB pastes — that's a different system (file sharing).
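A sketch of that split using boto3, where TransferConfig makes the single-PUT vs multipart decision automatically at the stated threshold; the bucket name is a placeholder:

import io
import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client("s3")
BUCKET = "pastebin-blobs"   # placeholder bucket name

# Above 5MB, switch from a single PUT to multipart upload with parallel parts.
cfg = TransferConfig(multipart_threshold=5 * 1024 * 1024,
                     multipart_chunksize=5 * 1024 * 1024)

def put_blob(content_key: str, body: bytes) -> None:
    # upload_fileobj picks single PUT or multipart based on cfg and the payload size.
    s3.upload_fileobj(io.BytesIO(body), BUCKET, content_key, Config=cfg)

def stream_blob(content_key: str, chunk_size: int = 64 * 1024):
    # Stream large blobs back to the client in chunks so app-server memory stays bounded.
    obj = s3.get_object(Bucket=BUCKET, Key=content_key)
    yield from obj["Body"].iter_chunks(chunk_size)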
How do you handle a viral paste getting 10K reads/sec?
The two caches absorb it entirely. A viral paste is by definition "hot" — its metadata row sits in the metadata cache, its blob sits in the block cache (or even the CDN if public). Origin S3 sees zero traffic from it. Origin DB sees zero traffic from it. The only choke point is bandwidth out of the cache/CDN, which both are explicitly designed for. View-count telemetry is async via Kafka, which buffers the spike without dropping events or melting a DB row.
What's the trade-off of a 5:1 read:write ratio (vs URL shortener's 100:1)?
The cache layer earns less. In a URL shortener, 100 reads-per-write means caching is wildly cost-effective — one cached entry pays for itself across hundreds of reads. In Pastebin at 5:1, a cached paste pays back across only 5 reads, so cache hit rate matters less. We still cache (because the bytes are bigger and S3 GETs cost money), but we don't need to obsess over edge POPs the way a shortener does. However: writes are 100× larger (10KB avg vs 100 bytes), so write durability and S3 PUT latency become the headline concerns instead of read fan-out.
How would you add syntax highlighting?
Client-side, async. The server returns plain text — never pre-rendered HTML. The browser loads a syntax-highlighting library (Prism, Highlight.js) and detects the language from the content or a user-selected hint stored in the metadata row (language: "java"). Doing it client-side keeps the server stateless and lets us add new language support without touching the backend. For the API, an optional ?format=html query param could return server-rendered HTML, with the rendered output cached at the CDN layer keyed on (paste_id, format).
What if KGS goes down?
Three layers of defense. (1) Active-standby KGS pair with failover in seconds. (2) Each app server caches ~100 unallocated keys locally — a 30-second KGS outage means creates still succeed from local cache. (3) If both fail, app servers degrade to inline random-key generation with collision-check fallback (slower, occasional retries, but doesn't drop creates). Same playbook as the URL shortener — see that page for the deeper discussion.
How do you prevent storage cost from running away?
Four knobs. (1) Hard size cap at 10MB per paste, enforced before the S3 PUT. (2) Default expiry — anonymous pastes expire in 1 day, free-tier in 1 month; only paid users get permanent. (3) Active cleanup nightly + lazy cleanup on expired reads ensures expired blobs actually leave S3. (4) Tiered storage: pastes not read in 90 days transition automatically from S3 Standard to S3 Infrequent Access, cutting their per-GB cost by ~70% with no code change.
The one-line summary the interviewer remembers: "It's a two-plane storage system — small structured metadata in a sharded DB fronted by Memcached, large blob bytes in S3 fronted by a block cache, glued together by a stateless app tier and a Key Generation Service that pre-mints unique 6-char URL keys offline. Metadata stays fast, blobs stay cheap, neither competes with the other."