The vocabulary that turns "I'll add a cache" into "I'll store user_id → session_blob in a Redis HASH with 24h TTL, sharded across 4 nodes via consistent hashing"
Two candidates sit through the same 45-minute system design loop. Both reach the part of the interview where caching comes up. Candidate A says "I'll add a cache in front of the database to speed up reads." Candidate B says "I'll cache the top 10K trending video metadata as video:{id} Redis HASHes, keep them in a 6-node Redis Cluster sharded by video_id, set LRU eviction at 64GB per node, give each entry a 30-minute TTL with explicit invalidation on transcode-complete events." Both candidates are correct. Only one gets the offer.
The interviewer is not testing whether you know the word "cache". They are testing whether you have operated one — whether you've stared at a Grafana dashboard at 3am wondering why your hit rate dropped, whether you've picked a sharding key that turned out to be hot, whether you've explained to a PM why "just clear the cache" deletes a billion-dollar revenue stream. That experience shows up as specificity: the data structure, the eviction policy, the sharding scheme, the replication topology, the failure mode, the metric you'd watch.
Every technology below uses the same five fields so you can scan and compare.
The first decision in almost every HLD: what's the source of truth? Pick the wrong DB family and every other choice gets harder. The four below cover ~90% of interview answers.
**PostgreSQL**
- **When:** You need ACID transactions, multi-table joins, foreign-key constraints, or strong consistency. Default choice for any system that touches money, identity, inventory, or anything where "lost write = lawsuit".
- **Features:** B-tree and hash indexes, MVCC for concurrent reads without blocking, FK constraints, partial indexes (WHERE status='active'), expression indexes, materialized views, JSONB column type, GIN/GiST indexes for full-text and geo, logical replication for streaming changes to read replicas.
- **Patterns:** Master + read replicas (read-heavy workloads). Connection pooling via PgBouncer. SELECT FOR UPDATE for row-level pessimistic locks. Logical replication into Kafka via Debezium for change-data-capture. Vertical partitioning (separate tables for hot vs cold columns).
- **Trade-offs:** Vertical scaling ceiling — a single Postgres box maxes out around 5K-10K complex writes/sec on commodity hardware. Sharding is hard (no native multi-shard transactions). Schema migrations on huge tables can lock for hours without careful tooling.
- **Interview Tip:** Drop these terms: isolation levels ("READ COMMITTED is the default; bump to SERIALIZABLE for the inventory check"), SELECT FOR UPDATE for pessimistic row locks, a partial index on WHERE deleted_at IS NULL, a GIN index for full-text or JSONB, logical replication for read replicas without blocking the master.
**Cassandra**
- **When:** Write-heavy workloads where the access pattern is "give me all the rows for this partition key". Time-series data (sensor readings, event logs), user activity feeds, message inboxes, anything that's append-mostly and queried by a known key.
- **Features:** Tunable consistency per query (R + W > N for strong reads; R=1, W=1 for fastest), wide columns (a row can have millions of columns), partition key + clustering key data model, LSM-tree storage (write-optimized), peer-to-peer architecture (no master), multi-datacenter replication out of the box.
- **Patterns:** Design one table per query. Partition key = the thing you'll always filter by (user_id). Clustering key = how rows within a partition are ordered (created_at DESC). For "give me Sarah's last 50 messages": PRIMARY KEY ((user_id), created_at) WITH CLUSTERING ORDER BY (created_at DESC). Replication factor of 3 across 3 AZs is standard.
- **Trade-offs:** No joins, ever. Eventually consistent by default (you ask for stronger consistency per query). Query patterns must be designed up front — if you don't know how you'll query the data, Cassandra will hurt you. Compaction is operationally expensive at scale.
- **Interview Tip:** "Design the partition key around the access pattern; never query without the partition key — otherwise it's a scatter-gather across every node and that's a query you cancel before it finishes." Mention R + W > N quorum for strong consistency, hinted handoff for transient node failures, SSTable compaction as the ops headache.
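The R + W > N quorum rule above is just arithmetic — a read quorum and a write quorum that together exceed the replica count must overlap on at least one node, so a quorum read always sees the latest quorum write. A minimal sketch (illustrative, not Cassandra's driver API):

```python
def is_strongly_consistent(n: int, r: int, w: int) -> bool:
    """Read quorum R and write quorum W overlap on at least one replica
    whenever R + W > N, so a quorum read observes the latest write."""
    return r + w > n

# Typical choices for replication factor N = 3:
assert is_strongly_consistent(3, 2, 2)       # QUORUM reads + QUORUM writes
assert not is_strongly_consistent(3, 1, 1)   # ONE/ONE: fastest, eventual
assert is_strongly_consistent(3, 1, 3)       # write ALL, then read ONE is safe
```

The common interview answer is R=2, W=2 at N=3: strong reads without paying for writes to all three replicas.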
**DynamoDB**
- **When:** You're on AWS, want predictable single-digit-ms latency at any scale, and don't want to operate a database. Session stores, shopping carts, IoT device state, anything with a known access pattern that scales horizontally.
- **Features:** Managed sharding (you never touch it), Global Secondary Indexes (GSIs) for querying on alternative keys, DynamoDB Streams for change-data-capture, Global Tables for multi-region active-active, on-demand vs provisioned capacity modes, point-in-time recovery, single-digit-ms p99 reads.
- **Patterns:** Single-table design — pack multiple entity types into one table, distinguished by a generic PK + SK with composed keys like USER#42 + ORDER#2024-01-15. GSIs duplicate data with different keys to support secondary access patterns. Streams + Lambda for derived views.
- **Trade-offs:** Query flexibility limited by your GSI design — every access pattern you didn't plan for becomes a full table scan or a new GSI. Hot partition cost (single PK getting all the traffic = throttled). Expensive at scale vs self-managed Cassandra. AWS lock-in.
- **Interview Tip:** Always say "I'll model the access patterns first, then derive the partition key + sort key + GSIs." For each access pattern, write the exact PK / SK combo. Example: "GetUserPosts" → PK=USER#{id}, SK=POST#{created_at}; "BrowseByCategory" → GSI with PK=CATEGORY#{cat}, SK=created_at.
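Single-table design is easier to see than to describe. A toy in-memory model of DynamoDB's Query semantics — exact match on PK, `begins_with` on SK — shows how one key scheme serves multiple entity types (the item shapes here are hypothetical, not an AWS API):

```python
# Hypothetical single-table items: a user profile and its orders share a PK,
# distinguished by SK prefix. Query = PK equality + SK begins_with, sorted by SK.
table = [
    {"PK": "USER#42", "SK": "PROFILE",          "name": "Sarah"},
    {"PK": "USER#42", "SK": "ORDER#2024-03-02", "total": 30},
    {"PK": "USER#42", "SK": "ORDER#2024-01-15", "total": 90},
]

def query(pk: str, sk_prefix: str = "") -> list[dict]:
    """Mimics a DynamoDB Query: one partition, optional sort-key prefix."""
    return sorted(
        (item for item in table
         if item["PK"] == pk and item["SK"].startswith(sk_prefix)),
        key=lambda item: item["SK"],
    )

orders = query("USER#42", sk_prefix="ORDER#")
# All of Sarah's orders, date-sorted, in one partition read — no scan.
```

Because SKs sort lexicographically, ISO dates in the sort key give chronological order for free.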
**MongoDB**
- **When:** Schema is flexible and changing rapidly (early-stage product, content management), data is naturally hierarchical (a blog post with embedded comments), or the team is more comfortable with JSON than SQL. Increasingly rare as a default in 2026 — Postgres JSONB has eaten most of its niche.
- **Features:** BSON document model (binary JSON), aggregation pipeline ($match → $group → $sort), secondary indexes including compound and text, change streams, sharded clusters with config servers, multi-document transactions (since v4.0).
- **Patterns:** Embed related data (a user with their addresses) when it's always read together. Reference (foreign-key style) when the related data is large or changes independently. Aggregation pipelines for analytics-style queries.
- **Trade-offs:** Was weakly consistent for years (defaults are stronger now). Hot documents have write conflicts. Index size grows fast. Operationally heavier than Postgres for the same workload in most cases.
- **Interview Tip:** Be ready to defend why you'd pick Mongo over Postgres in 2026 — usually the answer is "I wouldn't, unless the team already has Mongo expertise or the document hierarchy is deep enough that Postgres JSONB queries get awkward". Naming it without justification is a junior signal.
The moment a user types into a search box that does anything fancier than WHERE name LIKE '%foo%', you need a real search engine. Same goes for log analytics, observability, and full-text scoring.
**Elasticsearch**
- **When:** Full-text search with relevance scoring, fuzzy/typo-tolerant queries, log aggregation and observability (the "E" in ELK stack), faceted search, geo queries. Anything where Postgres' ILIKE '%foo%' stops being good enough.
- **Features:** Inverted index (the magic that makes full-text fast), tokenization + stemming + stopword removal, fuzzy matching with edit distance, BM25 relevance scoring, geo_point and geo_shape queries, aggregations (terms, date histograms, percentiles), index aliases for zero-downtime reindex.
- **Patterns:** Pair with Kafka for the indexing pipeline — app writes to the source-of-truth DB, a change stream emits to Kafka, an indexer consumes and updates Elasticsearch. Time-based indices for log data: one index per day (logs-2026-05-07) so you can drop old indices in O(1) and shard hot recent data more aggressively.
- **Trade-offs:** Schema rigidity via mappings (changing a field type requires a reindex). Eventually consistent — there's a 1-second refresh interval before new docs are searchable. Operationally heavy: shard sizing, JVM tuning, hot/warm/cold tier design. Memory hungry.
- **Interview Tip:** Say "Elasticsearch is a derived index, not the source of truth — Postgres / Cassandra holds the canonical data, ES gets it via a CDC pipeline through Kafka." For logs: "indices sharded by time, hot tier on SSDs for the last 7 days, warm tier on cheaper storage for 30 days, then deleted." Mention refresh_interval, force_merge, and index aliases for zero-downtime reindex.
**Postgres full-text search**
- **When:** You already have Postgres, and your search needs are real but modest — under 100M documents, basic relevance, no advanced fuzziness. Avoiding a second piece of infrastructure is worth a lot.
- **Features:** tsvector column + GIN index for full-text. pg_trgm extension for trigram fuzzy matching. PostGIS + GiST for geo. JSONB + GIN for "find me docs where the JSON has key X = value Y".
- **Patterns:** Add a generated tsvector column that concatenates searchable fields, GIN-index it, then WHERE search_vec @@ to_tsquery('foo & bar'). For typo tolerance, add a pg_trgm index and use % similarity.
- **Trade-offs:** Relevance scoring is weaker than Elasticsearch's BM25. Doesn't scale past one Postgres node. No native distributed search across shards.
- **Interview Tip:** "For under 100M docs and basic search, I'd start with a Postgres GIN index on a tsvector column — one fewer system to operate. I'd graduate to Elasticsearch when relevance tuning, fuzzy search, or sub-100ms p99 across hundreds of millions of docs become hard requirements."
The single biggest performance lever in any read-heavy system. A cache that holds the hot 20% of your data absorbs 80% of read traffic — your database tier shrinks 5x. Two players dominate this layer.
**Redis**
- **When:** Low-latency reads (sub-millisecond), distributed locks, pub/sub, leaderboards, session storage, rate limiting, real-time counters, geo-proximity queries. The default cache choice for ~95% of interview answers.
- **Features:** Eight rich data structures (see the table below), single-threaded command execution (no race conditions inside one command), Redis Cluster for sharding, Sentinel for HA without sharding, RDB snapshots + AOF append-only log for persistence, Pub/Sub, Streams, Lua scripting for multi-command atomicity, the Redlock algorithm for distributed locks.
- **Patterns:** Cache-aside (read through app code), write-through (app writes to cache and DB synchronously), write-behind (cache async-flushes to DB). Eviction: allkeys-lru for a general cache, volatile-ttl when only TTL'd keys should evict. Sharding via Redis Cluster (16384 hash slots — CRC16 of the key mod 16384, not classic consistent hashing).
- **Trade-offs:** RAM-bounded — a single node maxes out at the box's RAM. Persistence is best-effort (AOF with everysec fsync loses up to 1s on crash). Slower than Memcached for the simplest pure key-value workload due to richer features.
- **Interview Tip:** Never just say "Redis". Say "Redis ZSET for a top-K leaderboard", "Redis HASH keyed by user_id with 24h TTL for sessions", "Redis Cluster sharded across 6 nodes, each with one replica for automatic failover", "Redlock for a distributed lock with 10s timeout". Mention Redis Streams as a Kafka-lite when the use case is small.
Naming the data structure is the single biggest specificity win in an interview. The table below maps each Redis structure to the use case it owns.
| Structure | Common use | Hot commands |
|---|---|---|
| STRING | Counters, JSON blobs, session tokens, simple K/V | SET · GET · INCR · SETEX |
| HASH | User profiles, any record with multiple fields | HSET · HGET · HGETALL · HINCRBY |
| LIST | FIFO queues, recent items, activity logs | LPUSH · RPOP · LRANGE |
| SET | Uniqueness checks, tag membership, deduping | SADD · SISMEMBER · SINTER |
| ZSET | Leaderboards, top-K, time-ordered feeds, priority queues | ZADD · ZRANGE · ZINCRBY |
| STREAM | Event log, Kafka-lite with consumer groups | XADD · XREAD · XGROUP |
| HyperLogLog | Cardinality estimation in fixed memory (12KB) | PFADD · PFCOUNT |
| GEO | Proximity search, "nearby" queries | GEOADD · GEOSEARCH |
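The ZSET row is the one interviewers probe most: a leaderboard is just "increment a member's score, read the top K". A pure-Python stand-in for ZINCRBY / ZREVRANGE makes the semantics concrete (illustrative only — real code would issue these commands to a Redis server via a client library such as redis-py):

```python
# Minimal in-memory model of a Redis ZSET used as a leaderboard.
scores: dict[str, float] = {}

def zincrby(member: str, delta: float) -> float:
    """ZINCRBY: add delta to member's score, creating it at 0 if absent."""
    scores[member] = scores.get(member, 0.0) + delta
    return scores[member]

def ztop(k: int) -> list[tuple[str, float]]:
    """ZREVRANGE 0 k-1 WITHSCORES: the k members with the highest scores."""
    return sorted(scores.items(), key=lambda kv: -kv[1])[:k]

zincrby("video:1", 5)
zincrby("video:2", 9)
zincrby("video:1", 3)
top = ztop(2)   # [("video:2", 9.0), ("video:1", 8.0)]
```

In Redis the sorted order is maintained incrementally (skip list + hash), so ZINCRBY and ZRANGE are both O(log N) — no re-sort on read like this sketch does.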
**Memcached**
- **When:** You need a pure dumb key-value cache, multi-threaded for max single-node throughput, and don't care about persistence or rich data types. Big at companies running it at massive scale (Facebook famously runs petabytes of Memcached).
- **Features:** Multi-threaded (uses all CPU cores by default, where Redis is single-threaded), smaller memory footprint per key, slab allocator avoids fragmentation, simple text protocol.
- **Patterns:** Client-side consistent hashing (the client decides which node to hit, no cluster coordinator). LRU eviction is the only policy. Set a TTL on every key.
- **Trade-offs:** No persistence — a restart loses everything. No data structures beyond strings. No pub/sub, no scripting, no transactions. Cache is the entire feature set.
- **Interview Tip:** "I'd pick Memcached over Redis only when the only feature I need is the simplest possible K/V cache, and multi-threaded throughput per node matters." For most systems, Redis is the safer default — more features, similar perf.
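Client-side consistent hashing is worth being able to sketch: place many virtual points per node on a hash ring, map each key to the next point clockwise, and losing a node only remaps the keys that lived on it. A compact sketch (node names and vnode count are arbitrary choices, not a Memcached client's actual defaults):

```python
import bisect
import hashlib

class HashRing:
    """Consistent-hash ring with virtual nodes, as a cache client would use."""
    def __init__(self, nodes: list[str], vnodes: int = 100):
        # 100 points per node smooth out the key distribution across nodes.
        self.ring = sorted(
            (self._h(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self.points = [h for h, _ in self.ring]

    @staticmethod
    def _h(s: str) -> int:
        return int(hashlib.md5(s.encode()).hexdigest(), 16)

    def node_for(self, key: str) -> str:
        """Walk clockwise to the next virtual point; wrap around at the end."""
        i = bisect.bisect(self.points, self._h(key)) % len(self.points)
        return self.ring[i][1]

ring = HashRing(["cache-a", "cache-b", "cache-c"])
before = {k: ring.node_for(k) for k in map(str, range(1000))}

smaller = HashRing(["cache-a", "cache-b"])   # cache-c dies
moved = sum(before[k] != smaller.node_for(k) for k in before)
# Only cache-c's keys move (~1/3 of them) — with naive hash(key) % N,
# removing a node would remap almost every key.
```

That "only 1/N of keys move" property is the whole argument for consistent hashing over modulo sharding.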
The async backbone of every modern system. Get the message broker right and your services decouple cleanly; get it wrong and a single slow consumer takes down a whole product.
**Kafka**
- **When:** High-throughput pub/sub, event sourcing, replayable streams, change-data-capture pipelines, log aggregation, multi-consumer fan-out where each consumer wants its own copy of the stream.
- **Features:** Topics partitioned for parallelism, consumer groups for load-balanced processing within a group + fan-out across groups, retention by time (7 days) or size (100GB), idempotent producers + transactional writes for exactly-once semantics, log compaction (keep only the latest value per key) for state-snapshot topics, replication factor (typically 3) across brokers.
- **Patterns:** Partition key = the thing whose ordering matters (user_id if events for one user must be processed in order). Consumer group per logical service — adding a consumer to the group rebalances partitions across consumers. Compacted topics for "current state" snapshots (user profile updates, config). Outbox pattern: app writes to DB + Kafka in a transaction via Debezium CDC.
- **Trade-offs:** Operationally complex — ZooKeeper or KRaft, broker tuning, partition rebalancing storms. Latency typically 5-50ms per message, not microseconds. Reordering across partitions (only ordered within a partition).
- **Interview Tip:** Specify all three: "Topic orders.events with 32 partitions keyed by user_id, 7-day retention, replication factor 3, consumer group fulfillment-svc with 16 workers". Mention compaction for state-snapshot topics, idempotent producer + transactional writes for exactly-once, and consumer offsets stored in Kafka itself.
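The "keyed by user_id" phrase hides one line of arithmetic: the producer hashes the key and takes it modulo the partition count, so every event for one user lands on the same partition and is consumed in order. A sketch (Kafka's default partitioner actually uses murmur2; CRC32 here just shows the mechanism):

```python
import zlib

def partition_for(key: str, num_partitions: int = 32) -> int:
    """Stable hash of the key modulo the partition count.
    Same key -> same partition -> per-key ordering is preserved,
    while different keys spread across partitions for parallelism."""
    return zlib.crc32(key.encode()) % num_partitions

# Every event for user-1001 is appended to one partition's log:
assert partition_for("user-1001") == partition_for("user-1001")
assert 0 <= partition_for("user-2002") < 32
```

This is also why changing the partition count on a live topic breaks per-key ordering: the modulus changes, so existing keys start landing on different partitions.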
**RabbitMQ**
- **When:** Lower-volume task queues (under ~50K msg/sec), complex routing rules (exchanges/bindings), per-message delivery guarantees, dead-letter queues, priority queues. Classic background-job workloads.
- **Features:** AMQP 0-9-1 protocol, exchanges (direct, topic, fanout, headers) + bindings for flexible routing, dead-letter exchanges, priority queues, message TTL, per-message ack/nack, quorum queues for HA (the successor to mirrored queues).
- **Patterns:** Topic exchange + routing keys for selective fan-out (orders.eu.priority matches binding orders.*.priority). DLQ on every queue for poison messages. Manual ack so failed processing requeues.
- **Trade-offs:** Lower throughput than Kafka — 10K-50K msg/sec per node vs Kafka's millions. No replay (once consumed and acked, gone). Smart broker = more failure modes than Kafka's dumb broker.
- **Interview Tip:** "RabbitMQ for task queues with complex routing or low volume; Kafka when I need replay, multi-consumer fan-out, or throughput above 100K msg/sec."
**SQS / Cloud Pub/Sub**
- **When:** You want a queue with zero ops. Decoupling microservices, fan-out to Lambda, simple async tasks. The right answer for ~80% of "I need a queue" situations on AWS / GCP.
- **Features:** SQS: standard queue (at-least-once, best-effort ordering) and FIFO queue (strict ordering, exactly-once with dedup). Visibility timeout, dead-letter queues, long polling. Pub/Sub: pull or push subscriptions, exactly-once delivery, message ordering by key.
- **Patterns:** SQS + Lambda for serverless event processing. SNS → SQS fan-out (one event, multiple consumers). DLQ after N redelivery attempts.
- **Trade-offs:** Less control than self-hosted. No replay (once acked, gone). Per-message cost adds up at scale. AWS / GCP lock-in.
- **Interview Tip:** "SQS for simple async tasks where I want zero ops; Kafka when I need replayability, multi-consumer fan-out, or want events to be the source of truth."
**Flink**
- **When:** Real-time stream processing with stateful aggregations — windowed counts, sessionization, joining two streams, complex event processing. The "do something smart with a Kafka stream" engine.
- **Features:** Exactly-once state semantics via checkpoints, event-time processing with watermarks (correctly handles late/out-of-order events), tumbling / sliding / session windows, keyed state, state backends (RocksDB on disk for huge state).
- **Patterns:** Read from Kafka → windowed aggregation by key → write to a sink (Elasticsearch, Postgres, another Kafka topic). Keyed state for per-user counters. Watermarks tuned for your data's typical lateness.
- **Trade-offs:** Heavy infrastructure. For simple aggregations, Kafka Streams (a library, not a cluster) is enough. Operational complexity is real — checkpoint tuning, savepoints for upgrades.
- **Interview Tip:** "For simple per-event transforms, Kafka Streams (just a library on top of Kafka) is enough. Reach for Flink when you need stateful windowed aggregations across millions of keys with exactly-once guarantees."
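A tumbling window is the simplest of the window types above: non-overlapping, fixed-size buckets of event time. A batch-over-a-list sketch of "events per user per minute" shows the core idea, minus everything that makes Flink hard (state backends, watermarks, late data):

```python
from collections import defaultdict

def tumbling_counts(events: list[tuple[int, str]], window_ms: int = 60_000) -> dict:
    """Count events per (key, window) over (timestamp_ms, key) pairs.
    Each event falls into exactly one window: [window_start, window_start + window_ms).
    Assigning by event time means out-of-order input still lands correctly."""
    out: dict[tuple[str, int], int] = defaultdict(int)
    for ts, key in events:
        window_start = ts - ts % window_ms   # floor to the window boundary
        out[(key, window_start)] += 1
    return dict(out)

events = [(5_000, "u1"), (59_000, "u1"), (61_000, "u1"), (10_000, "u2")]
counts = tumbling_counts(events)
# {("u1", 0): 2, ("u1", 60000): 1, ("u2", 0): 1}
```

What Flink adds on top: the counts are maintained incrementally as events arrive, a window only emits when the watermark passes its end, and the per-key state survives failures via checkpoints.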
Everything between a user's browser and your application servers. Routing the right request to the right backend, terminating TLS, caching at the edge — these tiers exist to take pressure off your app code.
**API Gateway**
- **When:** You have multiple backend services and want one public surface in front of them. Adds cross-cutting concerns (auth, rate limiting, request transformation) without putting them in every service.
- **Features:** Path-based and host-based routing, JWT/OAuth verification, per-key rate limiting, request/response transformation, observability (logging, tracing, metrics), API versioning, request validation.
- **Patterns:** Options: AWS API Gateway (managed, Lambda-friendly), Kong (self-hosted, plugin-based), nginx (lightweight reverse proxy + Lua), Envoy (gRPC-native, used by Istio).
- **Trade-offs:** Adds a network hop. Hot-path latency-sensitive services sometimes skip the gateway and go straight to the LB.
- **Interview Tip:** Don't deep-dive — abstract it. "The API Gateway terminates TLS, validates the JWT, applies the per-tenant rate limit, and routes to the right backend service." That's the level of detail expected.
**Load balancer (L4 / L7)**
- **When:** Always — every multi-instance service needs an LB in front of it. The interesting choice is L4 (transport-level, TCP/UDP) vs L7 (application-level, HTTP).
- **Features:** L4: persistent connections like WebSockets and gRPC streams, lower latency (no HTTP parse), source-IP hash for sticky sessions. L7: path-based routing (/api/* → svc-A, /static/* → CDN), header-based decisions (X-Customer-Tier), TLS termination, request rewriting, weighted canary deploys.
- **Patterns:** Algorithms: round robin (default), least connections (when request times vary), consistent hash on IP or session-ID (when stickiness matters). Health checks every 5s, evict unhealthy backends. Active-passive HA pair via virtual IP failover.
- **Trade-offs:** L7 has more features but slightly higher latency (a few hundred microseconds). L4 is simpler and faster but can't make HTTP-aware routing decisions.
- **Interview Tip:** "WebSockets, gRPC streams, raw TCP → L4 (NLB, HAProxy in TCP mode). HTTP REST APIs with path-based routing → L7 (ALB, nginx, Envoy). I'd put an L7 in front of the app tier and an L4 in front of any persistent-connection service."
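"Least connections" is a one-liner once you track in-flight requests per backend — and the sketch makes clear why it beats round robin when request durations vary: a backend stuck on slow requests accumulates connections and stops receiving new ones. (Backend names and counts here are made up for illustration.)

```python
# In-flight request counts per backend, as an LB would track them.
active = {"app-1": 12, "app-2": 3, "app-3": 7}

def pick_backend(conns: dict[str, int]) -> str:
    """Least-connections: route to the backend with the fewest in-flight
    requests. Round robin would ignore that app-1 is bogged down."""
    return min(conns, key=conns.get)

target = pick_backend(active)   # "app-2" — the least-loaded backend
active[target] += 1             # count the new in-flight request
# ...and decrement when the response completes.
```

Real LBs layer health checks on top: a backend that fails N consecutive checks is removed from the candidate dict entirely until it recovers.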
**CDN**
- **When:** You serve static assets (images, JS, CSS, video segments) or cacheable API responses to a globally distributed user base. Latency-sensitive reads from far-away users.
- **Features:** Edge POPs in 200+ cities, TTL-based caching with explicit purge, signed URLs for private content, image optimization (resize/convert at edge), WAF integration, DDoS absorption, gzip/brotli compression. Options: CloudFront, Akamai, Cloudflare, Fastly.
- **Patterns:** Static assets get long TTLs (1 year) with versioned filenames (app.v123.js) so cache-busting is automatic. API responses get short TTLs (60s for "trending" feeds) with Cache-Control headers. Signed URLs for time-bounded private downloads (presigned S3-style).
- **Trade-offs:** Cache invalidation is expensive — purges propagate over minutes. TTL math: longer TTL = better hit rate but staler data. Cost can spike on miss-heavy workloads.
- **Interview Tip:** "CDNs cache JSON API responses too, not just images — for a news site's /api/trending endpoint, a 60-second TTL at the CDN edge takes 99% of read traffic off the origin." Mention signed URLs for private content and versioned filenames for cache busting.
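The "60-second TTL takes 99% of traffic off the origin" claim is just TTL caching behaviour, which fits in a few lines. A minimal sketch with an injected clock (the class and endpoint names are hypothetical; a real edge adds stale-while-revalidate, purge, and per-POP caches):

```python
import time

class TtlCache:
    """Cache-aside with a TTL — CDN edge behaviour in miniature."""
    def __init__(self, ttl: float, clock=time.monotonic):
        self.ttl, self.clock = ttl, clock
        self.store: dict[str, tuple[object, float]] = {}

    def get(self, key, fetch_from_origin):
        hit = self.store.get(key)
        if hit and self.clock() - hit[1] < self.ttl:
            return hit[0]                       # fresh: served from the edge
        value = fetch_from_origin()             # miss or stale: hit the origin
        self.store[key] = (value, self.clock())
        return value

now = [0.0]                                     # fake clock we can advance
cache = TtlCache(ttl=60, clock=lambda: now[0])
origin_hits = []
fetch = lambda: origin_hits.append(1) or ["story-1"]

cache.get("/api/trending", fetch)               # t=0: miss -> origin
cache.get("/api/trending", fetch)               # t=0: fresh -> edge
now[0] = 61.0
cache.get("/api/trending", fetch)               # t=61: stale -> origin again
# Origin saw 2 requests; every request in between was absorbed at the edge.
```

With real traffic of, say, 1000 req/s on one endpoint, a 60s TTL means the origin sees ~1 request per minute per POP — that's where the 99%+ offload comes from.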
**Reverse proxy (nginx / HAProxy / Envoy)**
- **When:** You need a tier between the LB and the app servers — often combined with an API Gateway — for TLS termination, request buffering, static file serving, or header manipulation.
- **Features:** nginx: workhorse, high throughput, Lua scripting. HAProxy: pure load balancing, fastest L4. Envoy: cloud-native, hot-reload config, deep observability, gRPC-native.
- **Patterns:** nginx as the public-facing TLS terminator + L7 router. HAProxy when you need pure L4 throughput. Envoy as the sidecar in service-mesh deployments (Istio, Linkerd).
- **Trade-offs:** Yet another tier to operate. nginx config can become a tangled mess; Envoy's YAML is verbose but well-defined.
- **Interview Tip:** Just name one: "nginx terminates TLS and reverse-proxies to the app pods." Don't deep-dive unless asked.
Where the bytes that don't belong in a database live. Photos, videos, model checkpoints, log archives — anything large, unstructured, and rarely searched.
**S3 (object storage)**
- **When:** Large unstructured files: images, videos, PDFs, ML model artifacts, log archives, backups. Anything >1MB that's accessed by full key. The default for "where do I store the file?" on AWS / GCP / Azure.
- **Features:** 11 nines of durability (storing 10M objects, you'd expect to lose one on average every ~10,000 years), 4 nines of availability, presigned URLs (time-bounded direct upload/download tokens), multipart upload (resumable for large files), lifecycle policies (auto-tier to Glacier after 30 days), event notifications (S3 → Lambda / SQS on object creation), versioning, cross-region replication.
- **Patterns:** Presigned URLs — the single most important pattern. The client requests an upload URL from your API, your API returns a presigned S3 URL, the client uploads bytes directly to S3. Your app servers never touch the bytes. Same for download.
- **Trade-offs:** Never use it as a primary DB — key-only access, no joins, no transactions. Per-request pricing matters at huge scale. List operations are slow.
- **Interview Tip:** "Use presigned URLs to keep large blob bytes off your app servers — the upload goes browser → S3 directly, the app server only signs the URL." Mention lifecycle policies for cost optimization (Standard → IA after 30 days → Glacier after 90 days), event notifications for derived processing pipelines (image resize, video transcode).
- **Cost:** $0.023/GB/mo for S3 Standard, $0.004/GB/mo for S3 Glacier. Egress dominates at scale — putting CloudFront in front cuts egress cost by ~50%.
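The core idea behind a presigned URL is an HMAC over the object key and an expiry, so storage can verify the grant without any database lookup. A radically simplified sketch — real S3 uses AWS Signature V4 with many more signed fields; the secret, paths, and parameter names here are invented for illustration:

```python
import hashlib
import hmac

SECRET = b"demo-signing-key"   # hypothetical server-side secret

def presign(object_key: str, expires_at: int) -> str:
    """Server side: sign (key, expiry) and hand the URL to the client."""
    msg = f"{object_key}|{expires_at}".encode()
    sig = hmac.new(SECRET, msg, hashlib.sha256).hexdigest()
    return f"/bucket/{object_key}?expires={expires_at}&sig={sig}"

def verify(object_key: str, expires_at: int, sig: str, now: int) -> bool:
    """Storage side: recompute the MAC and check the expiry — stateless."""
    msg = f"{object_key}|{expires_at}".encode()
    expected = hmac.new(SECRET, msg, hashlib.sha256).hexdigest()
    return now < expires_at and hmac.compare_digest(sig, expected)

url = presign("uploads/cat.jpg", expires_at=1000)
sig = url.split("sig=")[1]
assert verify("uploads/cat.jpg", 1000, sig, now=999)        # still valid
assert not verify("uploads/cat.jpg", 1000, sig, now=1001)   # expired
assert not verify("uploads/dog.jpg", 1000, sig, now=999)    # wrong key
```

Because verification is pure computation, the app tier only does the cheap signing step — the heavy byte transfer goes client → storage directly, exactly as the pattern above describes.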
**HDFS**
- **When:** Big-data batch analytics with Hadoop / Spark, particularly on-prem. Increasingly rare for greenfield systems — most new deployments use S3 + Spark / Trino instead.
- **Features:** Block-based storage (default 128MB blocks) replicated 3× across nodes, NameNode for metadata + DataNodes for blocks, designed for sequential scans of huge files.
- **Trade-offs:** The NameNode is a SPOF (until HDFS HA). Operationally heavy. Declining vs S3 for new systems — S3 + EMR / Spark gives you the same compute model without operating a filesystem cluster.
- **Interview Tip:** "For new systems on cloud, I'd default to S3 + Spark/Trino over HDFS. HDFS still makes sense for on-prem big-data lakes already running Hadoop."
The boring infrastructure that keeps distributed systems honest. Leader election, distributed config, service discovery, distributed locks at strong consistency.
**ZooKeeper**
- **When:** Distributed configuration, leader election, distributed locks where correctness matters more than latency, service discovery. The classic answer for "we need a coordinator" — Kafka used it for years (now KRaft), HBase still does.
- **Features:** ZAB consensus protocol (similar to Raft), strong consistency for writes, ephemeral nodes (auto-delete on session end — perfect for "is this node alive?"), watches (notify on znode change), sequential nodes for fair locking.
- **Patterns:** Leader election: each candidate creates an ephemeral sequential znode under /election; the lowest sequence wins. Distributed lock: same pattern, lowest sequence holds the lock. Service discovery: services register ephemeral znodes under /services/{name}.
- **Trade-offs:** Heavy ops — needs an odd-numbered ensemble (3 or 5 nodes), JVM tuning, snapshot management. Reach for it only when Redis Redlock isn't strong enough.
- **Interview Tip:** "For short-duration locks where some weakness is OK, Redis Redlock is fine. For leader election or locks where correctness is sacred — only one process can run the migration job — ZooKeeper or etcd."
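The ephemeral-sequential election pattern is easy to simulate: each candidate gets a monotonically increasing sequence number, the lowest live number is the leader, and a crash deletes the node so leadership passes to the next-lowest. A toy model (no real ZooKeeper client here — this just demonstrates the protocol's logic):

```python
class Election:
    """In-memory model of ZooKeeper's ephemeral sequential znode election."""
    def __init__(self):
        self.seq = 0
        self.nodes: dict[str, int] = {}   # candidate name -> sequence number

    def join(self, name: str) -> int:
        """Create an 'ephemeral sequential znode' for this candidate."""
        self.seq += 1
        self.nodes[name] = self.seq
        return self.seq

    def leader(self) -> str:
        """Lowest sequence number wins — deterministic, no split vote."""
        return min(self.nodes, key=self.nodes.get)

    def crash(self, name: str) -> None:
        """Session ends -> the ephemeral znode disappears automatically."""
        del self.nodes[name]

e = Election()
for node in ("node-a", "node-b", "node-c"):
    e.join(node)
assert e.leader() == "node-a"   # first joiner holds the lowest sequence
e.crash("node-a")               # leader dies...
assert e.leader() == "node-b"   # ...next-lowest sequence takes over
```

In real ZooKeeper each candidate watches only the znode immediately below its own, so a leader crash wakes exactly one successor rather than stampeding the whole group (the "herd effect" the docs warn about).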
**etcd**
- **When:** Same use cases as ZooKeeper but with a simpler API and modern operational story. Used as the source of truth for Kubernetes cluster state.
- **Features:** Raft consensus, gRPC API, watch streams, lease-based ephemeral keys, transactions over multiple keys, MVCC for historical reads.
- **Patterns:** Same as ZooKeeper — leader election, distributed config, service registry. K8s controllers watch etcd for desired-state changes.
- **Trade-offs:** Still requires an odd-numbered cluster (3 or 5 nodes). Disk-I/O sensitive — slow disk = slow consensus = cluster instability.
- **Interview Tip:** "For greenfield coordination needs, I'd pick etcd over ZooKeeper — simpler API, better tooling, and if I'm already on Kubernetes I can leverage its etcd indirectly via the K8s API."
Metrics, monitoring, IoT sensor data — workloads where every row is (timestamp, key, value) and queries are dominated by "give me X over the last Y time bucketed by Z".
**Time-series databases (Prometheus / InfluxDB / TimescaleDB)**
- **When:** Application metrics, infrastructure monitoring, IoT sensor telemetry, financial tick data. Anything where the X-axis is time and aggregations / downsampling matter.
- **Features:** Time-bucketed compression (10-100× smaller than row storage), automatic downsampling (keep 1s resolution for 1 hour, 1m for 1 day, 1h for 1 year), retention policies, time-bucket-aware query languages (Flux, PromQL, SQL hypertables in TimescaleDB).
- **Patterns:** Prometheus = pull-based, owned by the metrics ecosystem (Kubernetes, Grafana). InfluxDB = push-based, popular for IoT. TimescaleDB = Postgres extension, best when you also need SQL joins to non-time-series data.
- **Trade-offs:** Each is opinionated — Prometheus is metrics-only (no logs/traces), InfluxDB has had storage-engine churn, TimescaleDB inherits Postgres' vertical-scaling ceiling.
- **Interview Tip:** "Don't store metrics in Postgres at scale — a 1B-row metrics table is painful, and you give up 10-100× compression. Reach for a time-series DB the moment your metrics workload is real. Prometheus for K8s-native infra, TimescaleDB if you also need SQL joins, InfluxDB for IoT."
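Downsampling — the "1s resolution for an hour, 1m for a day" policy above — is just bucketing timestamps and aggregating each bucket. A sketch of rolling 1-second samples into 1-minute averages (the TSDBs automate this with retention rules; here it's one function over a list):

```python
def downsample(points: list[tuple[int, float]], bucket_s: int = 60) -> dict[int, float]:
    """Roll (timestamp_s, value) samples up into per-bucket means.
    Bucket start = timestamp floored to the bucket boundary."""
    buckets: dict[int, list[float]] = {}
    for ts, value in points:
        buckets.setdefault(ts - ts % bucket_s, []).append(value)
    return {start: sum(vs) / len(vs) for start, vs in sorted(buckets.items())}

raw = [(0, 10.0), (30, 20.0), (60, 40.0), (90, 20.0)]   # 1 sample / 30s
rollup = downsample(raw)
# {0: 15.0, 60: 30.0} — four raw points become two 1-minute averages
```

The storage win compounds: keep the raw series for hours, the 1m rollup for days, the 1h rollup for years — each tier an order of magnitude smaller than the one below it.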
The category that exploded with the LLM era. If your system has anything to do with semantic search, retrieval-augmented generation (RAG), recommendations from embeddings, or "find me documents similar to this one" — you need a vector DB.
**Vector databases (Pinecone / Weaviate / Milvus / pgvector)**
- **When:** Semantic search, RAG (retrieval-augmented generation for LLMs), recommendation embeddings, image / audio similarity, anomaly detection. Anything where "give me the K nearest neighbors to this 1536-dim vector" is the core query.
- **Features:** Approximate nearest neighbor (ANN) search via HNSW (graph-based, fastest) or IVF (inverted file, memory-efficient) indexes, hybrid queries combining vector similarity + keyword filters (category=tech AND vector ~ X), metadata filtering, vector dimensions up to 2K+, cosine / dot product / L2 similarity.
- **Patterns:** Pinecone = managed, easy. Weaviate = self-hosted, hybrid search built-in. Milvus = open-source at scale. pgvector = Postgres extension, perfect when you don't want a separate system and have under ~10M vectors.
- **Trade-offs:** ANN trades recall for speed — tune the trade-off via the index parameters (HNSW's ef, IVF's nprobe). Index build is expensive on huge corpora. Recall is typically 0.95-0.99, not 1.0.
- **Interview Tip:** "For RAG: chunk documents → embed each chunk via an embedding model → store (vector, chunk_text, source_doc_id) in a vector DB → at query time, embed the user's question and find the top-K nearest chunks → feed those chunks as context to the LLM." For under 10M vectors with existing Postgres, pgvector avoids a new system; for >100M, Pinecone or Milvus.
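The query a vector DB answers is "top-K by cosine similarity" — worth being able to write from scratch, because the exact brute-force version is the baseline that HNSW/IVF approximate. A sketch with tiny 2-d vectors standing in for real embeddings (doc names and values are invented):

```python
import math

def cosine(a: tuple[float, ...], b: tuple[float, ...]) -> float:
    """Cosine similarity: dot product over the product of magnitudes."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

# Toy corpus: in a real system these would be ~1536-dim embedding vectors.
corpus = {
    "doc-1": (1.0, 0.0),
    "doc-2": (0.9, 0.1),
    "doc-3": (0.0, 1.0),
}

def top_k(query: tuple[float, ...], k: int = 2) -> list[str]:
    """Exact nearest neighbors by brute-force scan — O(N) per query."""
    return sorted(corpus, key=lambda doc: -cosine(query, corpus[doc]))[:k]

hits = top_k((1.0, 0.05))   # the two docs pointing in the query's direction
```

The brute-force scan is fine into the low millions of vectors; beyond that, an ANN index (HNSW, IVF) answers the same query in sub-linear time at the cost of slightly imperfect recall.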
"Only one worker should run the cleanup job." "Only one process should hold the lease on this resource." "Two API requests must not double-charge the customer." These are distributed lock problems. Two main answers, with very different correctness guarantees.
**Redis locks (SET NX + Redlock)**
- **When:** Short-duration locks (seconds to minutes) where occasional weakness is acceptable. Rate limiting, idempotency keys, "don't let two pods process the same job", soft mutual exclusion.
- **Features:** SET key value NX EX 10 — atomic "set if not exists" with TTL. The TTL is the safety net: even if the holder dies without releasing, the lock auto-expires. The Redlock algorithm extends this to multiple Redis instances for higher availability.
- **Patterns:** Acquire: SET lock:{resource} {random_id} NX EX 10. Release: a Lua script that checks the value matches before deleting (so you don't release someone else's lock). Renewal: extend the TTL while still doing work.
- **Trade-offs:** The famous Martin Kleppmann critique: Redlock is not safe under all failure models. Clock skew + GC pauses can cause two holders to think they own the lock simultaneously. Acceptable for "shouldn't" but not "must not".
- **Interview Tip:** "For short locks where occasional double-execution is recoverable (idempotent work, rate limiting), Redis Redlock with a TTL is the right call — fast and simple. For locks where two holders would corrupt data, ZooKeeper or etcd."
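The acquire/release pattern above — SET NX EX with a random token, then check-the-token-before-delete on release — can be simulated with a fake clock to show both the TTL expiry and why the token matters (this mimics the command semantics in plain Python; real code would use a Redis client and a Lua script for the atomic release):

```python
import uuid

class FakeRedis:
    """Enough of Redis to demonstrate SET NX EX lock semantics."""
    def __init__(self):
        self.data: dict[str, tuple[str, float]] = {}   # key -> (value, expiry)
        self.now = 0.0                                 # fake clock

    def set_nx_ex(self, key: str, value: str, ttl: float) -> bool:
        """SET key value NX EX ttl: succeed only if absent or expired."""
        current = self.data.get(key)
        if current and current[1] > self.now:
            return False                   # lock held and not yet expired
        self.data[key] = (value, self.now + ttl)
        return True

    def release(self, key: str, token: str) -> bool:
        """Delete only if we still hold it (Lua check-and-delete in Redis)."""
        current = self.data.get(key)
        if current and current[0] == token:
            del self.data[key]
            return True
        return False

r = FakeRedis()
mine = uuid.uuid4().hex
assert r.set_nx_ex("lock:migration", mine, ttl=10)      # acquired
assert not r.set_nx_ex("lock:migration", "other", 10)   # contender blocked
r.now = 11.0                                            # our TTL lapses...
assert r.set_nx_ex("lock:migration", "other", 10)       # ...someone else acquires
assert not r.release("lock:migration", mine)            # stale token refused
```

The last line is the whole point of the random token: after our TTL expired and another client took the lock, a plain DEL would have released *their* lock — the token check turns that bug into a no-op.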
**ZooKeeper / etcd locks**
- **When:** Critical sections where two simultaneous holders would corrupt data. Leader election for a cluster, exclusive job execution (only one node runs the migration), config rollouts.
- **Features:** Ephemeral sequential znodes (ZK) or leases (etcd). Built on top of consensus (ZAB / Raft), so the lock is provably held by exactly one client at a time, modulo network partitions, which fail closed.
- **Patterns:** ZK leader election: each candidate creates an ephemeral sequential znode; the lowest sequence number wins. When the leader's session ends (crash, network drop), its znode disappears and the next-lowest takes over.
- **Trade-offs:** Slower than Redis (consensus round-trips). Operationally heavy. Overkill for soft locks.
- **Interview Tip:** "Two holders holding the lock simultaneously would corrupt data → ZK/etcd. Two holders is just inefficient → Redis."
The exact phrases. Read these out loud until they feel natural — then they'll come out automatically in an interview.
Caching: "I'll cache the trending video metadata as video:{id} Redis HASHes in a 6-node Redis Cluster, sharded on video_id via the cluster's hash slots, LRU eviction at 64GB per node, 30-min TTL, with explicit invalidation on the transcode-complete Kafka event."
Messaging: "Topic notifications.email with 32 partitions keyed by user_id for ordering per user, 7-day retention, replication factor 3 across 3 AZs, with consumer group email-worker of 50 pods. Failed messages route to a DLQ topic after 3 retries."
Key-value modeling: "PK = user_id and SK = created_at DESC for the GetUserPosts access pattern (one query, sorted, no scan), with a GSI on post_type for the BrowseByCategory pattern. Provisioned at 5K WCU / 25K RCU with on-demand for spikes."
Load balancing: "An L7 ALB routes /api/* to the API service group and /ws/* to a separate NLB for WebSocket connections (L4 because L7 doesn't handle persistent upgrades cleanly). Health checks every 5s on /healthz, evict after 2 failures."
Search: "Daily indices (logs-2026-05-07), 6 primary shards each, 1 replica, hot tier on SSDs for 7 days, warm on cheaper storage for 30 days, then deleted via ILM."
The handful of decisions that come up in almost every interview. When the interviewer asks "why this and not that?", these trees are the structured answer.
The technologies above don't live alone — they're combined into stacks that have proven themselves at scale. Here's how the canonical HLD problems compose them.
| System | Source of truth | Cache | Search / Index | Async / Stream | Storage / Edge |
|---|---|---|---|---|---|
| URL Shortener | Cassandra (hash → URL) | Memcached (hot 20%) | — | Kafka (click telemetry) | CDN (edge-cache 302s) |
| Twitter | Cassandra (tweets) | Redis (user timelines as LISTs) | Elasticsearch (tweet search) | Kafka (fan-out service) | S3 + CDN (media) |
| YouTube | Cassandra (video metadata) | Memcached (trending feed) | Elasticsearch (video search) | Kafka (transcode pipeline) | S3 + CDN (video segments) |
| Uber | Postgres (rides, payments) | Redis GEO + in-mem QuadTree (driver locations) | Elasticsearch (trip history) | Kafka (location updates) | — |
| Instagram | Postgres (users, posts metadata) | Memcached (feed timelines) | Elasticsearch (hashtag search) | Kafka (notifications) | S3 + CDN (photos) |
| Dropbox | MySQL (file metadata) | Memcached (folder listings) | — | Kafka (sync notifications) | Block store on S3 (file chunks) |
| Slack / Chat | Cassandra or Postgres (messages) | Redis (presence, channel members) | Elasticsearch (message search) | Kafka (delivery fan-out) | WebSocket via L4 LB |
| Netflix-style streaming | Cassandra (user profile, watch state) | EVCache, a Memcached fork (hot metadata) | Elasticsearch (title search) | Kafka (events, recs pipeline) | S3 + custom CDN (video segments) |