2 billion users, 100B messages a day, and servers that physically cannot read a single one — the architecture that turns the Signal Protocol into a global product
This deep-dive applies the five-step HLD interview framework. As you read, map each section to Requirements → Entities → APIs → High-Level Design → Deep Dives, and notice which of the 8 common patterns and key technologies are at play.
It's a Tuesday afternoon. Sarah is in New York, scrolling through photos from her trip, and she taps a picture of the Brooklyn Bridge to send to her mom in Mumbai. The instant her thumb leaves the screen, something quietly remarkable happens: that photo is wrapped — bytes scrambled into ciphertext using a key that lives only on her mom's phone — handed to a WhatsApp server that physically cannot read it, relayed across continents, and 600 milliseconds later her mom's phone unwraps it back into a picture. Nobody in between, including WhatsApp itself, ever sees the photo. That trick — running a global messenger that handles 100 billion messages a day while being mathematically blind to all of them — is what we're designing.
WhatsApp is a phone-number-identified, end-to-end encrypted messaging app: 2 billion users, 1.5 billion daily actives, around 100 billion messages per day, plus voice/video calls, status broadcasts, and now multi-device sync (your phone, web, and tablet all stay in sync without trusting the server). Think of the server as a sealed-envelope post office — it knows the from-address and the to-address, but the envelope itself is locked, and only the recipient has the key.
WhatsApp looks deceptively simple — a chat app — but every requirement below is a constraint that bends the architecture in ways an unencrypted messenger never has to deal with.
Users are identified by E.164 phone numbers, verified via SMS OTP.

The numbers below dictate sharding, fanout, and the choice of a persistent-connection protocol over plain HTTP. Without them you cannot justify why we run 50K open sockets per server or why media has to live in S3, not the message store.
Public stats: 2 billion users, 1.5B DAU, around 100 billion messages per day.
| Metric | Estimate | How we got there |
|---|---|---|
| Messages per day | ~100B | Roughly 65 msgs per DAU per day |
| Average msgs/sec | ~1.16M/s | 100B / 86,400 |
| Peak msgs/sec | ~5M/s | 3-5× avg during prime-time across regions |
| Concurrent connections | ~500M+ | 1.5B DAU × ~33% online at any moment |
Average text message is ~100 bytes (after encryption overhead). About 10% of messages carry media — average 1MB per media message but media flows through S3, not the messaging path.
| Metric | Estimate | How we got there |
|---|---|---|
| Text bandwidth | ~116 MB/s | 1.16M msgs/s × 100 bytes |
| Media bandwidth | ~116 GB/s | 10% of msgs × 1MB, flowing to S3 directly |
| Egress multiplier | ~2.5× | Multi-device + groups multiply egress |
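These estimates are back-of-envelope arithmetic; if you want to sanity-check them, a few lines of Python reproduce every number in the tables above:

```python
# Back-of-envelope check of the capacity estimates (all inputs approximate).
MSGS_PER_DAY = 100e9        # ~100B messages/day
DAU = 1.5e9                 # 1.5B daily actives

avg_msgs_per_sec = MSGS_PER_DAY / 86_400        # ~1.16M/s
peak_msgs_per_sec = avg_msgs_per_sec * 4        # 3-5x prime-time multiplier
concurrent_sockets = DAU * 0.33                 # ~500M online at once

text_bytes_per_sec = avg_msgs_per_sec * 100     # ~116 MB/s through messaging
media_bytes_per_sec = avg_msgs_per_sec * 0.10 * 1e6   # ~116 GB/s, to S3

print(f"avg msgs/s:      {avg_msgs_per_sec:,.0f}")
print(f"peak msgs/s:     {peak_msgs_per_sec:,.0f}")
print(f"concurrent:      {concurrent_sockets:,.0f}")
print(f"text bandwidth:  {text_bytes_per_sec / 1e6:.0f} MB/s")
print(f"media bandwidth: {media_bytes_per_sec / 1e9:.0f} GB/s")
```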
WhatsApp's storage model is unusual: messages are deleted from the server once delivered to all recipient devices. Only undelivered messages are stored, plus optional encrypted backups in iCloud / Google Drive (managed outside WhatsApp's infrastructure).
| Metric | Value | Why it matters |
|---|---|---|
| Concurrent sockets | 500M+ | Drives Connection Server count and per-server tuning (50K-1M sockets each) |
| Peak msgs/sec | 5M/s | Drives sharding and Routing Service throughput |
| Media throughput | 116 GB/s | Forces S3-class blob storage; cannot live in the messaging path |
| Pending msg storage | ~5-10 TB | Only undelivered queue — small because messages are deleted post-delivery |
| User & device registry | ~500 GB | 2B users × ~250 bytes (phone, name, public keys per device) |
WhatsApp's "API" is mostly a binary protocol over a persistent socket, not REST. But the conceptual operations are easy to enumerate. The signatures below are pseudocode — what each call does, who calls it, and what flows over the wire.
Conceptual API surface:

```
// 1. REGISTER — phone-number identity, verified via SMS OTP
register(phone_number, device_metadata) → { user_id, otp_token }
verify_otp(otp_token, sms_code, device_public_key)
  → { auth_token, user_id, device_id }

// 2. GET KEYS — Signal Protocol pre-key fetch (X3DH)
// Sender calls this for each recipient device before sending the first message
get_keys(recipient_user_id)
  → [ { device_id, identity_key, signed_pre_key, one_time_pre_key }, ... ]

// 3. SEND MESSAGE — encrypted payload per recipient device
// Sender encrypts ONCE PER DEVICE in the recipient set
send_message({
  conversation_id,
  recipient_devices: [
    { user_id, device_id, encrypted_payload },
    { user_id, device_id, encrypted_payload },
    ...
  ],
  timestamp
}) → { message_id, server_timestamp }

// 4. UPLOAD MEDIA — E2E-encrypted blob, key shipped inside the message
upload_media(content_type, encrypted_bytes) → { s3_url, content_hash }

// 5. SUBSCRIBE PRESENCE — last-seen / online indicators
subscribe_presence(user_id) ← { user_id, online: bool, last_seen: timestamp }

// 6. PUSH RECEIPT — message delivered / read
ack_receipt(message_id, type: "delivered" | "read")
```
There's no get_message_history — the server doesn't keep history, the device does. There's no search_messages — the server cannot search, only the device can search its own decrypted archive. There's no get_group_members_chat — group history lives only on members' devices. The API surface is starkly minimal because the server simply does not have most of the data a non-encrypted messenger would expose.

Why a separate get_keys call? Before Sarah can send her first message to her mom, her phone needs Mom's identity key, signed pre-key, and a one-time pre-key to run the X3DH handshake (Signal's initial key agreement). Mom may be offline at that moment — that's fine, her keys were uploaded to the Key Distribution Service ⑥ when she registered. This is the trick that lets Signal work asynchronously between people who have never connected at the same time.

WhatsApp's data model has a quirky shape: the user/device registry is small and relational (good for PostgreSQL), the pending-message queues are huge but ephemeral (good for a Cassandra / HBase / Mnesia-style store), and media lives outside the DB entirely (S3 / blob storage). Each datastore is picked for the access pattern, not for uniformity.
Small, relational, latency-sensitive. ~500GB total. Used to look up "what devices does +1-555-... have, and what are their public identity keys?" Sharded by phone number; each shard replicated across regions.
Per-device offline message queue. Wide-column, cheap appends, sharded by device_id. Rows are deleted once the recipient's connection has acked delivery — rarely grows, even at 100B msgs/day.
Photos, videos, voice notes, files. Stored as encrypted bytes — the AES key lives only inside the per-message E2E payload, never on the server. WhatsApp can host petabytes of media it cannot decrypt.
This is the section that decides whether you understand WhatsApp or just kind of know what it is. We build the system in three passes: a naive central server that a textbook would draw, the failure modes that force the production split, and the final shape where every component earns its keep.
A textbook chat server: clients open an HTTP connection, POST messages to a central server, the server stores them in a database and forwards them to the recipient. Just like SMS, but on the internet.
Walk this through with two billion users, and four concrete failures fall out:
This is the privacy failure, and it's the one that disqualifies the design entirely. Every photo, every voice note, every message between a journalist and a source — visible to whoever runs the server, plus anyone who breaches it, plus any government that subpoenas it. WhatsApp's central promise breaks before you even get to scale.
Mom's phone is on a 2G network in Mumbai and her battery died at 3am. Sarah's message hits the server, the forward to Mom fails, and the textbook design just... drops it. We need a durable per-device offline queue that holds the message until Mom's phone comes back online, sometimes 12 hours later.
A single beefy server handles maybe 50K open WebSocket connections and a few hundred thousand msgs/sec at best. We need 500M+ concurrent connections and 5M peak msgs/sec. That's a horizontal-scaling problem — but if we just add more chat servers, how does Sarah's server know which server holds Mom's connection right now?
Mom uses WhatsApp Web at her desk and the app on her phone. The naive design has one "Mom" — when Sarah sends, it goes to "Mom's connection." But Mom has two connections, both wanting the message, and each runs its own private key. The server cannot just clone the ciphertext to both — they were encrypted to different keys.
The single insight that makes WhatsApp possible is this: encryption happens on the devices, not on the server. Each device has its own key pair. The server is reduced to a router of opaque ciphertext. When a user has multiple devices, the message is encrypted multiple times — once per recipient device — before it leaves the sender. The server fans out the pre-encrypted ciphertexts to the right device queues.
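To make the fanout concrete, here's a minimal Python sketch. AES-GCM with a static per-device session key stands in for the Double Ratchet (a real client derives a fresh key per message via a Signal library), and `fan_out` is a hypothetical helper, not WhatsApp's actual code:

```python
# Toy client-side fanout: one independent ciphertext per recipient device.
# AES-GCM + static session keys are a stand-in for real Signal sessions.
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def fan_out(plaintext: bytes, device_session_keys: dict[str, bytes]) -> list[dict]:
    """The server only ever sees this list of opaque payloads."""
    out = []
    for device_id, key in device_session_keys.items():
        nonce = os.urandom(12)
        ciphertext = AESGCM(key).encrypt(nonce, plaintext, None)
        out.append({"device_id": device_id, "encrypted_payload": nonce + ciphertext})
    return out

# Mom has three devices, each with its own session key.
sessions = {f"mom-{d}": AESGCM.generate_key(bit_length=256)
            for d in ("phone", "laptop", "tablet")}
payloads = fan_out(b"photo of the Brooklyn Bridge", sessions)
assert len(payloads) == 3                                     # one per device
assert len({p["encrypted_payload"] for p in payloads}) == 3   # all distinct
```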
Where the actual privacy happens. Each device runs the Signal Protocol — X3DH for the initial handshake, Double Ratchet for per-message forward secrecy. To send to Mom (who has 3 devices), Sarah's phone produces 3 ciphertexts, one per device. The server never sees any plaintext. Even if the server is fully compromised, past messages stay secret.
Where the scaling happens. A pool of Connection Servers each holds tens of thousands of persistent sockets. A Routing Service maps (user_id, device_id) → which connection server holds them right now. To deliver, you look up the route, push the ciphertext blob to that connection server, and let it write to the device's socket. Server has zero visibility into the payload.
What travels where: plaintext lives only on devices. Ciphertext travels both directions over persistent sockets. The Routing Service has metadata (who is connected where) but never payload. Media bytes travel as a separate flow to S3, also pre-encrypted by the sender. This split lets the encryption story stay airtight while the messaging fabric scales like a normal pub-sub system.
The full architecture, with every node numbered. Match each circled digit in the diagram to a card below to see what the component does, why it exists, and what would break without it.
Each card answers the four newbie questions: what is it, why does it exist, what would break without it, and where does it sit in the flow.
The client is where the entire encryption story lives. It runs the Signal Protocol library, holds the device's private identity key (which never leaves the device), maintains a chat database on local disk, and keeps a persistent socket open to a Connection Server. Sarah's phone, her laptop, and her tablet are three independent clients, each with its own key pair.
Solves: the privacy guarantee. Because the server holds no private key, even a full server breach cannot decrypt past messages. Every architectural decision elsewhere assumes the heavy work of encryption, decryption, and indexing happens here.
A layer-4 (TCP) load balancer that terminates TLS and forwards the persistent socket to a Connection Server. L4 not L7 because the traffic is a custom binary protocol, not HTTP — there are no URLs to inspect, just raw frames. It health-checks Connection Servers every few seconds and pulls dead ones from rotation.
Solves: the entry point at scale. Without the LB, clients would have to know exact Connection Server IPs — meaning you couldn't add or rotate servers without updating every device. The LB also gives you a stable DNS name like g.whatsapp.net behind which the server fleet can change freely.
Each Connection Server holds a small army of persistent sockets — historically tens of thousands per box, and on FreeBSD-tuned WhatsApp Erlang servers, famously up to 2 million on one host. When Sarah's phone is online, exactly one socket on exactly one Connection Server is hers. Inbound messages from her are read off that socket; outbound messages to her are written to that socket.
Solves: the "500M concurrent users" problem. HTTP request-response cannot scale to that many open conversations because every request pays a TLS-handshake tax. Persistent multiplexed sockets pay that tax once per device-week, then send millions of frames over the same socket. This is also where presence ("online") naturally lives — a socket open means online; closed means last-seen.
What if a Connection Server dies? The 50K-1M clients on it reconnect (their phones detect the socket drop within seconds), the Load Balancer routes them to a healthy server, and the Routing Service updates their (user_id, device_id) → conn_server mapping. A few seconds of perceived latency on those devices, no message loss because pending messages are stored durably elsewhere.
A high-throughput key-value service whose only job is to answer "which Connection Server is currently holding the socket for (user_id, device_id)?" Backed by Redis or an in-memory distributed store like Mnesia, partitioned by user_id. Updated whenever a client connects or disconnects.
Solves: message delivery across a horizontally scaled fleet. When Sarah's Connection Server in São Paulo needs to push her message to Mom in Mumbai, it does not broadcast to all servers — it asks Routing Service "where is Mom's phone right now?" and gets back conn-mum-42. Without this lookup, you would either fanout to every server (impossibly wasteful) or pin all of a user's traffic to a fixed server (no fault tolerance).
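The contract is small enough to sketch in a few lines — a plain dict stands in for the Redis/Mnesia store, and the function names are illustrative:

```python
# Routing Service contract, with a dict standing in for Redis/Mnesia
# partitioned by user_id.
routes: dict[tuple[str, str], str] = {}  # (user_id, device_id) -> conn server

def on_connect(user_id: str, device_id: str, conn_server: str) -> None:
    routes[(user_id, device_id)] = conn_server

def on_disconnect(user_id: str, device_id: str) -> None:
    routes.pop((user_id, device_id), None)

def locate(user_id: str, device_id: str) -> str | None:
    """None means no open socket -> write to the Pending Message Store."""
    return routes.get((user_id, device_id))

on_connect("mom", "phone", "conn-mum-42")
assert locate("mom", "phone") == "conn-mum-42"
assert locate("mom", "tablet") is None   # offline: queue + push notification
```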
Phone-number registration. When you install WhatsApp, this service sends an SMS OTP to your phone, you type it back, and the service issues an auth_token tied to (phone_number, device_id). Phone numbers are the user-facing identity — there are no usernames. Stores the +E.164 phone, a stable opaque user_id, and the device list.
Solves: bootstrapping identity. Without it, anyone could register as any phone number, defeating contact-discovery (the killer feature: "all your phone contacts who are on WhatsApp appear automatically"). The OTP gates registration to the actual SIM holder.
The Signal Protocol's X3DH handshake requires the sender to fetch three things about the recipient device: an identity public key (long-lived), a signed pre-key (rotated weekly), and a one-time pre-key (consumed per first-message). Devices upload batches of one-time pre-keys when they register and refill the pool periodically. The Key Distribution Service stores and serves these.
Solves: asynchronous handshakes. Sarah may want to send Mom a message while Mom's phone is off — Signal still needs Mom's keys to derive a session. Pre-keys uploaded ahead of time make this work. Without the pre-key system, you could not start a Signal session with someone who is not currently online — Signal would devolve into a real-time-only protocol.
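A sketch of the server-side bookkeeping, with the detail that matters most — one-time pre-keys are popped, so each is handed out at most once (names and storage here are illustrative; all keys are opaque bytes to the server):

```python
# Key Distribution Service bookkeeping sketch.
from collections import defaultdict

identity_keys: dict[str, bytes] = {}
signed_pre_keys: dict[str, bytes] = {}
one_time_pool: dict[str, list[bytes]] = defaultdict(list)

def upload_bundle(device_id, identity_key, signed_pre_key, one_time_keys):
    identity_keys[device_id] = identity_key
    signed_pre_keys[device_id] = signed_pre_key
    one_time_pool[device_id].extend(one_time_keys)

def fetch_bundle(device_id):
    """Called by a sender starting a session; the recipient may be offline."""
    pool = one_time_pool[device_id]
    return {
        "identity_key": identity_keys[device_id],
        "signed_pre_key": signed_pre_keys[device_id],
        # pop() enforces one-time use. None if the pool ran dry: X3DH still
        # works, with weaker forward secrecy for the very first message.
        "one_time_pre_key": pool.pop() if pool else None,
    }
```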
A per-device offline queue. When Sarah sends to Mom (who has 3 devices, all currently offline), the server writes 3 rows: (mom-phone-device, ciphertext-A), (mom-laptop-device, ciphertext-B), (mom-tablet-device, ciphertext-C). Each row sits there until that device reconnects and acks delivery, at which point the row is deleted. Backed by Cassandra/HBase, sharded by device_id.
Solves: "recipient is offline." Without a per-device queue, Sarah's message would be dropped any time Mom's phone was off. With it, Mom can be on a flight for 8 hours and still receive every message in order when she lands.
Photos, videos, voice notes, and files don't travel through the messaging path — they would clog it. Instead, Sarah's phone encrypts the photo with a fresh random AES key, uploads the ciphertext to S3 via a presigned URL handed out by the Media Service, and gets back a blob URL. She then sends Mom a regular text message containing { s3_url, aes_key } — and that message is itself end-to-end encrypted with Mom's device key. Mom's phone fetches the blob from S3 and decrypts it locally.
Solves: media at scale, while preserving E2E. The server sees encrypted bytes in S3 and an encrypted little message in the queue — never the photo. Without this split, every 1MB photo would have to flow through the Connection Server fabric, multiplying messaging bandwidth by 100×.
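Here's a sketch of both ends of the media path, using AES-GCM from the `cryptography` package; `upload_to_s3` / `fetch_from_s3` are placeholder callables, not a real S3 client:

```python
# Both ends of the media path: encrypt with a fresh key, park the blob,
# ship {url, key} inside the (separately E2E-encrypted) message.
import json
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def send_photo(photo: bytes, upload_to_s3) -> bytes:
    key = AESGCM.generate_key(bit_length=256)  # exists only on the two phones
    nonce = os.urandom(12)
    blob = nonce + AESGCM(key).encrypt(nonce, photo, None)
    url = upload_to_s3(blob)                   # server stores opaque bytes
    # This envelope is what flows through the messaging path:
    # ~100 bytes instead of 1 MB.
    return json.dumps({"s3_url": url, "aes_key": key.hex()}).encode()

def receive_photo(envelope: bytes, fetch_from_s3) -> bytes:
    msg = json.loads(envelope)
    blob = fetch_from_s3(msg["s3_url"])
    nonce, ct = blob[:12], blob[12:]
    return AESGCM(bytes.fromhex(msg["aes_key"])).decrypt(nonce, ct, None)

# Toy blob store in place of S3.
store: dict[str, bytes] = {}
def put(blob: bytes) -> str:
    store["s3://wa-media/demo.bin"] = blob
    return "s3://wa-media/demo.bin"

envelope = send_photo(b"bridge pixels", put)
assert receive_photo(envelope, store.__getitem__) == b"bridge pixels"
```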
Group fanout. When Sarah sends to a 50-member group, her client encrypts the body once with its sender key for that group (an additional Signal construct), and the Group Chat Service, which knows the group's device list, writes one Pending Message row per recipient device. New members get re-keyed; departed members are cut off via a sender-key rotation.
Solves: efficient group encryption. Naive approach (encrypt N times for N members) works up to ~10 members but bogs down at 256+. Sender keys let Sarah encrypt the message body once with a symmetric key and only encrypt that key per recipient device — far cheaper. Without the Group Chat Service, every 1024-member group message would spawn a thousand-way client-side fanout that the sender's phone could not afford.
Tracks "online" and "last-seen" indicators. Subscribes to socket connect/disconnect events from Connection Servers and pushes updates to subscribed clients ("Mom went online at 14:03"). Stored in a fast in-memory cache because presence is ephemeral and high-churn.
Solves: the small UX touches that make a chat app feel alive — the green dot, the typing indicator, "last seen 2m ago." Without it, you would not know whether your message was being read in the moment. Importantly, presence is opt-in and does not leak message content — it is a bit of metadata, not plaintext.
WebRTC for the actual media (audio/video frames travel peer-to-peer when both endpoints can punch through NAT). Call setup signaling — "Sarah is calling, here are her ICE candidates" — flows through the WhatsApp servers, but is itself end-to-end encrypted. When direct P2P fails (strict corporate NATs, mobile carriers), traffic falls back to TURN relay servers run by WhatsApp, which still see only encrypted media.
Solves: low-latency real-time calls without compromising encryption. Without WebRTC + TURN, calls would have to flow through an MCU (Multipoint Control Unit) that mixes audio — which would require decryption on the server. Peer-to-peer media keeps the encryption end-to-end.
If Mom's phone has its WhatsApp socket closed (app backgrounded, OS suspended it to save battery), the server cannot push messages directly — the OS owns the radio. Instead, the Pending Message Store fires a notification through APNS (iOS) or FCM (Android). The notification wakes the app, the app reconnects its socket, and pulls the queued ciphertexts. The notification itself contains no message content — just a "you have messages" wake-up signal.
Solves: mobile reality. Phones aggressively suspend background apps; without OS-level push, your friends could message you all day and your phone would not buzz until you opened the app. APNS/FCM is the only legal way to bypass this. The cost: WhatsApp depends on Apple and Google's infrastructure for the wake-up step, but never trusts them with content.
Two flows. First, a 1-on-1 photo send to Mom (single device, simple case). Second, the same photo to a 50-person group chat. Both reference the numbered components above.
1. Sarah's phone encrypts the photo with a fresh random AES key and uploads the ciphertext via the Media Service ⑧, getting back s3://wa-media/abc123.bin.
2. If no Signal session exists yet, her phone fetches {identity_key, signed_pre_key, one_time_pre_key} for Mom's phone device from the Key Distribution Service ⑥ and runs X3DH.
3. It encrypts { s3_url, aes_key, "from Brooklyn ❤️" } with Mom's session key. Result: ~200 bytes of ciphertext.
4. The ciphertext goes up Sarah's socket, and her Connection Server asks the Routing Service ④ where Mom's device is. Her last socket lived on conn-mum-42, but Mom's phone is currently offline (asleep), so the entry says "no socket."
5. The server writes the ciphertext to the Pending Message Store ⑦, keyed by Mom's device_id, and triggers the Push Notification Service ⑫ → APNS wakes Mom's iPhone with "1 new message."
6. Mom's phone reconnects; the Routing Service maps her device to conn-mum-42. The Connection Server reads the pending row and writes the ciphertext to Mom's socket.
7. Mom's phone decrypts {s3_url, aes_key, text}, fetches the encrypted blob from S3, decrypts it with the AES key, and displays the photo. Total elapsed: ~600ms from Mom's phone waking.

WhatsApp's encryption isn't bespoke — it's the open Signal Protocol, designed by Open Whisper Systems / Moxie Marlinspike and audited by the academic crypto community. The protocol does two jobs: set up a session when two devices first talk (X3DH), and encrypt every message with a fresh per-message key that gets thrown away after use (Double Ratchet).
Imagine Sarah wants to start a chat with her friend Raj, who is offline. In a typical TLS-style handshake, both parties must be online at the same time to exchange ephemeral keys. Signal can't require that — Raj might be on a plane. X3DH (Extended Triple Diffie-Hellman) solves this by having Raj pre-upload a bundle of public keys to the Key Distribution Service ⑥ when he registers his device:
- Identity key — long-lived. Signs the pre-key. This is the public key Sarah sees on the "verify Raj" QR code.
- Signed pre-key — rotated weekly. A mid-term key, signed by the identity key so Sarah knows it's authentic.
- One-time pre-keys — a pool of 100+, each used at most once. The one-time-ness gives forward secrecy for the very first message.
When Sarah wants to message Raj, her phone fetches Raj's bundle and runs three Diffie-Hellman computations whose results are mixed together to derive a shared session key. Crucially, Raj never had to be online — his pre-keys were sitting on the server waiting. As soon as Raj's phone wakes, it pulls Sarah's first message + her ephemeral key, runs the matching DH on its end, and arrives at the same session key.
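Here's that handshake sketched with the `cryptography` package's X25519 primitives. It's simplified — the signature check on the pre-key and the optional fourth DH against a one-time pre-key are omitted — but it shows the core property: both sides mix the same three DH results and arrive at the same key without ever being online together:

```python
# Simplified X3DH sketch (omits the pre-key signature and one-time pre-key).
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric.x25519 import X25519PrivateKey
from cryptography.hazmat.primitives.kdf.hkdf import HKDF

def kdf(secret: bytes) -> bytes:
    return HKDF(algorithm=hashes.SHA256(), length=32, salt=None,
                info=b"x3dh-sketch").derive(secret)

# Raj uploaded the public halves of these while online; now he's offline.
raj_ik, raj_spk = X25519PrivateKey.generate(), X25519PrivateKey.generate()

# Sarah, later: her identity key plus a fresh ephemeral key for this session.
sarah_ik, sarah_ek = X25519PrivateKey.generate(), X25519PrivateKey.generate()

# Sarah's side: three DHs against Raj's *public* bundle, mixed together.
sarah_session = kdf(sarah_ik.exchange(raj_spk.public_key())
                    + sarah_ek.exchange(raj_ik.public_key())
                    + sarah_ek.exchange(raj_spk.public_key()))

# Raj's side, whenever his phone wakes: the same three DHs, mirrored.
raj_session = kdf(raj_spk.exchange(sarah_ik.public_key())
                  + raj_ik.exchange(sarah_ek.public_key())
                  + raj_spk.exchange(sarah_ek.public_key()))

assert sarah_session == raj_session  # same key, never online together
```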
X3DH gives you a session key, but if that single key encrypted every message, a future compromise of either device would expose the whole chat history. Double Ratchet rotates the encryption key on every message. The "double" refers to two ratchets that mix together:
- Symmetric-key ratchet: after each message, derive the next message key from a hash of the current one and immediately destroy the previous. Even if today's message key leaks, yesterday's messages cannot be re-derived from it. (Sketched in code below.)
- DH ratchet: every time a new message arrives in the other direction, a fresh DH exchange piggybacks on it, generating a new "chain key." This bounds how long a compromise can listen — at most until the next time the conversation goes the other way.
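The symmetric-key ratchet is small enough to show outright — this follows the Double Ratchet spec's HMAC labeling (0x01 for the message key, 0x02 for the next chain key), though the starting chain key here is a dummy constant:

```python
# Symmetric-key ratchet: each message key is one-use; old chain keys vanish.
import hashlib
import hmac

def ratchet_step(chain_key: bytes) -> tuple[bytes, bytes]:
    message_key = hmac.new(chain_key, b"\x01", hashlib.sha256).digest()
    next_chain = hmac.new(chain_key, b"\x02", hashlib.sha256).digest()
    return message_key, next_chain

chain = b"\x00" * 32          # in reality: output of X3DH / the DH ratchet
keys = []
for _ in range(3):            # three messages -> three one-use keys
    mk, chain = ratchet_step(chain)   # old chain key is now unrecoverable
    keys.append(mk)
assert len(set(keys)) == 3
```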
| Server CAN see | Server CANNOT see |
|---|---|
| Sender phone number | Message text or media bytes |
| Recipient phone number(s) | Whether two messages have the same content |
| Timestamp of send | Whether a media file is the same as a previously sent one |
| Message size (in ciphertext) | Group chat names, member descriptions |
| That a call happened (signaling) | The audio/video of the call |
For years, WhatsApp had a strict 1-user-1-device model: WhatsApp Web was a "mirror" of the phone, and if the phone was off the laptop did not work. In 2021 they shipped multi-device, and the architecture had to bend in interesting ways to keep encryption end-to-end.
The simplest thing would be: when Sarah pairs her laptop, copy her phone's private identity key over to the laptop. Now the laptop "is" the phone. But this weakens the security model — the key now exists in two places, and any compromise of either device exposes the other. It also makes the key transit (over the network) a juicy target.
When Sarah pairs her laptop, the laptop generates its own identity key locally. Pairing only proves to the WhatsApp servers (and to her contacts) that "this new device is also Sarah." The laptop is treated as a peer device of Sarah's account, not a clone. From the protocol's view, Sarah is multiple devices.
When Raj sends Sarah a message, his phone fetches the device list for Sarah from the Key Distribution Service ⑥ — say, 4 devices: phone, laptop, tablet, desktop. Raj's phone runs Double Ratchet for each Sarah-device and produces 4 ciphertexts. The server fans out 4 rows, one per device queue. Each Sarah-device decrypts independently using its own private key.
Sarah sends a message from her phone. Her laptop also needs to show "you sent: hi mom" in the chat thread. So when Sarah's phone sends to Mom (4 devices), it also encrypts the message to its own peer devices — laptop, tablet, desktop. That's 4 ciphertexts to Mom plus 3 ciphertexts to Sarah's other devices = 7 ciphertexts for one logical "send." The server fans them all out the same way.
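The delivery-target arithmetic, as a quick sketch (`fanout_targets` and the registry shape are illustrative):

```python
# Every recipient device, plus the sender's *other* devices, gets a ciphertext.
def fanout_targets(sender: str, sender_device: str,
                   devices: dict[str, list[str]]) -> list[tuple[str, str]]:
    return [(user, d)
            for user, devs in devices.items()
            for d in devs
            if (user, d) != (sender, sender_device)]

registry = {"mom":   ["phone", "laptop", "tablet", "desktop"],
            "sarah": ["phone", "laptop", "tablet", "desktop"]}
# 4 Mom devices + 3 of Sarah's own peers = 7 ciphertexts for one logical send
assert len(fanout_targets("sarah", "phone", registry)) == 7
```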
Does pairing a new device sync old history? No. A laptop paired today does not gain Sarah's phone's history. The phone can optionally re-send some recent messages to the laptop (encrypted to the laptop's new key), and that's how WhatsApp Web shows recent threads after pairing — but nothing forces the phone to do this, and old conversations remain phone-only unless explicitly synced. This is the privacy floor: a stolen-then-paired device cannot retroactively read what came before its pairing.
When Sarah removes her stolen laptop from her account, the Identity Service ⑤ marks that device deleted, the Key Distribution Service ⑥ stops returning its pre-keys, and contacts' phones refresh their device lists on next send. The laptop's keys remain valid for already-sent-but-undelivered messages, but no new messages will be encrypted to it.
Encrypting a message for a 256-person group naively means encrypting it 256 times — once per device session. For small groups that's fine. For 1024-member groups (with multi-device, that's 1500+ recipient devices) it's a serious CPU + bandwidth burden on the sender's phone. Signal's Sender Keys construct fixes this.
For each group, every member's device generates a sender chain key: a symmetric key used to encrypt outgoing messages from that device to that group. The first time Sarah sends to "Family Group," her phone:

1. generates its sender chain key for the group,
2. distributes that key to each member device over the existing pairwise Signal sessions (a one-time cost),
3. encrypts the message body once with a key derived from the chain, and hands the single ciphertext to the server for fanout.
The sender chain key itself ratchets per-message (forward secrecy preserved). Other members who want to send to the group each have their own sender chain key, distributed once.
When a member joins: the new member's device gets the current sender keys from each existing member, via pairwise Signal sessions. They can decrypt messages from this point forward — never older messages.

When a member leaves: all remaining members rotate their sender keys so the departed member can no longer decrypt new messages. This is one of the more expensive operations in WhatsApp — N members each redistributing a new sender key to N-1 others.

On the server side, the Group Chat Service ⑨ stores only the device list. The server fans out the (already encrypted) message to every member device's Pending Message Store ⑦ row. It cannot read the body — just like 1-on-1.
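Pulling the sender-key mechanics into one sketch — AES-GCM and HMAC stand in for Signal's sender-key primitives, and the distribution/rotation messages themselves are elided:

```python
# Sender-key lifecycle sketch: encrypt the body once, ratchet per message,
# rotate on departure.
import hashlib
import hmac
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

class GroupSenderKey:
    def __init__(self):
        # Distributed once to each member device over pairwise sessions.
        self.chain = os.urandom(32)

    def encrypt(self, body: bytes) -> bytes:
        # Derive this message's key, then advance (and forget) the chain.
        mk = hmac.new(self.chain, b"\x01", hashlib.sha256).digest()
        self.chain = hmac.new(self.chain, b"\x02", hashlib.sha256).digest()
        nonce = os.urandom(12)
        return nonce + AESGCM(mk).encrypt(nonce, body, None)  # ONE ciphertext

    def rotate(self):
        # A member left: fresh chain, redistributed to remaining members only.
        self.chain = os.urandom(32)
```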
Calls are a fundamentally different beast from text — they need real-time, low-latency, peer-to-peer media at 100s of kbps continuously. They use WebRTC, a web-standard P2P media stack, with WhatsApp servers handling only signaling and NAT-traversal fallback.
When Sarah taps "video call" on Mom's chat:

1. Her phone sends a call-offer through the WhatsApp servers — "Sarah is calling, here are her ICE candidates" — as an end-to-end encrypted signaling message.
2. Mom's phone rings; when she answers, it replies with its own ICE candidates.
3. Both phones run ICE connectivity checks, and once a direct path punches through NAT, audio/video flows peer-to-peer as SRTP-encrypted frames — the media never touches WhatsApp's servers.
Some networks (corporate firewalls, certain mobile carriers, symmetric NATs) block P2P UDP. In that case, both phones connect to a WhatsApp TURN relay server and forward their media through it. The TURN server is a dumb packet pump — it sees only the encrypted SRTP streams, not the audio. So even relayed calls remain end-to-end encrypted; the relay just shuffles bytes.
For 1-on-1, P2P is enough. For group calls (8+ participants), full mesh P2P would mean each phone uploads its video stream N-1 times — kills mobile data plans. WhatsApp uses an SFU: each participant sends one upstream stream to the SFU, and the SFU forwards it to all other participants. The SFU does not decrypt — it only routes encrypted frames. (Compare to an MCU, which decodes and mixes — that would break E2E.)
Compared to other messengers, WhatsApp's server-side storage footprint is small — because most data lives on user devices, not on the server. Three storage tiers, each with very different growth curves.
User & device registry: ~500GB for 2B users × ~5 devices each × ~50 bytes per device record. PostgreSQL, sharded by phone number. Replicated across regions. Grows linearly with user count, not message count.

Pending message queue: ~5-10TB. Cassandra/HBase, sharded by device_id. Each row is one undelivered message for one device. Rows are deleted on delivery ack, so the table is flow-through, not accumulating. Even at 100B msgs/day, the queue rarely exceeds ~100M rows at any given second.

Media store: S3-class object store, petabytes of encrypted media. Lifecycle policy: media is kept for ~30 days after delivery (so latecomers / new-device syncs can fetch it), then deleted. The encryption key is per-message and never held by the server — after deletion, even subpoena-recovered S3 bytes are unreadable.
The fundamental architectural choice is: the server is a pipe, not an archive. A user's full message history lives on their device(s). Once Mom's phone acks "got the photo," the server's copy is gone. This is what keeps the server storage manageable — without it, 100B msgs/day × 5 years would be 180 trillion messages the server would need to durably hold, an unmanageable storage cost.
The flip side is the trade we already saw in §7: backup-and-restore must be implemented separately, by encrypted backups to iCloud/Google Drive, where the user holds the key.
2B users, 500M concurrent connections, 5M peak msgs/sec — there is no single-box version of any tier. Everything is sharded.
The Routing Service maps user_id → connection_server_id using consistent hashing on a virtual ring. Adding a new Connection Server only relocates 1/N of users (a brief reconnect for them), not the entire fleet. When a server dies, its share is redistributed and the affected clients reconnect.
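A minimal consistent-hash ring makes the "only 1/N relocates" claim checkable (MD5 and 100 virtual nodes per server are arbitrary choices for the sketch):

```python
# Minimal consistent-hash ring: adding a server relocates only ~1/N of users.
import bisect
import hashlib

class Ring:
    def __init__(self, servers: list[str], vnodes: int = 100):
        self.points = sorted((self._h(f"{s}#{v}"), s)
                             for s in servers for v in range(vnodes))
        self.keys = [h for h, _ in self.points]

    @staticmethod
    def _h(key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def server_for(self, user_id: str) -> str:
        i = bisect.bisect(self.keys, self._h(user_id)) % len(self.points)
        return self.points[i][1]

before = Ring([f"conn-{i}" for i in range(10)])
after = Ring([f"conn-{i}" for i in range(11)])      # one server added
moved = sum(before.server_for(f"user-{u}") != after.server_for(f"user-{u}")
            for u in range(10_000))
print(f"{moved / 10_000:.1%} of users relocated")   # ~9%, i.e. ~1/11
```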
Why device_id and not user_id? Because each device is its own delivery target — we want all of one device's pending messages co-located on one shard for fast read on reconnect. Cassandra wide-column rows keyed by (device_id, message_id), replicated 3× per shard.
Phone numbers are uniformly distributed (modulo country-code skew), so simple hash sharding works. A user lookup is one shard hit. Replicated across regions for low-latency contact discovery.
Pre-key bundles for one user live on one shard. When Sarah's phone fetches Mom's keys, that's a single shard read. Pre-key pool refills — the device writing back a small batch of fresh one-time keys — land on the same shard.
Messaging is critical infrastructure: 99.99% means no more than 53 minutes of downtime per year. Failure modes are handled at every tier.
50K-1M sockets drop simultaneously. Affected clients detect the TCP RST within seconds and reconnect via the Load Balancer to a healthy server. Routing Service ④ updates the new mappings. Pending messages were stored elsewhere, so no message loss — just a few seconds of perceived offline.
Replicated 3× per shard via Cassandra. Quorum writes (W=2) mean a single replica failure doesn't block writes. Quorum reads (R=2) heal divergent replicas on the fly. Region-level outages handled by cross-region replication.
The most sensitive tier — without it, no message can be routed. Backed by ZooKeeper / etcd or Mnesia for consistent metadata. Quorum-based — tolerates minority node loss. Latency-sensitive enough that reads are usually served from in-memory Erlang / Redis caches at the Connection Server.
Connection Servers in NA, EU, APAC, LATAM. Clients connect to the geographically nearest region. Cross-region delivery flows: Sarah (NA region) → her local Connection Server → Routing Service says Mom is in APAC → push to APAC region's Connection Server → Mom's socket. Adds ~150ms intercontinental latency, still well under 500ms p99.
Suppose APAC's Mumbai data center goes dark. Mom's phone immediately loses its socket. Within seconds, her phone's reconnect attempt routes (via DNS / GeoIP) to the next-closest region — say, Singapore. Routing Service updates (mom_user, mom_device) → conn-sg-7. New incoming messages for Mom from anywhere in the world now flow to Singapore. When Mumbai comes back, traffic gradually rebalances. No message loss because the Pending Message Store is multi-region replicated.