High-Level Design

Uber Backend

Matching a rider to the best of 8 nearby drivers in 30 seconds — while every driver in the city updates location every 3 seconds. The architecture that decouples the firehose from the spatial index.

Read this with the framework in mind

This deep-dive applies the standard HLD interview framework. As you read, map each section to Requirements → Entities → APIs → High-Level Design → Deep Dives, and notice which of the 8 common patterns and key technologies are at play.

Step 1

What is Uber?

It is 14:02 on a Tuesday afternoon. Raj is standing at Times Square with a suitcase, late for a meeting. He opens the Uber app and taps "Request UberX". Within 30 seconds, his phone vibrates: "Sandeep, Toyota Camry, 4 minutes away" — and on the map he sees a tiny car icon crawling toward him in real time. None of this looks complicated to Raj. But behind that one tap, an entire system has just done something impressive: it identified the eight nearest free drivers out of roughly 60,000 on the road in NYC, ranked them, offered the ride to the top three in parallel, locked the winner, and started streaming both Raj's and Sandeep's GPS positions to each other's phones.

Uber's backend is fundamentally a real-time matching engine: every driver reports their location every 3 seconds, every customer wants to see "drivers near me" updating live, and when a ride is requested the system must quickly find, rank, and dispatch the right driver. The hard part isn't any one of these — it's doing all three together at the scale of 500K drivers online at once sending 167,000 location updates per second, while also serving 1 million ride requests per day.

The two questions that drive every design decision below: (1) How do we absorb 167K driver-location updates per second without melting our spatial index? (2) When Raj taps "Request Ride", how do we find the best 8 drivers near him in under a second, when the data describing where those drivers are is changing literally as we read it?
Step 2

Requirements & Goals

Pin down what the system must do before drawing a single box. Uber adapts the proximity-search problem from Yelp, but with one brutal twist: the things we are searching for move every 3 seconds.

✅ Functional Requirements

  • Drivers report their current location to the server every 3 seconds while online
  • Customers see nearby drivers on a map in near real-time
  • Customer requests a ride; nearby free drivers are notified of the request
  • Once a driver accepts, the ride is locked — both customer and driver see each other's live location until trip ends
  • Customer and driver can communicate (call/chat) during the ride
  • Trip data — fare, route, timestamps — is persisted for receipts and history

⚙️ Non-Functional Requirements

  • Real-time matching — find & assign a driver in under 30 seconds
  • Massive scale — 300M customers, 1M drivers (500K daily active), 1M rides/day
  • Highly available — outage = no rides booked = direct revenue loss
  • Strongly consistent on dispatch — no two riders can lock the same driver
  • Eventual consistency on display — a driver's blue dot can lag a couple seconds; that's fine
The tension to keep in mind: the read side (customers watching the map) wants the freshest possible driver locations. The spatial index (used for "find drivers near me") wants stable data so it can be queried fast. These two needs pull in opposite directions, and the entire architecture of Uber is the answer to that tension.
Step 3

Capacity Estimation & Constraints

Numbers are not optional in HLD — they are how we justify each architectural choice. Let's count drivers, updates, memory, bandwidth.

Population & activity

Assume 300 million customers and 1 million drivers on the platform. On any given day, about 500K drivers are online (driving and accepting rides) and 1 million rides are completed.

  • Active drivers: 500K online (50% of the 1M total fleet)
  • Location updates: ~167K/sec (500K × 1 update per 3s)
  • Rides per day: 1M (~12 rides/sec average)
  • Subscribed customers: ~2.5M (5 subscribers per driver)

Memory for the driver location hash table

Each driver record holds: driver_id (3B) + current_lat (8B) + current_long (8B) + prev_lat (8B) + prev_long (8B) ≈ 35 bytes. For 500K active drivers: 500,000 × 35 ≈ ~17.5 MB. That fits in L3 cache on a modern CPU. Memory is not the bottleneck — write throughput is.

Bandwidth for driver updates

Each update is roughly 19 bytes on the wire (driver_id + lat + lng). With 500K drivers updating every 3s: 500,000 × 19 bytes / 3s ≈ ~3.2 MB/sec inbound (Grokking's figure of 19 MB / 3s ≈ 6 MB/sec assumes all 1M drivers online). Outbound to subscribed customers: 2.5M subscriptions × 19 bytes per 3s update ≈ ~16 MB/sec baseline, with peaks toward ~47.5 MB/sec — an order of magnitude bigger than inbound, because each driver update fans out to multiple watching customers.

Metric | Value | Why it matters
Driver updates/sec | 167K/s | Drives the size of the Driver Location Server cluster
HashTable memory | 17.5 MB | Tiny — fits per-shard in RAM, no disk on the hot path
Inbound bandwidth | ~6 MB/s | Trivial; one server can absorb it (problem is CPU, not network)
Outbound to customers | ~47.5 MB/s peak | ~10× the inbound — fanout dominates network planning
Concurrent rides | ~50K | 1M/day spread over peaks; drives state-DB throughput
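
These numbers are easy to sanity-check. A back-of-envelope script (Python; the 19-byte wire format and 5-subscriber fanout are the assumptions stated above):

ACTIVE_DRIVERS = 500_000
UPDATE_INTERVAL_S = 3
BYTES_PER_UPDATE = 19          # driver_id (3B) + lat (8B) + lng (8B)
BYTES_PER_ROW = 35             # id + current/prev lat/long pairs
SUBSCRIBERS_PER_DRIVER = 5

updates_per_sec = ACTIVE_DRIVERS / UPDATE_INTERVAL_S            # ~167K/s
ht_memory_mb = ACTIVE_DRIVERS * BYTES_PER_ROW / 1e6             # ~17.5 MB
inbound_mb_s = updates_per_sec * BYTES_PER_UPDATE / 1e6         # ~3.2 MB/s
outbound_mb_s = inbound_mb_s * SUBSCRIBERS_PER_DRIVER           # ~16 MB/s base

print(f"{updates_per_sec:,.0f} updates/s, {ht_memory_mb:.1f} MB table, "
      f"{inbound_mb_s:.1f} MB/s in, {outbound_mb_s:.1f} MB/s out")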
Step 4

The Core Problem — Uber is Yelp on Fire

If you have already designed Yelp, you know the trick: a QuadTree, a tree of geographic cells where each leaf holds the businesses within a small bounded area. To find "restaurants near Times Square", you walk down the tree to the leaf for that location and read its 500 businesses. Fast, elegant, log-N lookups.

Uber's problem looks superficially identical — find drivers near a customer — so the natural impulse is to plug a QuadTree in and call it done. This is wrong. The reason is the one word that separates Yelp from Uber: movement. Yelp's businesses sit at fixed addresses. Uber's drivers do not.

The Yelp QuadTree assumes its data is roughly stationary. Inserts and deletes are rare — a new restaurant opens once a year. In Uber, every driver crosses cell boundaries constantly: a Camry doing 30 mph through Manhattan changes QuadTree leaves multiple times per minute. Multiply by 500K drivers and the tree is being mutated 167,000 times per second — each mutation requiring a remove from one leaf, an insert into another, a possible leaf split if it overflows, and a tree rebalance. The tree spends all its time being reorganized and almost none of its time answering queries.

So: we cannot put driver locations directly into the QuadTree. We need a different design — one that lets locations change rapidly without disturbing the spatial index. That's what the next section builds.

Step 5 · CORE

High-Level Architecture — From Naive to Production

The make-or-break section. We will build the architecture in three passes: a naive port of Yelp's design, the central insight that fixes it, and the production shape where every box earns its place.

Pass 1 — The naive design (and why it breaks)

Start with the obvious thing. Take Yelp's QuadTree, point the driver app at it, and have every driver write their new location into the tree every 3 seconds.

flowchart LR
  D["Driver App"] -->|"location every 3s"| API["API Server"]
  API -->|"insert / delete / move"| QT["QuadTree Cluster"]
  C["Customer App"] -->|"find drivers near me"| API2["API Server"]
  API2 -->|"range query"| QT

Three concrete failures emerge the moment a real city's worth of drivers turn on the app:

💥 167K tree mutations per second

Each driver update is not a benign read — it is a remove(old_cell, driver) followed by an insert(new_cell, driver). That's two structural mutations on the tree, both requiring write locks on the affected leaves. Across 500K drivers, the QuadTree spends almost every CPU cycle holding write locks and almost none answering queries from customers.

💥 5× the cost of a Yelp insert

In Yelp, an insert is a one-shot operation. In Uber, every driver move involves: locating the driver in their old leaf, removing them, locating the destination leaf, inserting them, possibly splitting the destination leaf if it now overflows, possibly rebalancing the parent. A simple "drove one block" can cascade into multiple structural changes.

💥 Leaf-overflow rebalancing storms

When everyone leaves the airport at 5pm, a single airport leaf cell suddenly receives hundreds of driver inserts in a few seconds — overflowing, splitting, and triggering parent-level rebalancing. During the rebalance, customer queries against that subtree block. Right when ride demand peaks, the index goes unresponsive.

Pass 2 — The mental model: decouple the firehose from the spatial index

The single most important insight in Uber's design: do not update the QuadTree on every driver location report. Instead, split the system into two layers that scale independently.

📡 The Live Layer — DriverLocationHT

A simple, sharded, in-memory hash table keyed by driver_id. The value is the driver's current and previous location. Every 3-second update from a driver is a single hash-table write — O(1), no tree traversal, no locks held across siblings. This is where the "fresh truth" lives.

Think of it as a giant live GPS dashboard: 500K rows, one per driver, each row constantly being overwritten with the latest GPS ping. Because it is just a hash table, 167K writes/sec is well within reach — Redis alone can do that on a couple of nodes, and we will shard it across many.

🗺️ The Spatial Layer — QuadTree (lazy)

The QuadTree we already know from Yelp, but updated lazily, every 15 seconds, by sweeping the latest snapshot of the DriverLocationHT into it. It tells us "which drivers are roughly in this neighborhood?" — and "roughly" is fine, because nearby is approximate by definition.

Think of it as a neighborhood map that gets reprinted every 15 seconds. By the time you read it, some drivers have moved a block — but for "find me drivers near Times Square", a 15-second-stale list is still a perfectly good candidate set. We will then look up each candidate's real location in the DriverLocationHT before showing them to the customer.

This two-layer split is the heart of Uber's design. Writes go to the cheap live layer. Reads look up candidates in the cheap-to-update spatial layer, then hydrate the actual positions from the live layer. Each layer scales on its own bottleneck.
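
A minimal sketch of that split (Python, with a plain dict standing in for the sharded store; the quadtree's candidates_near helper is a hypothetical API, not Uber's):

class DriverLocationHT:
    # Live layer: one O(1) overwrite per 3-second ping, no tree involved.
    def __init__(self):
        self.rows = {}  # driver_id -> (cur_lat, cur_lng, prev_lat, prev_lng)

    def write(self, driver_id, lat, lng):
        cur = self.rows.get(driver_id)
        prev = (cur[0], cur[1]) if cur else (lat, lng)
        self.rows[driver_id] = (lat, lng, prev[0], prev[1])

    def hydrate(self, driver_ids):
        return {d: self.rows[d][:2] for d in driver_ids if d in self.rows}

def find_nearby(quadtree, ht, lat, lng, radius_m):
    # Spatial layer answers "who is roughly here?" from a snapshot that can
    # be up to 15s stale; the live layer then supplies positions fresh to 3s.
    candidates = quadtree.candidates_near(lat, lng, radius_m)
    return ht.hydrate(candidates)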

Pass 3 — The production shape

Now the full picture. Every node is numbered — find its matching card below to see what it does and crucially what would break without it.

flowchart TB
  DR["① Driver App"]
  CU["② Customer App"]
  subgraph EDGE["Edge"]
    LB["③ Load Balancer"]
  end
  subgraph LIVE["Driver Location Plane"]
    DLS["④ Driver Location Server cluster"]
    HT[("⑤ DriverLocationHT — sharded by driver_id")]
  end
  subgraph SPATIAL["Spatial Index Plane"]
    QT["⑥ QuadTree Server cluster — lazy 15s updates"]
  end
  subgraph NOTIF["Notification Plane"]
    NS["⑦ Notification Service — pub/sub"]
    SM["⑧ Subscription Manager"]
  end
  subgraph MATCH["Matching Plane"]
    RM["⑨ Ride Matching Service"]
    DB[("⑩ Ride State DB")]
    EP["⑪ ETA / Pricing Service"]
  end
  DR -->|"location every 3s"| LB
  CU -->|"request ride / watch map"| LB
  LB -->|"driver updates"| DLS
  LB -->|"customer requests"| RM
  LB -->|"map view"| NS
  DLS --> HT
  DLS -.every 15s.-> QT
  DLS -->|"new location"| NS
  RM -->|"find nearby"| QT
  RM -->|"hydrate positions"| HT
  RM --> DB
  RM --> EP
  NS --> SM
  NS -->|"push to subscribed customers"| CU
  style DR fill:#e8743b,stroke:#e8743b,color:#fff
  style CU fill:#4a90d9,stroke:#4a90d9,color:#fff
  style LB fill:#171d27,stroke:#9b72cf,color:#d4dae5
  style DLS fill:#171d27,stroke:#e8743b,color:#d4dae5
  style HT fill:#171d27,stroke:#e8743b,color:#d4dae5
  style QT fill:#171d27,stroke:#4a90d9,color:#d4dae5
  style NS fill:#171d27,stroke:#3cbfbf,color:#d4dae5
  style SM fill:#171d27,stroke:#3cbfbf,color:#d4dae5
  style RM fill:#171d27,stroke:#38b265,color:#d4dae5
  style DB fill:#171d27,stroke:#38b265,color:#d4dae5
  style EP fill:#171d27,stroke:#d4a838,color:#d4dae5

Component-by-component — what each numbered box does

Use the numbers in the diagram above to find the matching card. Each one answers what is this, why is it here, and what would break without it.

Driver App

The Uber Driver app running on Sandeep's phone. While he is online, it samples GPS every 3 seconds and posts the new (lat, long) to our backend over a persistent WebSocket. It also receives push notifications: ride offers, navigation cues, "trip cancelled" events. From the system's perspective, the driver app is the sensor that produces the firehose — 167K events/sec across the whole fleet.

Solves: the data acquisition problem. Without a steady stream of locations, the system literally has no idea where any driver is, and the entire matching premise collapses. The 3-second cadence is a deliberate trade-off: any faster wastes battery and bandwidth; any slower and the customer sees a stale-looking map.

Customer App

The Uber Rider app running on Raj's phone. It does two things: (a) shows a live map with nearby driver dots when Raj is browsing pre-ride, and (b) opens a real-time channel to a specific driver's position once a ride is booked. Both happen over the same WebSocket the customer holds open with the backend.

Solves: the user-facing experience. Without it, Raj has no way to see "is a driver actually nearby?" before committing to request, and no way to watch his driver approach. The two modes — pre-ride map vs. in-ride tracking — also drive the push-vs-pull design choice we will discuss in Step 8.

Load Balancer

A standard L4/L7 load balancer (AWS NLB/ALB or HAProxy) that fronts the entire backend. It routes driver location updates to the Driver Location Server cluster, ride requests to the Ride Matching Service, and map-view subscriptions to the Notification Service. Health checks every few seconds; unhealthy nodes are pulled from rotation.

Solves: single-instance bottlenecks and single-instance failure. Without an LB, one crashed Driver Location Server takes down location ingest for thousands of drivers. With it, we lose 1/N capacity for a few seconds until the failed node is evicted — and the surviving nodes absorb the slack.

Driver Location Server cluster

The first stop for every driver location update. Stateless workers that receive the WebSocket frame, parse out (driver_id, lat, long), and write it to the DriverLocationHT shard owned by that driver. Sized for the firehose: with 167K updates/sec and a single server comfortably handling ~10K writes/sec into Redis, we run roughly 20 of these behind the LB. They also publish each update to the Notification Service and, every 15 seconds, batch-flush the latest snapshot to the QuadTree cluster.

Solves: isolating the highest-throughput workload (location ingest) from the lower-throughput, latency-sensitive matching workload. If we let ride matching run on the same boxes that absorb 167K writes/sec, every dispatch would be slowed by GC pauses and lock contention from the firehose.

DriverLocationHT

The live ground truth: a sharded in-memory hash table (Redis Cluster, or in-process if we are willing to manage it ourselves) keyed by driver_id. Each value is a tiny record: {current_lat, current_long, prev_lat, prev_long} ≈ 35 bytes. Sharded by driver_id across many nodes so the 167K writes/sec are spread; replicated 2× so a node failure does not lose live positions; persisted to SSD periodically for crash recovery.

Solves: the "where is driver X right now?" query at constant time. Without it, the only place that knows the freshest position is the QuadTree — which we already established cannot be updated every 3 seconds. The HashTable is what lets the QuadTree be lazy.

QuadTree Server cluster

Same data structure as in the Yelp design: a tree of geographic cells, each leaf holding the IDs of drivers within that cell. Used exclusively for the spatial query "give me driver IDs near (lat, long) within R meters". Crucially, this tree is not updated by the driver firehose — it is rebuilt or incrementally refreshed every 15 seconds by sweeping the DriverLocationHT and reassigning drivers to leaves.

Solves: efficient proximity search at city scale. Without the QuadTree, every "find drivers near me" query becomes a brute-force scan over the whole HashTable — 500K row scans per ride request, computing distance for each. With the tree, the same query touches just one leaf containing maybe 50 candidates.

Notification Service

The publisher half of a pub/sub system. When a driver's location is updated in the DriverLocationHT, the Driver Location Server publishes the new (driver_id, lat, long) to a topic; the Notification Service then pushes it down WebSockets to every customer currently subscribed to that driver. Push delivery uses long-poll or persistent WebSocket — no polling. Average fanout: ~5 subscribers per driver (a customer browsing the map might be watching the 5 closest drivers; in-ride only 1 subscriber).

Solves: the live-update fanout problem. Without push, every customer would have to poll for "where are my nearby drivers now?" every couple of seconds — burning battery, wasting bandwidth, and putting an unnecessary read load on the system. Push delivers updates only when there is something new to deliver.

Subscription Manager

The state behind the Notification Service. It is a lookup of driver_id → [customer_ids subscribed to this driver]. When Raj's app opens the map and shows the 5 nearest drivers, his customer_id is added to the subscription list of those 5 drivers. When he closes the app or moves out of view, the subscription is dropped. With ~2.5M total subscriptions in flight at any time (500K drivers × 5 subscribers avg), this is a few hundred MB of state — sharded by driver_id alongside the DriverLocationHT.

Solves: "who do I push this update to?" without scanning all customers. Without the Subscription Manager, the Notification Service would either have to broadcast every driver update to every customer (catastrophic fanout) or maintain the subscription list inline in a way that does not scale. Separating it into its own service gives us a clean place to put TTLs, fairness limits, and abuse controls.

Ride Matching Service

The orchestrator that runs every time a customer hits "Request UberX". It does the dance: (a) ask the QuadTree for the candidate set of nearby drivers, (b) hydrate their current positions from the DriverLocationHT, (c) rank them by ETA / rating / surge eligibility, (d) notify the top-3 in parallel with a 15-second offer, (e) the first to accept wins; the others get a "ride taken" cancel, (f) if all decline or time out, expand the search radius and repeat. The ride is then locked in the Ride State DB.

Solves: the matching atomicity problem — no two riders may be matched to the same driver, and no driver should sit on three competing offers. Without a dedicated orchestrator, the locking semantics would have to be smeared across the apps and the location services, where they do not belong.

Ride State DB

A durable database (typically a sharded relational store like MySQL/Postgres, or Cassandra) that records each ride's lifecycle: REQUESTED → MATCHED → ENROUTE → IN_TRIP → COMPLETED, the fare, the route polyline, timestamps, and ratings. This is the source of truth for receipts, driver payouts, customer history, and disputes. Sharded by ride_id or customer_id; replicated for durability.

Solves: persistence for everything that must survive a restart of the matching plane. Driver locations can be lost on a crash (we will get fresh ones in 3 seconds), but a ride that has been booked and is mid-trip cannot vanish — Sandeep would not get paid and Raj would not get a receipt.

ETA / Pricing Service

Two related responsibilities. ETA: given (driver_lat, driver_long) and (customer_lat, customer_long), call out to a routing engine (OSRM, Google Directions API, or in-house) to estimate "how many minutes to pickup?" — used by the Matching Service when ranking candidates. Pricing: compute the dynamic fare in real time, factoring base rate, distance, time-of-day, and the local surge multiplier (which itself is recomputed periodically per region from the demand-vs-supply ratio).

Solves: two costly computations that the Ride Matching Service does not want to do inline. Routing is CPU-heavy and benefits from its own cache; surge pricing needs aggregated per-region stats and changes on its own schedule. Splitting them out lets matching stay fast.

Concrete walkthroughs — Sandeep drives, Raj books

Two real flows, mapped to the numbered components above:

📡 Flow A — Driver Sandeep reports location, every 3s

  1. Sandeep's Driver App ① samples GPS, sends {driver_id: 7821, lat: 40.7589, long: -73.9851} over its open WebSocket to the Load Balancer ③.
  2. LB forwards to a Driver Location Server ④ in the cluster — the one responsible for the shard containing driver 7821.
  3. Driver Location Server overwrites the row in DriverLocationHT ⑤. Old position becomes prev; new becomes current. ~1ms.
  4. Driver Location Server publishes the update to the Notification Service ⑦.
  5. Notification Service asks the Subscription Manager ⑧: "who's subscribed to driver 7821?" → gets back 5 customer IDs (Raj is one of them — he's currently looking at the map).
  6. Notification Service pushes Sandeep's new position to all 5 customers' open WebSockets. Raj's app smoothly slides Sandeep's car icon along Broadway. End-to-end ~80ms from GPS sample to pixel update.
  7. Every 15 seconds, the Driver Location Server also batch-flushes its shard's snapshot to the QuadTree Server ⑥, which reassigns drivers to their (possibly new) leaf cells.

🚗 Flow B — Raj requests a ride at Times Square, 14:02:06

  1. Raj taps "Request UberX" in the Customer App ②. The app POSTs {customer_id: 1234, lat: 40.7580, long: -73.9855, product: UberX}.
  2. Load Balancer ③ routes to the Ride Matching Service ⑨.
  3. Matching Service queries the QuadTree ⑥: "give me UberX drivers within 3km of Times Square" → gets back 24 candidate driver IDs.
  4. Matching Service hydrates each candidate's real position from DriverLocationHT ⑤ — some have moved a block since the QuadTree's last 15s flush, that's fine.
  5. Matching Service calls ETA / Pricing ⑪ to compute pickup ETA for each candidate. Ranks them by (ETA × 0.5 + (5 − rating) × 0.3 + (1 − accept_rate) × 0.2), lower being better.
  6. Top 3 drivers (Sandeep, Aisha, Carlos) are sent ride offers in parallel via Notification Service ⑦. Each has 15 seconds to accept.
  7. Sandeep taps Accept first. Matching Service writes the ride to Ride State DB ⑩ as MATCHED with driver=7821. Aisha and Carlos's offers are auto-cancelled.
  8. Both Raj and Sandeep are now subscribed to each other's locations via the Subscription Manager ⑧, and both apps watch the other approach in real time. Total elapsed from tap to "Sandeep is on his way": 11 seconds.
So what: the architecture is built around three insights — (1) location updates and spatial queries have wildly different shapes, so they get separate planes (DriverLocationHT vs QuadTree); (2) "nearby" is approximate by definition, so the spatial index is allowed to be 15 seconds stale, which is what makes scale possible; (3) the customer-facing live view uses pub/sub straight off the firehose, so customers see fresh data even while the QuadTree lags. Every box in the diagram exists to remove one of the failure modes from Pass 1.
Step 6

Driver Location HashTable (DriverLocationHT)

The single most important data structure in the system. Everything we do hangs off the assumption that this hash table can absorb 167K writes/sec and answer point lookups in microseconds. Let's design it concretely.

📐 Schema

One row per driver:

driver_id      // 3 bytes
current_lat    // 8 bytes (double)
current_long   // 8 bytes (double)
prev_lat       // 8 bytes
prev_long      // 8 bytes

Total ~35 bytes/row. For 500K active drivers: 17.5 MB. Tiny.

✂️ Sharding

Sharded by driver_id across the Driver Location Server cluster. Each server owns a slice of the keyspace and holds its slice's hash-table partition in memory. With 20 servers, each owns ~25K drivers and absorbs ~8K writes/sec — well within Redis's single-thread limits.

Sharding by driver_id (not by location) is deliberate: a driver's shard never changes as they move. If we sharded by location, every cross-cell movement would be a cross-shard migration — the very problem we are trying to avoid.
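
A sketch of the routing this implies, assuming 20 shards (the function is illustrative):

NUM_SHARDS = 20

def shard_for(driver_id: int) -> int:
    # A pure function of driver_id: the same driver always routes to the
    # same shard no matter where they drive. Sharding by location instead
    # would turn every cell crossing into a cross-shard migration.
    return driver_id % NUM_SHARDS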

💾 Persistence & Recovery

Replicated (primary + secondary in another AZ) so a node death does not lose live positions. Snapshot to local SSD every few seconds — on a crash, we lose at most a few seconds of updates, and drivers will overwrite them within the next 3-second cycle anyway.

Important property: this data is recoverable from the source — every driver's app will resend its current position on reconnect. So we treat persistence as a fast-restart optimization, not a durability requirement.

🔁 Why "current + previous" both?

Storing both the current and previous location lets us derive driver heading and speed without an extra round trip — useful for the ETA service and for showing a smoothed-out trajectory in the customer's map (the icon glides between prev and current rather than teleporting).

It also helps reconcile out-of-order updates: if we receive a "stale" update with a timestamp older than current but newer than prev, we slot it into prev rather than discarding it.
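
A sketch of that update rule; note the 35-byte schema above omits timestamps, so this sketch assumes one is carried per position:

def apply_update(row, lat, lng, ts):
    # row: {"cur": (lat, lng, ts), "prev": (lat, lng, ts)} or None.
    if row is None:
        return {"cur": (lat, lng, ts), "prev": (lat, lng, ts)}
    if ts >= row["cur"][2]:
        # Normal case: newest ping wins, old current becomes prev.
        row["prev"], row["cur"] = row["cur"], (lat, lng, ts)
    elif ts > row["prev"][2]:
        # Out-of-order but still newer than prev: slot it into prev.
        row["prev"] = (lat, lng, ts)
    # Older than prev: drop silently.
    return row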

Analogy: the DriverLocationHT is like a giant live-flight tracker dashboard. Every plane has one row that is being constantly overwritten with its latest GPS ping. You can look up any plane in O(1), and you do not care about historical positions — only "where is it right now?"
Step 7

Notification Service / Pub-Sub

Once a driver's position changes in DriverLocationHT, how does the change reach the (up to 5) customers watching that driver? The answer is a publisher/subscriber system — and getting it right is what makes the customer's map feel "alive".

sequenceDiagram
  actor Driver
  participant DLS as Driver Location Server
  participant HT as DriverLocationHT
  participant NS as Notification Service
  participant SM as Subscription Manager
  actor Customer
  Driver->>DLS: location every 3s
  DLS->>HT: write current + prev
  DLS->>NS: publish driver_id, new_lat, new_long
  NS->>SM: who subscribes to driver_id?
  SM-->>NS: list of customer_ids
  loop for each subscribed customer
    NS->>Customer: push new location over WebSocket
  end

Push delivery, not poll

The customer app does not poll the server for "where are nearby drivers now?". Instead, when the customer opens the map, their app sends a one-time "I want to subscribe to drivers in this area" request. The Subscription Manager registers the customer's interest in the 5 closest driver_ids. From then on, updates are pushed as they happen, over the customer's persistent WebSocket connection.

Bandwidth math

Average 5 subscribers per driver × 500K drivers × 19 bytes/update / 3s ≈ ~16 MB/sec base outbound, with peaks during commute hours pushing toward ~47 MB/sec. Spread across 10 Notification Service instances, each handles ~5 MB/sec — trivial for a server with a 1 Gbps NIC.

Subscription lifecycle

📍 Subscribe

Customer opens map → app picks 5 nearest driver_ids from the initial Matching Service response → sends "subscribe" frame → Subscription Manager stores customer_id → [driver_ids] and reverse driver_id → [customer_ids].

🔄 Resubscribe

As the customer pans the map or as drivers move out of range, the app periodically sends a refreshed subscribe list. Stale subscriptions naturally TTL out.

❌ Unsubscribe

App backgrounded, network dropped, or trip ended → Notification Service sees the WebSocket close and clears the customer's subscriptions across all driver_ids.

Step 8

Push vs Pull for Customers — When to Use Each

One of the highest-leverage trade-offs in the design. Should the customer app pull "where are the nearby drivers?" every few seconds, or should the server push updates as they happen?

⬆️ Push (server-initiated)

Pros:

  • Lowest latency — update arrives as soon as the driver moves
  • No wasted requests when nothing has changed
  • Smooth, "live" feel on the customer's map

Cons:

  • Server has to maintain subscription state (Subscription Manager)
  • Persistent WebSocket per customer — connection-state cost on the server
  • Fanout amplifies bandwidth (5× the inbound)

⬇️ Pull (client polling)

Pros:

  • Stateless server — much simpler ops
  • No long-lived connections; works fine over plain HTTP
  • Client controls cadence based on its own UI state

Cons:

  • Up to polling_interval seconds of staleness — a 5s poll feels jerky
  • Wasted requests when nothing changed (most polls)
  • Fixed read load even when a customer's screen is off

The production answer: hybrid by mode

Uber-style systems split the customer experience into two modes and pick the right tool for each:

🗺️ Pre-ride (browsing the map)

Pull with debounced polling. The customer is just window-shopping — a 5-second-stale list of nearby drivers is fine. Polling avoids opening 300M persistent WebSockets to people who may never request a ride. The poll is debounced: paused while the app is backgrounded, throttled when the screen has not moved for a while.

🚖 In-ride (driver assigned)

Push over WebSocket. Once Sandeep accepts Raj's ride, both apps open a persistent connection and the server pushes every location update for the duration of the trip. The user-experience bar here is high — "where is my driver?" needs to feel real-time. Connection cost is bounded: only riders/drivers in active trips have a socket open (~50K concurrent at peak, vs. 300M total customers).

So what: the same system uses both push and pull, picked by the cost-of-staleness in each scenario. Browsing customers tolerate 5s lag and number in the millions → pull. In-ride customers demand real-time and number in the tens of thousands → push.
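
A sketch of the pre-ride pull cadence described above (all app.* hooks and the /nearby_drivers endpoint are hypothetical):

import time

POLL_INTERVAL_S = 5      # normal cadence while the map is on screen
IDLE_INTERVAL_S = 15     # throttled cadence when the map hasn't moved

def poll_loop(app):
    while app.map_is_open():
        if app.is_backgrounded():
            app.wait_until_foregrounded()   # debounce: zero polls off-screen
            continue
        drivers = app.http_get("/nearby_drivers", params=app.viewport())
        app.render_driver_dots(drivers)
        interval = IDLE_INTERVAL_S if app.map_idle_for(30) else POLL_INTERVAL_S
        time.sleep(interval)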
Step 9

QuadTree Lazy Updates — Why 15s Is Fine

The QuadTree is deliberately allowed to be stale by up to 15 seconds. This is not a bug; it is the central trade-off that makes the system possible. Let's defend it.

How the lazy update works

Every 15 seconds, each Driver Location Server takes a snapshot of its DriverLocationHT shard and ships it to the QuadTree Server cluster. The QuadTree then reassigns each driver to its (possibly new) leaf cell. Drivers who haven't moved beyond their current cell don't need any structural change. Drivers who crossed a cell boundary are moved between leaves in a single batched operation — far cheaper than 5 individual moves over the same 15 seconds.
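
A sketch of one shard's 15-second sweep (the quadtree methods here are hypothetical):

FLUSH_INTERVAL_S = 15

def flush_shard_to_quadtree(ht_shard, quadtree):
    # One batched pass over the live table; only boundary-crossers
    # touch the tree's structure.
    moves = []
    for driver_id, (lat, lng, _prev_lat, _prev_lng) in ht_shard.items():
        if quadtree.current_leaf_of(driver_id) != quadtree.leaf_for(lat, lng):
            moves.append((driver_id, lat, lng))
    quadtree.apply_moves(moves)  # splits/rebalances happen inside this batch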

Why customers don't notice

When the Ride Matching Service queries "drivers near (lat, long)" at time T, the QuadTree returns driver_ids based on the snapshot from up to T-15s. But: matching then hydrates those driver_ids from the live DriverLocationHT, so the actual lat/long shown to the customer is current to within the last 3 seconds. The 15-second staleness only affects "is this driver still in this cell?" — and a driver who has driven a block in 15s is still well within "nearby" radius, so it doesn't change the answer.

Numerical sanity check: a driver going 30 mph travels ~200 meters in 15 seconds. The smallest QuadTree leaf cells are typically sized at 500m+ (set by the leaf-overflow threshold). So even the worst-case staleness rarely moves a driver into a different cell — and when it does, the customer-facing impact is "this driver who was a candidate is now slightly farther than reported", which the matching service handles by re-querying after each round of declines.

Before vs after

Before (Pass 1, eager updates): 167K tree mutations/sec, locks held constantly, leaf-overflow rebalancing storms during airport rush. After (lazy 15s flush): ~33K batched moves every 15s = ~2.2K mutations/sec amortized, no lock contention with reads, leaf splits happen during the calm batch window — not in the middle of a query.

Step 10

Ride Matching Flow — From Tap to Locked Driver

When Raj taps "Request UberX", the Matching Service runs a tight orchestration. The goal: assign a driver in under 30 seconds with the right balance of speed (close drivers) and quality (high-rated drivers).

sequenceDiagram
  actor Rider
  participant RM as Ride Matching Service
  participant QT as QuadTree
  participant HT as DriverLocationHT
  participant EP as ETA / Pricing
  participant NS as Notification Service
  actor TopDriver as Top-3 Drivers
  Rider->>RM: request ride at lat, long
  RM->>QT: drivers within 3km
  QT-->>RM: 24 candidate driver_ids
  RM->>HT: hydrate current positions
  HT-->>RM: positions
  RM->>EP: ETA + price for each candidate
  EP-->>RM: ranked list
  RM->>NS: send offer to top-3 in parallel
  NS->>TopDriver: ride offer, 15s timeout
  alt First driver accepts
    TopDriver-->>NS: ACCEPT from driver 7821
    NS-->>RM: driver 7821 accepted
    RM->>NS: cancel offers to others
    RM-->>Rider: matched with Sandeep, ETA 4 min
  else All decline or timeout
    RM->>QT: expand radius to 6km, retry
  end

Atomicity — only one rider gets each driver

Two riders can hit "Request" within milliseconds of each other and both see the same nearby driver. The lock must be atomic. The Matching Service uses a row in the Ride State DB as the locking primitive — when a driver accepts, the service runs a conditional update SET ride_id = X WHERE driver_id = 7821 AND ride_id IS NULL. If two acceptances race, only one update succeeds; the loser gets "ride taken" and the rider is rolled back to the next-best candidate.
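
A sketch of that lock as a DB-API-style conditional update (the driver_dispatch table name is illustrative):

def try_lock_driver(db, driver_id, ride_id):
    # Of two racing acceptances, exactly one sees ride_id IS NULL and wins;
    # the loser's rider is rolled over to the next-best candidate.
    cursor = db.execute(
        "UPDATE driver_dispatch SET ride_id = ? "
        "WHERE driver_id = ? AND ride_id IS NULL",
        (ride_id, driver_id),
    )
    return cursor.rowcount == 1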

Cascade on decline

If the top-3 all decline within their 15s window, Matching expands the radius (3km → 6km → 10km) and retries. After three expansions it tells the rider "no drivers available" — rare in dense markets, common at 3am in suburbs.

Step 11

Fault Tolerance & Replication

Every component on the hot path is replicated. When something dies — and it will — the system must keep matching rides without a customer-visible blip.

📡 Driver Location Servers

Stateless workers behind the LB. If one dies, the LB evicts it on the next health check; surviving servers absorb the slack. The DriverLocationHT shard owned by the dead server fails over to its secondary replica — drivers may briefly see "reconnecting" but resume sending updates within seconds.

💾 DriverLocationHT replication

Each shard has a primary + 1 secondary in a different AZ. Writes go to primary; secondary is kept in sync via async replication (sub-100ms lag). On primary failure, secondary is promoted; up to 100ms of writes may be lost — recovered automatically by the next driver heartbeat.

🗺️ QuadTree replication

QuadTree is rebuildable — its source of truth is the DriverLocationHT. We keep 2 hot replicas for read scaling and fast failover; if both die, a fresh tree can be rebuilt from the HashTable in 30-60s.

📣 Notification Service

Stateless instances behind the LB. When one dies, customer WebSockets reconnect to a survivor; the Subscription Manager (sharded, replicated) tells the new instance which drivers each customer was watching, so the push stream resumes within ~2 seconds.

🚗 Ride Matching Service

Stateless orchestrator. Each request is independent — failure of one instance just means the rider hits another. In-flight ride locks live in the Ride State DB, not in service memory, so a service crash mid-match recovers cleanly: the driver's offer simply expires.

🗄️ Ride State DB

Sharded relational store with synchronous replication across 3 AZs. Writes acknowledge only after the quorum confirms. A complete primary failure triggers a managed failover (30-60s) — during that window, new ride creation pauses but in-flight rides continue using cached state.

Step 12

Ranking Drivers — Picking the "Best" Match

The QuadTree gives the Matching Service a candidate set. Which one should we offer the ride to first? "Closest" is the obvious answer but it's not the right one — the driver who is closest by physical distance might be stuck on the wrong side of a one-way street, have a low rating, or have declined their last 5 rides.

A composite scoring function

Each candidate gets a score:

score(driver) =
    w1 × eta_minutes          // real driving time, not crow-flies distance
  + w2 × (5 − rating)         // higher rating = lower penalty
  + w3 × (1 − accept_rate)    // drivers who decline a lot get pushed down
  − w4 × prior_trip_count     // experienced drivers slightly preferred

// Lower score = better. Top-3 by score get parallel offers.

Weights are tuned per market and per product (UberX vs Black vs Pool may weight rating differently). The scoring runs in the Matching Service after hydration; with 24 candidates and 4 features, it's microseconds.
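
The same function as runnable Python with illustrative weights (real weights are tuned per market; the scale of the experience term here is arbitrary):

def score(eta_minutes, rating, accept_rate, prior_trips,
          w1=1.0, w2=0.5, w3=2.0, w4=0.0005):
    # Lower is better; experience is subtracted so it improves the score.
    return (w1 * eta_minutes
            + w2 * (5 - rating)
            + w3 * (1 - accept_rate)
            - w4 * prior_trips)

candidates = [  # (name, eta_minutes, rating, accept_rate, prior_trips)
    ("Sandeep", 4.0, 4.9, 0.95, 3200),
    ("Aisha",   3.0, 4.6, 0.70,  800),
    ("Carlos",  6.0, 4.8, 0.90, 1500),
]
top3 = sorted(candidates, key=lambda c: score(*c[1:]))[:3]
print([name for name, *_ in top3])   # offer order for the parallel dispatch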

Why not just closest?

Pure-closest leads to two pathologies. (1) Driver gaming: a driver could park 50m from a busy spot and refuse rides going elsewhere. (2) Bad customer experience: the closest driver is 0.5km away but rated 3.8 with a 40% acceptance rate — Raj would rather wait an extra 90 seconds for the 4.9-rated driver who is 1km away.

Step 13

Advanced Issues

The features that bring Uber from "works" to "production-grade".

📈 Surge Pricing

Compute demand-vs-supply ratio per region (hex cell or QuadTree leaf) every minute. If open_requests / available_drivers > threshold, the multiplier rises (1.5×, 2×, 3×). Multiplier is published to the Pricing Service, which factors it into the fare estimate when the rider taps the request button. Surge is regional and time-bounded — during a Knicks game finishing at MSG, the surge pulses up for 20 minutes then decays.
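
A sketch of the per-region computation (tier thresholds and multipliers are illustrative):

SURGE_TIERS = [(3.0, 3.0), (2.0, 2.0), (1.2, 1.5)]  # (demand/supply, multiplier)

def surge_multiplier(open_requests: int, available_drivers: int) -> float:
    # Recomputed per region (hex cell / QuadTree leaf) every minute.
    ratio = open_requests / max(available_drivers, 1)
    for threshold, multiplier in SURGE_TIERS:
        if ratio > threshold:
            return multiplier
    return 1.0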

🛣️ ETA Calculation

"How many minutes for Sandeep to reach Raj?" is not the straight-line distance — it's an actual driving-route ETA factoring traffic, one-ways, and turn restrictions. Computed by a routing engine (OSRM / Google Directions / in-house Graphhopper) that reads pre-built road graphs. Cached per (origin-cell, destination-cell, time-of-day) to avoid per-request routing calls.

🛑 Multi-stop trips

Rider adds 2-3 stops before requesting. Matching considers the full route's ETA, not just pickup-to-first-drop. Surge applies to the whole trip's expected duration. The driver gets the multi-stop hint in the offer so they can accept knowingly.

🤝 Pool / shared rides

Two riders going in roughly the same direction share a car. Matching becomes harder: the system has to predict which rider requests will arrive in the next few minutes and proactively defer matching to bundle compatible riders. Often run as a separate matching pipeline that opportunistically merges adjacent solo requests.

🛡️ Fraud detection

GPS spoofing (driver fakes a location to claim a fare without actually driving), fake accounts farming sign-up bonuses, collusion between rider and driver. Detected by ML models reading event streams: implausible velocity (200 mph), repeated rider-driver pairs, disabled accelerometer/gyroscope sensors. Flagged trips trigger manual review.
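
One of those signals is cheap to check inline. A sketch of the velocity test using the current/prev pair already in the DriverLocationHT (the threshold and the planar distance approximation are illustrative):

import math

MAX_PLAUSIBLE_MPS = 90.0   # ~200 mph; anything above smells like GPS spoofing

def implausible_velocity(prev, cur, dt_s=3.0):
    # Rough planar approximation of distance; fine at city scale.
    dlat = (cur[0] - prev[0]) * 111_000   # meters per degree of latitude
    dlng = (cur[1] - prev[1]) * 111_000 * math.cos(math.radians(cur[0]))
    speed = math.hypot(dlat, dlng) / dt_s
    return speed > MAX_PLAUSIBLE_MPS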

🌍 Multi-region operation

Riders and drivers are matched within their city only — there is no point matching a Mumbai rider with a Delhi driver. The whole stack (DriverLocationHT, QuadTree, Matching) is partitioned per-region. Cross-region functions (account, payments, history) live in a separate global plane.

Step 14

Interview Q&A

Why not update the QuadTree on every driver location report?
Because it would melt under 167K mutations/sec. Each driver update is a remove + insert in the tree, both requiring write locks, and busy areas cause leaf-overflow rebalancing storms. The tree spends all its time being reorganized and almost none answering queries. Decoupling the live updates (HashTable) from the spatial index (QuadTree, refreshed every 15s) lets each scale on its own bottleneck — writes hit a cheap O(1) hash table; reads hit a stable tree.
How do you handle 167K driver updates per second?
Shard the DriverLocationHT by driver_id across the Driver Location Server cluster. Each shard owns ~25K drivers and absorbs ~8K writes/sec — well within Redis or in-process hash-table limits. The work per update is O(1): overwrite the row, publish to Notification Service, done. Sharding by driver_id (not by location) means a driver's shard never changes as they move, so cross-shard migrations are zero. CPU, not network or memory, is the limit — and 20 modest servers comfortably absorb the firehose.
Push vs pull for showing nearby drivers — when do you use each?
Hybrid by mode. Pre-ride (customer browsing the map) → pull with debounced polling. Customer is just window-shopping; a 5s-stale view is fine, and polling avoids opening 300M persistent WebSockets. In-ride (driver assigned) → push over WebSocket. The "where is my driver?" bar is real-time, and the population in active trips (~50K concurrent) is small enough that connection cost is bounded. Same system, two modes, picked by cost-of-staleness.
How do you match a rider to the best driver in 30 seconds?
Three-stage pipeline. (1) Candidate set from the QuadTree — drivers within 3km, ~24 results. (2) Hydrate live positions from DriverLocationHT and rank by composite score (ETA + rating + accept_rate). (3) Notify top-3 in parallel with a 15-second offer; first to accept wins, others get cancelled. If all decline, expand radius and retry. Total wall time: typically 10-15s, well under the 30s SLA. The parallel offer is the key — sequential offers would burn 45s before getting a yes from the third option.
What if two riders try to book the same driver at the same time?
Atomic conditional update on the ride state row. When a driver accepts, the Matching Service runs SET driver_id = X WHERE driver_id IS NULL (or equivalent compare-and-swap in the DB). Only one update succeeds — the loser's rider gets a "driver no longer available" signal and is rolled back to the next-best candidate. The DB is the locking primitive because it is already on the hot path; introducing a separate distributed lock would add latency and a new failure mode without solving anything.
How is this different from Yelp's design?
Yelp's data is roughly stationary; Uber's data moves every 3 seconds. Yelp can put businesses directly into the QuadTree because inserts are rare. Uber cannot — 167K mutations/sec would destroy the tree. So Uber introduces a second layer: the DriverLocationHT absorbs the firehose at O(1), and the QuadTree is updated lazily every 15 seconds from snapshots. Reads hit both: the QuadTree gives candidate IDs, the HashTable hydrates their actual positions. Yelp's design has no such split because it doesn't need one.
What happens if a Driver Location Server dies?
Three layers of recovery. (1) The shard's secondary replica is promoted to primary in seconds. (2) Drivers' apps reconnect to the LB and are routed to the new primary — their next 3-second heartbeat re-establishes the live position. (3) Up to 100ms of writes may be lost during failover, but they are immediately overwritten by the next heartbeat. The QuadTree survives untouched since it reads from the DriverLocationHT on its own 15s cadence. Net effect: a Driver Location Server crash is invisible to riders and drivers beyond a brief reconnect.
How do you scale to a new city?
The whole stack is regionally partitioned. When Uber launches in a new city, we provision a new Driver Location Server cluster, a new QuadTree, a new Notification Service slice, and route those drivers and riders to the new region. Cross-region services (payments, account, support) stay global. This means we can scale linearly with cities — adding Lagos doesn't make matching slower for São Paulo. The only global coordination is when a driver crosses a regional boundary (rare), at which point their session is migrated.
The one-line summary the interviewer remembers: "Uber is Yelp plus a firehose, so we split the firehose from the spatial index — driver location updates land in a sharded in-memory hash table at O(1), the QuadTree is rebuilt from snapshots every 15 seconds, and customers either pull (when browsing) or get pushed updates (when matched). Two layers, two cadences, each scaling on its own bottleneck."