Distributed Unique ID Generation

Step 1

Twitter mints 6,000 IDs every second. Discord routes 4 billion messages a day, each with a unique ID. Neither of them asks a database for these IDs. Why? And what do they do instead?

The naive way (and why it breaks)

The simplest way to give every row a unique number is a database AUTO_INCREMENT column. The database keeps a counter and hands out 1, 2, 3, … It works perfectly when there's one database.

Now imagine your app grows. You add a second server. A third. Maybe a second region. Two obvious fixes come to mind, and both break:

🐢 One central counter

Every server pings the database for the next ID. Easy to understand, but the database becomes a bottleneck — every write waits its turn. At a few thousand writes per second, this falls apart. Database goes down → no one can write anything.

💥 Each server counts on its own

No coordination needed, but Server A and Server B will both eventually pick the number "1234". You silently corrupt your data and don't notice for months.

What we actually need: every server, in every region, can hand out IDs locally — without asking anyone — and we still get zero collisions, forever.

Step 2

What Is a UUID?

A UUID (Universally Unique Identifier) is a 128-bit number, written as 32 hex characters with hyphens. You've seen them everywhere:

550e8400-e29b-41d4-a716-446655440000
// 32 hex chars = 128 bits = a really, really big number

The trick: 128 bits is a huge space. It holds about 3.4 × 10³⁸ values, and a random v4 UUID uses 122 of those bits, which still gives about 5.3 × 10³⁶ values. If you generate one at random, the odds that anyone, anywhere on Earth generates the same one are essentially zero. No coordination needed. That's the whole pitch.

Two flavors you'll actually meet

🎲 UUIDv4 — Pure random

122 random bits; the other 6 bits just mark the version and variant. It's the default everywhere and what crypto.randomUUID() gives you in the browser. Used for session tokens, idempotency keys, and anything else that just needs to be unique and unguessable.

🕒 UUIDv7 — Time-ordered

Same size, but the first 48 bits are a Unix timestamp in milliseconds. So when you sort UUIDv7s, they come out in the order they were created, down to the millisecond. Best of both worlds: unique and sortable.

Quick mental model: a UUID is a 128-bit number that's "random enough that it'll never collide". v4 is fully random. v7 starts with a timestamp so you can sort by creation time.
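Python's standard `uuid` module makes these numbers concrete. Python is used here only as an example language. This short sketch checks the version field and the sizes of a v4 UUID, then prints the two magnitudes from the paragraph above.

```python
import uuid

u = uuid.uuid4()          # random UUID, drawn from the OS's secure random source

print(u)                  # different on every run
print(u.version)          # 4: the version nibble is fixed, not random
print(len(u.bytes))       # 16 bytes = 128 bits in binary form
print(len(str(u)))        # 36 characters as text (32 hex + 4 hyphens)

# The full 128-bit space vs. the 122 bits a v4 UUID actually randomizes
# (4 version bits + 2 variant bits are fixed).
print(f"{2**128:.1e}")    # 3.4e+38
print(f"{2**122:.1e}")    # 5.3e+36
```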

Step 3

UUIDs — Where They Win, Where They Hurt

UUIDs are great. But they're not always the right answer. Here's a clean breakdown.

✅ Use UUIDs when…

Tokens & API keys — need unguessable + unique
Idempotency keys — client makes one, server dedupes on it
Mobile / offline apps — generate IDs without a server
Trace IDs across services — created at the edge, no coordinator
Public/external IDs — don't want to leak your row counts
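The idempotency-key pattern from the list can be sketched in a few lines. This is a minimal illustration under stated assumptions:

- `handle_payment` and the in-memory `responses` dict are hypothetical.
- A real service would persist the keys, for example in a database or in a cache with a TTL.
- A real service would also make the check-and-insert atomic under concurrent requests.

```python
import uuid

# Hypothetical in-memory store: idempotency key -> response already sent.
# A real service would persist this and guard it against concurrent requests.
responses: dict[str, dict] = {}


def handle_payment(idempotency_key: str, amount_cents: int) -> dict:
    """Hypothetical handler: charge at most once per client-chosen key."""
    if idempotency_key in responses:
        return responses[idempotency_key]  # a retry: replay the first result
    result = {"status": "charged", "amount_cents": amount_cents}
    responses[idempotency_key] = result
    return result


# The client mints the key once and reuses it on every retry.
key = str(uuid.uuid4())
first = handle_payment(key, 500)
retry = handle_payment(key, 500)
print(retry is first)   # True: the retry did not charge again
print(len(responses))   # 1
```

The server never has to generate the key. A UUIDv4 minted by the client is unique enough that the server can safely dedupe on it.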

❌ Avoid UUIDv4 when…

You need them sorted: v4 is random, so you can't sort by creation time
Storage cost matters: 16 bytes vs 8 for a 64-bit BIGINT
Database write speed matters: random IDs land all over the B-tree index, causing page splits and slower inserts
You need to debug "when was this created?": v4 carries no timestamp

The verdict: UUIDv4 is great for tokens and external IDs. For database primary keys at scale, you want something better — either UUIDv7 (sortable) or Snowflake (compact 64-bit + sortable). Both come up next.

Step 4

Snowflake — Twitter's Solution

In 2010, Twitter hit a wall: their database couldn't keep up with assigning tweet IDs. They invented Snowflake — a way for any server to mint a unique 64-bit ID locally, without asking anyone.

The idea: split 64 bits into 3 parts (plus 1 unused sign bit)

1 bit: sign, always 0 so the ID stays a positive signed 64-bit integer
41 bits: timestamp in ms (when)
10 bits: machine ID (where)
12 bits: sequence (how many in this ms)

🕒 Timestamp

Milliseconds since a custom epoch, a start date you choose. 41 bits of milliseconds covers about 69 years from that epoch. Two IDs from the same server can't collide unless they're in the same millisecond.

🏷️ Machine ID

Each server gets a unique number (0–1023). Two different servers can't collide because their machine IDs differ.

🔢 Sequence

If a server mints multiple IDs in the same millisecond, the counter goes 0, 1, 2, … up to 4095, which is 4,096 IDs per millisecond. If it runs out, the server waits for the next millisecond.

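Put it together: a collision would need the same machine ID, the same millisecond, and the same sequence number, and one server never reuses a sequence number within a millisecond. So as long as machine IDs are unique and clocks don't run backwards, two Snowflake IDs can never be equal. There's no coordination on the hot path: every server just stamps its own clock + machine + counter.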

Why this is brilliant: uniqueness comes from structure, not luck. UUIDs say "the odds are insanely low." Snowflake says "it's mathematically impossible." Plus the IDs sort by creation time for free, and they fit in 8 bytes instead of 16.
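Here is a minimal single-process sketch of a Snowflake-style generator, under stated assumptions:

- It uses the 41/10/12 layout above.
- The epoch (2020-01-01 UTC) is an arbitrary choice.
- The machine ID is assumed to have been handed out already.

It packs the three fields with bit shifts, handles the "4,096 IDs this millisecond" overflow by waiting for the next millisecond, and applies the clock-went-backwards guard described under Common Mistakes.

```python
import threading
import time

# Assumed custom epoch: 2020-01-01T00:00:00Z in ms (any fixed past date works).
EPOCH_MS = 1_577_836_800_000
MACHINE_BITS = 10
SEQUENCE_BITS = 12
MAX_MACHINE_ID = (1 << MACHINE_BITS) - 1  # 1023
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1   # 4095


class Snowflake:
    """Minimal single-process Snowflake-style ID generator (a sketch)."""

    def __init__(self, machine_id: int) -> None:
        if not 0 <= machine_id <= MAX_MACHINE_ID:
            raise ValueError("machine_id must be in 0..1023")
        self.machine_id = machine_id
        self.last_ms = -1  # timestamp of the last ID we issued
        self.sequence = 0
        self.lock = threading.Lock()

    @staticmethod
    def _now_ms() -> int:
        return time.time_ns() // 1_000_000 - EPOCH_MS

    def next_id(self) -> int:
        with self.lock:
            now = self._now_ms()
            # Clock moved backwards (e.g. an NTP step): wait until it catches
            # up instead of risking a timestamp we already used.
            while now < self.last_ms:
                time.sleep((self.last_ms - now) / 1000)
                now = self._now_ms()
            if now == self.last_ms:
                self.sequence = (self.sequence + 1) & MAX_SEQUENCE
                if self.sequence == 0:
                    # All 4096 sequence numbers used this ms: spin to the next ms.
                    while now <= self.last_ms:
                        now = self._now_ms()
            else:
                self.sequence = 0
            self.last_ms = now
            # Layout: [sign bit = 0][41-bit timestamp][10-bit machine][12-bit sequence]
            return (
                (now << (MACHINE_BITS + SEQUENCE_BITS))
                | (self.machine_id << SEQUENCE_BITS)
                | self.sequence
            )


gen = Snowflake(machine_id=42)
ids = [gen.next_id() for _ in range(10_000)]
print(len(set(ids)) == len(ids))  # True: no duplicates
print(ids == sorted(ids))         # True: IDs come out in creation order
```

Blocking on the clock is the simplest safe choice. A production generator might instead raise an error after a short wait, so callers aren't stalled indefinitely by a large clock jump.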

Step 5

How Each Server Gets a Unique Machine Number

The whole Snowflake design rests on one assumption: every server has a different machine ID. So how do we hand out 1024 unique numbers across servers — including new ones spinning up, old ones dying — without collisions?

The pattern: a tiny coordinator, used only at boot

When a server starts up, it asks a small coordinator service (ZooKeeper, etcd, or Redis) for an unused machine number. The coordinator picks one, marks it taken, and returns it. The server caches that number and uses it for the rest of its life.

New server starts → asks the coordinator (ZooKeeper / Redis) → coordinator lends it machine #42 → server generates IDs forever using machine #42, never talking to the coordinator again.

The key: the coordinator is only on the boot path. Once a server has its machine number, it never talks to the coordinator again. So if the coordinator goes down, existing servers keep minting IDs just fine — only new server startups are blocked.

The simple rule: coordinate once at boot, never on the hot path. This is the trick that makes Snowflake fast at any scale.
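The boot-time handout can be sketched with an in-memory stand-in for the coordinator. `InMemoryCoordinator`, `claim`, and `release` are hypothetical names used only to show the logic: an atomic "pick a free ID and mark it taken." With a real ZooKeeper, etcd, or Redis deployment, the same idea is typically built on an ephemeral node, a lease, or a `SET ... NX` with an expiry, so that a dead server's ID is freed automatically.

```python
import threading


class InMemoryCoordinator:
    """Hypothetical stand-in for ZooKeeper/etcd/Redis: lends out IDs 0..1023."""

    def __init__(self, size: int = 1024) -> None:
        self.free = set(range(size))
        self.owners: dict[int, str] = {}
        self.lock = threading.Lock()

    def claim(self, server_name: str) -> int:
        # Called once at boot. The lock makes it atomic, so two servers
        # can never walk away with the same machine ID.
        with self.lock:
            if not self.free:
                raise RuntimeError("all machine IDs are in use")
            machine_id = min(self.free)
            self.free.remove(machine_id)
            self.owners[machine_id] = server_name
            return machine_id

    def release(self, machine_id: int) -> None:
        # Called when a server shuts down (or, in a real system, when its
        # session or lease expires). Ignores IDs that aren't currently owned.
        with self.lock:
            if self.owners.pop(machine_id, None) is not None:
                self.free.add(machine_id)


coordinator = InMemoryCoordinator()
a = coordinator.claim("server-a")   # 0
b = coordinator.claim("server-b")   # 1
coordinator.release(a)              # server-a dies; ID 0 becomes reusable
c = coordinator.claim("server-c")   # gets 0 again
print(a, b, c)                      # 0 1 0
```

Only `claim` sits on the boot path. Once a server holds its number, generating IDs never touches the coordinator.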

Step 6

UUIDv7 — The Easy Modern Choice

Snowflake is brilliant, but it has a cost: you need a coordinator. If you don't want to run one, UUIDv7 is the modern answer.

The shape: take a regular UUID, but force the first 48 bits to be a Unix-millisecond timestamp. Apart from 6 fixed bits that mark the version and variant, the remaining 74 bits are random. Because the timestamp leads, sorting UUIDv7s sorts them by creation time, which fixes the #1 weakness of UUIDv4.
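The layout can be written out by hand in a few lines. This is a sketch for illustration that follows the RFC 9562 bit positions: a 48-bit timestamp, 4 version bits, 12 random bits, 2 variant bits, and 62 random bits. It makes no attempt at extra ordering within the same millisecond. In production, prefer a maintained UUIDv7 implementation from your language or database.

```python
import os
import time
import uuid


def uuid7() -> uuid.UUID:
    """Hand-rolled UUIDv7 following the RFC 9562 bit layout (illustration only)."""
    ts_ms = (time.time_ns() // 1_000_000) & ((1 << 48) - 1)  # 48-bit Unix ms
    rand = int.from_bytes(os.urandom(10), "big")             # 80 random bits
    rand_a = rand >> 68                                      # keep 12 of them
    rand_b = rand & ((1 << 62) - 1)                          # keep 62 of them
    value = (
        (ts_ms << 80)       # bits 127..80: timestamp
        | (0x7 << 76)       # bits 79..76: version = 7
        | (rand_a << 64)    # bits 75..64: random
        | (0b10 << 62)      # bits 63..62: variant
        | rand_b            # bits 61..0: random
    )
    return uuid.UUID(int=value)


first = uuid7()
time.sleep(0.002)                 # make sure the next one lands in a later ms
second = uuid7()
print(first.version)              # 7
print(first < second)             # True: later timestamp sorts later
print(str(first) < str(second))   # True: the text form sorts the same way
```

Two IDs minted in the same millisecond sort in random order relative to each other. That is why UUIDv7 is "sortable to the millisecond" rather than perfectly ordered.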

✅ What you get

Sortable: the leading timestamp orders IDs by creation time, to the millisecond
No coordinator: pure local generation
Index-friendly: new IDs land near the end of the index, so far less fragmentation
Drop-in replacement for UUIDv4: same 16-byte format and column type

⚠️ What you give up

16 bytes vs 8 — twice the size of Snowflake
No machine ID embedded — can't decode "which server made this"

The pragmatic take: UUIDv7 is the default for new systems unless you specifically need 64-bit integer IDs. Snowflake wins when storage cost matters at billions of rows or you need machine-of-origin debugging.

Step 7

Other Variants Worth Knowing

Many big companies have their own Snowflake-flavored ID scheme. Same idea, different bit splits, tuned for their constraints.

Name	Size	Used by	Why
Twitter Snowflake	64 bits	Twitter (original)	The reference design
Discord	64 bits	Discord	Twitter-style layout with its own epoch (2015)
Instagram	64 bits	Instagram	Embeds shard ID for sharded DB
MongoDB ObjectId	96 bits	MongoDB	Built into Mongo, no setup
UUIDv7	128 bits	RFC 9562 (modern std)	No coordinator, sortable
ULID	128 bits	Many startups	Like UUIDv7 but base32-encoded (26 char string)

Pattern: they all do the same thing — combine a timestamp with some randomness or a machine ID, so each server can mint locally and still avoid collisions.

Step 8

Common Mistakes

Each of these has shipped to production at a real company.

❌ Same machine ID on two servers

You set the machine ID via an env var, deploy two pods, both get "1". Now they collide. Always use a coordinator (ZooKeeper / Redis) to hand out machine IDs at boot.

❌ Server clock jumps backwards

NTP corrects the clock and time goes back by 1 ms. The server may mint an ID it already minted. Fix: track the last timestamp used; refuse to issue IDs until the clock catches up.

❌ Using `Math.random()` for UUIDs

It's not cryptographically secure. Use crypto.randomUUID() in JS, UUID.randomUUID() in Java, uuid.uuid4() in Python.

❌ Putting Snowflake IDs in public URLs

Sequential IDs leak business volume — competitors can infer your QPS. Use a hashed external ID for public URLs, Snowflake internally.
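One way to build a hashed external ID is an HMAC of the internal ID. This is a sketch under stated assumptions:

- The secret key is a placeholder.
- The truncation to 22 hex characters (88 bits) is an arbitrary choice.
- Because the hash can't be reversed, you would store it in its own unique, indexed column and look rows up by it.

```python
import hashlib
import hmac

# Placeholder secret: load a real one from a secret store, never hard-code it.
SECRET_KEY = b"replace-me-with-a-real-secret"


def external_id(snowflake_id: int) -> str:
    """Derive an opaque public ID from an internal 64-bit Snowflake ID.

    HMAC is one-way and unpredictable without the key, so the public ID
    reveals neither the timestamp nor the sequence inside the Snowflake ID.
    """
    msg = snowflake_id.to_bytes(8, "big")
    return hmac.new(SECRET_KEY, msg, hashlib.sha256).hexdigest()[:22]


print(external_id(1234567890123456789))
print(external_id(1234567890123456790))  # adjacent internal IDs look unrelated
```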

❌ UUIDv4 as PK on a billion-row table

Random inserts scatter writes across the whole B-tree index, splitting pages and missing the cache once the index outgrows memory. Inserts that took 5 ms can start taking 50 ms. Switch to UUIDv7 or Snowflake.

❌ Storing UUIDs as `VARCHAR(36)`

You're storing 36 bytes when 16 will do. Use BINARY(16) in MySQL or the native uuid type in Postgres.
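Python's `uuid` module can show the size difference. This sketch compares the text form, which is what a `VARCHAR(36)` column would hold, with the raw bytes, which are what a `BINARY(16)` column would hold. It also confirms the round trip is lossless. How you pass the bytes to the database depends on your driver.

```python
import uuid

u = uuid.uuid4()
as_text = str(u)       # what a VARCHAR(36) column would store
as_binary = u.bytes    # what a BINARY(16) column would store

print(len(as_text))    # 36
print(len(as_binary))  # 16
print(uuid.UUID(bytes=as_binary) == u)  # True: converting back loses nothing
```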

Step 9

Decision Guide — When to Use What

If you remember nothing else, remember this table.

Use case	Pick	Why
Session tokens, API keys	UUIDv4	Unique + unguessable, sorting not needed
Idempotency keys	UUIDv4	Client-generated, no coordinator
External / public IDs	UUIDv4	Don't want to leak row counts
New project's primary keys	UUIDv7	Sortable + no coordinator + drop-in for v4
Existing system, billions of rows	Snowflake	Storage matters; you can run a coordinator
Mobile / offline-first writes	UUIDv7	Phone has no coordinator; still want sortable
Just learning, simple app	DB `AUTO_INCREMENT`	Simplest thing that works at low scale

Will it be a database primary key?
  No (token, idempotency key, external ID) → UUIDv4
  Yes → High write volume (billions of rows)?
    No → UUIDv7
    Yes → Can you run a coordinator?
      No → UUIDv7
      Yes, and you want compact 64-bit IDs → Snowflake

Step 10

Quick Q&A

Why not just use a database to give out IDs?

It works fine for small apps. But every write has to wait for the database, which becomes a bottleneck around a few thousand writes per second. And if the database goes down, no one can write anything.

If UUIDs are random, can two ever be the same?

Mathematically yes, practically no. A v4 UUID has 122 random bits, so there are 2¹²² possible values. You'd have to generate about 1 billion UUIDs per second for roughly 86 years before the chance of even one collision reached 50%.
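The 86-year figure comes from the birthday approximation. For N equally likely values, about √(2·N·ln 2) random draws give a 50% chance that at least two match. This assumes a good random source, which is the point of using a CSPRNG. The arithmetic:

```python
import math

space = 2 ** 122                              # random bits in a UUIDv4
n_50 = math.sqrt(2 * space * math.log(2))     # draws for a 50% collision chance
seconds = n_50 / 1e9                          # at 1 billion UUIDs per second
years = seconds / (365.25 * 24 * 3600)
print(f"{n_50:.2e} UUIDs, about {years:.0f} years")  # 2.71e+18 UUIDs, about 86 years
```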

What's the difference between Snowflake and UUIDv7?

Both are sortable. Snowflake is 8 bytes, UUIDv7 is 16. Snowflake needs a coordinator (to hand out machine IDs). UUIDv7 needs nothing — just a clock and a random number generator. Pick Snowflake if size matters, UUIDv7 if simplicity matters.

What happens if the server's clock jumps backwards?

If you don't handle it, you can mint duplicate IDs — bad. The fix: track the last timestamp you used, and if the current clock reads earlier, refuse to issue IDs until the clock catches up.

Why is sortability such a big deal?

Two reasons. (1) Database indexes love sorted inserts: new keys append to the end of the index instead of splitting pages in the middle. (2) Debugging: time-sortable IDs are sortable because they embed a timestamp, so you can decode "when was this created?" from the ID itself, with no extra column needed.
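Decoding is just bit shifting. Here is a sketch that pulls the creation time out of both ID types. Note the following assumptions:

- The Snowflake part assumes the 41/10/12 layout and an epoch of 2020-01-01 UTC. The epoch must match whatever the generator actually used.
- The Snowflake value is built from known parts, so the output is predictable.
- The UUIDv7 value is a sample chosen to encode the same instant.

```python
import uuid
from datetime import datetime, timezone

EPOCH_MS = 1_577_836_800_000  # assumed Snowflake epoch; must match the generator


def snowflake_parts(snowflake_id: int) -> tuple[datetime, int, int]:
    ms = (snowflake_id >> 22) + EPOCH_MS         # 41-bit timestamp field
    machine_id = (snowflake_id >> 12) & 0x3FF    # 10-bit machine field
    sequence = snowflake_id & 0xFFF              # 12-bit sequence field
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc), machine_id, sequence


def uuid7_time(u: uuid.UUID) -> datetime:
    ms = u.int >> 80                             # top 48 bits = Unix ms
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


# Sample Snowflake ID: timestamp 1645557742000 ms, machine 42, sequence 7.
sf = ((1_645_557_742_000 - EPOCH_MS) << 22) | (42 << 12) | 7
when, machine, seq = snowflake_parts(sf)
print(when, machine, seq)   # 2022-02-22 19:22:22+00:00 42 7
print(uuid7_time(uuid.UUID("017f22e2-79b0-7cc3-98c4-dc0c0c07398f")))
# 2022-02-22 19:22:22+00:00
```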