What Are UUIDs and Why Do They Matter?

Imagine you have a dozen microservices, each running on different servers across multiple data centers. Every service needs to create unique identifiers for records — users, orders, events, messages. You cannot have two services generate the same ID, but you also cannot afford to call a central server for every ID assignment. That would be a bottleneck and a single point of failure.

This is the problem that UUIDs solve. A UUID (Universally Unique Identifier) is a 128-bit identifier that can be generated independently on any machine, at any time, without coordination. The probability of two randomly generated UUIDs colliding is so astronomically low that it can be treated as zero for all practical purposes.

UUIDs are everywhere: database primary keys, distributed tracing IDs, API request identifiers, session tokens, file names, and message queue correlation IDs. Most major programming languages provide UUID generation in their standard library or in a widely used package. But not all UUID versions are equal: choosing the wrong one can degrade database performance, leak private information, or introduce subtle bugs.

Anatomy of a UUID

Every UUID follows the same 128-bit format, displayed as 32 hexadecimal digits in five groups separated by hyphens:

550e8400-e29b-41d4-a716-446655440000
xxxxxxxx-xxxx-Mxxx-Nxxx-xxxxxxxxxxxx

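Two positions in this format carry special meaning. M is the version digit: 1 for time-based, 4 for random, 7 for Unix-timestamp-based. N holds the variant in its top bits; for the RFC variant used by every version discussed here, N is always 8, 9, a, or b.

The remaining bits carry the payload: 122 random bits for v4, timestamp, clock sequence, and node fields for v1, and a timestamp followed by random bits for v7. Here is how to check the version programmatically: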

// JavaScript — extract UUID version
function getUUIDVersion(uuid) {
  return parseInt(uuid.charAt(14), 16);
}

getUUIDVersion('550e8400-e29b-41d4-a716-446655440000'); // 4
getUUIDVersion('018f3c4a-1b2e-7f00-8000-abcdef123456'); // 7
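For comparison, the same check can be sketched in Python with the standard `uuid` module, which exposes both the version and the variant. The helper names `describe` and `version_from_bits` are illustrative, not part of any library; the second one reads the version nibble directly from the 128-bit integer to show where it sits.

```python
import uuid

def describe(text: str) -> tuple:
    """Return (version, variant) for a UUID string using the standard library."""
    parsed = uuid.UUID(text)
    return parsed.version, parsed.variant

def version_from_bits(text: str) -> int:
    """Read the version from the 128-bit integer: it occupies bits 76 to 79."""
    return (uuid.UUID(text).int >> 76) & 0xF

version, variant = describe("550e8400-e29b-41d4-a716-446655440000")
print(version, variant == uuid.RFC_4122)
print(version_from_bits("018f3c4a-1b2e-7f00-8000-abcdef123456"))
```

The shift of 76 follows from the layout: 48 bits precede the version nibble, and the nibble itself is 4 bits wide, leaving 76 bits below it.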

UUID v4 — The Default Choice

UUID v4 is the most widely used version. It is generated from 122 bits of cryptographically secure random data (the remaining 6 bits are reserved for the version and variant fields). There is no timestamp, no machine identifier, no sequence — just randomness.

// Node.js — generate a UUID v4
import { randomUUID } from 'crypto';
const id = randomUUID();
// → "f47ac10b-58cc-4372-a567-0e02b2c3d479"

// Python
import uuid
id = uuid.uuid4()
# → UUID('9b1deb4d-3b7d-4bad-9bdd-2b0d7b3dcb6d')

The collision probability of UUID v4 is vanishingly small. With 122 random bits, there are about 5.3 × 10³⁶ possible values. To put that in perspective, if you generated one billion UUIDs per second, you would need to run for about 86 years before reaching a 50% probability of a single collision. For most applications, this is indistinguishable from zero.

The birthday problem gives us the formula for estimating collisions. For n UUIDs drawn from a space of 2¹²² possibilities:

P(collision) ≈ 1 - e^(-n² / (2 × 2¹²²))

For n = 1 billion (10⁹), the exponent is tiny, so 1 - e^(-x) ≈ x:
P ≈ (10⁹)² / (2 × 2¹²²) ≈ 9.4 × 10⁻²⁰

That is about 9.4 × 10⁻¹⁸ percent, effectively zero.
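The arithmetic can be reproduced with a few lines of Python. This is a minimal sketch of the birthday-bound approximation only, not an exact collision calculation; the function names are illustrative.

```python
import math

SPACE = 2 ** 122  # number of distinct v4 values (122 random bits)

def collision_probability(n: int, space: int = SPACE) -> float:
    """Birthday-bound approximation: 1 - exp(-n^2 / (2 * space))."""
    # expm1 keeps precision when the exponent is extremely small.
    return -math.expm1(-(n * n) / (2 * space))

def count_for_probability(p: float, space: int = SPACE) -> float:
    """Invert the approximation: n = sqrt(2 * space * ln(1 / (1 - p)))."""
    return math.sqrt(2 * space * math.log(1 / (1 - p)))

print(f"{collision_probability(10**9):.2e}")  # roughly 9.4e-20

n_half = count_for_probability(0.5)
years = n_half / 1e9 / (365.25 * 24 * 3600)
print(f"{n_half:.2e} UUIDs, about {years:.0f} years at one billion per second")
```

Using `math.expm1` rather than `1 - math.exp(...)` matters here: at this scale the naive form rounds to exactly zero in floating point.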

When to use v4: UUID v4 is the right default for most applications. Use it when you need a unique identifier and do not care about time-ordering, sorting, or embedding any metadata in the ID. It is the safest, simplest, and most portable choice.

Limitations: UUID v4 is completely random, which means it is not sortable by creation time. When used as a database primary key in a B-tree index, random UUIDs cause scattered insertions across the index, leading to page splits and fragmentation. This becomes a measurable performance issue at scale (millions of rows).

UUID v1 — Time-Based

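UUID v1 was one of the original UUID versions. It combines three pieces of information to guarantee uniqueness: a 60-bit timestamp counting 100-nanosecond intervals since 15 October 1582, a 14-bit clock sequence that guards against duplicates when the clock is set backwards, and a 48-bit node ID, normally the MAC address of the generating machine.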

// Example UUID v1
"6ba7b810-9dad-11d1-80b4-00c04fd430c8"
//             ^                        version = 1
//                  ^                   variant = 8 (RFC compliant)
//                       ^^^^^^^^^^^^   node (MAC address)

The privacy problem: Because UUID v1 embeds the machine's MAC address, anyone who sees the UUID can link it to the network interface, and therefore the machine, that generated it. In 1999, identifiers of this kind embedded in Microsoft Word documents helped investigators trace the author of the Melissa virus. This is a serious concern for any application that exposes IDs to end users.

The timestamp problem: The Gregorian epoch (1582) and 100-nanosecond precision mean the timestamp bits are not directly useful for modern applications that work with Unix timestamps in milliseconds. The byte ordering also makes v1 UUIDs non-sortable in their string representation, even though they contain a timestamp.
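To make both problems concrete, here is a hedged Python sketch that pulls the fields back out of the example v1 UUID using the standard `uuid` module. The helper `decode_v1` is a name chosen for illustration; the conversion assumes the timestamp counts 100-nanosecond ticks from the Gregorian epoch, as described above.

```python
import uuid
from datetime import datetime, timedelta, timezone

# Start of the Gregorian calendar, the epoch used by v1 timestamps.
GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)

def decode_v1(text: str) -> dict:
    """Split a version-1 UUID into its timestamp, clock sequence, and node."""
    parsed = uuid.UUID(text)
    if parsed.version != 1:
        raise ValueError("not a version-1 UUID")
    # parsed.time counts 100-nanosecond ticks; 10 ticks make one microsecond.
    created = GREGORIAN_EPOCH + timedelta(microseconds=parsed.time // 10)
    return {
        "created": created,
        "clock_seq": parsed.clock_seq,
        "node": f"{parsed.node:012x}",
    }

info = decode_v1("6ba7b810-9dad-11d1-80b4-00c04fd430c8")
print(info["created"].date(), info["clock_seq"], info["node"])
```

Anyone holding the identifier can run the same few lines, which is why v1 leaks both when and where an ID was created.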

Verdict: UUID v1 is largely obsolete. It has been superseded by UUID v7, which provides time-based ordering without the privacy concerns. Avoid v1 in new projects unless you are working with a legacy system that requires it.

UUID v7 — The Modern Standard

UUID v7, introduced in RFC 9562 (published May 2024), is the modern answer to the limitations of both v1 and v4. It embeds a Unix timestamp in milliseconds in the first 48 bits, followed by random data. This makes v7 UUIDs both unique and time-sortable.

// UUID v7 structure (128 bits)
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                   unix_ts_ms (32 bits)                        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     unix_ts_ms (16 bits)      |  ver  |   rand_a (12 bits)    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|var|               rand_b (62 bits)                            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                       rand_b (continued)                      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

The first 48 bits store the Unix timestamp in milliseconds. The next 4 bits are the version (0111 for v7). Then come 12 bits of random data, 2 variant bits, and finally 62 more bits of random data. This gives 74 random bits per millisecond, or about 1.9 × 10²² possible values for each timestamp, so two IDs created in the same millisecond collide only if they draw the same 74 bits.
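The layout can be turned into code almost field by field. The following Python sketch assembles a v7 value by hand from a millisecond timestamp and random bytes; `uuid7_sketch` and `timestamp_ms` are illustrative names, and the sketch omits the optional monotonic counter that production generators often add.

```python
import os
import time
import uuid

def uuid7_sketch(unix_ms=None, random_bytes=None) -> uuid.UUID:
    """Assemble a version-7 UUID from the fields in the layout above."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    if random_bytes is None:
        random_bytes = os.urandom(10)  # 80 random bits, 74 of which are used
    rand = int.from_bytes(random_bytes, "big")
    rand_a = (rand >> 62) & 0xFFF        # 12 bits
    rand_b = rand & ((1 << 62) - 1)      # 62 bits

    value = (unix_ms & ((1 << 48) - 1)) << 80  # 48-bit timestamp, top of the value
    value |= 0x7 << 76                         # version nibble (0111)
    value |= rand_a << 64
    value |= 0b10 << 62                        # variant bits
    value |= rand_b
    return uuid.UUID(int=value)

def timestamp_ms(u: uuid.UUID) -> int:
    """Recover the millisecond timestamp from the top 48 bits."""
    return u.int >> 80

fixed = uuid7_sketch(1_714_703_440_686, bytes(10))
print(fixed)                # 018f3c4a-1b2e-7000-8000-000000000000
print(timestamp_ms(fixed))  # 1714703440686
```

Passing a fixed timestamp and all-zero random bytes makes the field boundaries visible in the output: the timestamp fills the first twelve hex digits, followed by the version digit 7 and the variant digit 8.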

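K-sortability: Because the timestamp occupies the most significant bits, a UUID generated in a later millisecond sorts after one generated earlier when compared as strings or bytes, provided the generating clocks agree. Within a single millisecond the order depends on the random bits unless the generator adds a monotonic counter, so a stream of v7 UUIDs is roughly rather than strictly ordered. This property is called k-sortability, and it is the key advantage of v7 over v4.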

// Generate UUID v7 in JavaScript (Node.js 20+)
// Node.js does not yet have a built-in v7, but you can use the 'uuid' package:
import { v7 as uuidv7 } from 'uuid';

const id1 = uuidv7(); // "018f3c4a-1b2e-7f00-8000-abcdef123456"
const id2 = uuidv7(); // "018f3c4a-1b2f-7123-9abc-def012345678"

console.log(id1 < id2); // true when id2 has a later timestamp (lexicographically sortable by time)

// Extract the timestamp from a v7 UUID
function extractTimestamp(uuid) {
  const hex = uuid.replace(/-/g, '').substring(0, 12);
  return new Date(parseInt(hex, 16));
}

extractTimestamp('018f3c4a-1b2e-7f00-8000-abcdef123456');
// → 2024-05-03T02:30:40.686Z (creation time, millisecond precision)

When to use v7: UUID v7 should be your first choice when IDs will be used as database primary keys, when you need chronological sorting, or when you want to extract approximate creation time from the ID itself. It combines the distributed generation of v4 with the time-ordering benefits that databases need.

UUID vs ULID vs NanoID

UUIDs are not the only game in town. Two popular alternatives are ULID and NanoID. Here is how they compare:

| Feature | UUID v4 | UUID v7 | ULID | NanoID |
|---|---|---|---|---|
| Size | 128 bits | 128 bits | 128 bits | Configurable (default 126 bits) |
| String length | 36 chars | 36 chars | 26 chars | 21 chars (default) |
| Time-sortable | No | Yes | Yes | No |
| Standardized | RFC 9562 | RFC 9562 | Community spec | No formal spec |
| Case-sensitive | No (hex) | No (hex) | No (Crockford Base32) | Yes (URL-safe 64-character alphabet) |
| Database support | Native UUID type | Native UUID type | Stored as string/binary | Stored as string |
| Monotonic within ms | N/A | Implementation-dependent | Yes (by spec) | N/A |

ULID (Universally Unique Lexicographically Sortable Identifier) predates UUID v7 and solves the same problem: time-sortable unique IDs. It uses Crockford's Base32 encoding, producing a shorter 26-character string. However, ULID is a community specification without IETF backing, and most databases do not have a native ULID type — you must store them as strings or binary, losing the storage and indexing efficiency of a native UUID column.
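To show where the 26 characters come from, here is a minimal Python sketch of the ULID text encoding: a 48-bit millisecond timestamp and 80 random bits, written out five bits at a time in Crockford's alphabet. The function names are illustrative, and the sketch leaves out the monotonic mode and decoding.

```python
import os
import time

CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"  # omits I, L, O, and U

def encode_ulid(unix_ms: int, randomness: bytes) -> str:
    """Encode a 48-bit timestamp plus 80 random bits as 26 Base32 characters."""
    if len(randomness) != 10:
        raise ValueError("expected 10 bytes (80 bits) of randomness")
    if not 0 <= unix_ms < (1 << 48):
        raise ValueError("timestamp must fit in 48 bits")
    value = (unix_ms << 80) | int.from_bytes(randomness, "big")
    chars = []
    for _ in range(26):  # 26 characters x 5 bits = 130 bits, top 2 always zero
        chars.append(CROCKFORD[value & 0x1F])
        value >>= 5
    return "".join(reversed(chars))

def new_ulid() -> str:
    """Generate a ULID for the current time with fresh random bits."""
    return encode_ulid(time.time_ns() // 1_000_000, os.urandom(10))

print(encode_ulid((1 << 48) - 1, b"\xff" * 10))  # 7ZZZZZZZZZZZZZZZZZZZZZZZZZ
print(new_ulid())
```

Because the payload is the same 128 bits as a UUID, the saving is purely in the text form: 5 bits per character instead of 4, and no hyphens.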

NanoID is designed for short, URL-friendly IDs. At 21 characters by default, it is the most compact option. It is ideal for client-facing identifiers like short URLs, invite codes, or session IDs where brevity matters. However, it is not time-sortable, has no formal specification, and lacks native database support.
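The idea behind NanoID-style identifiers can be sketched in a few lines of Python. This is not the NanoID library itself: `short_id` and `entropy_bits` are hypothetical helpers, and the alphabet below is simply a 64-character URL-safe set assumed for illustration.

```python
import math
import secrets

# 26 uppercase + 26 lowercase + 10 digits + 2 symbols = 64 URL-safe characters
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789_-"

def short_id(size: int = 21, alphabet: str = ALPHABET) -> str:
    """Pick `size` characters uniformly at random with a secure generator."""
    return "".join(secrets.choice(alphabet) for _ in range(size))

def entropy_bits(size: int, alphabet_len: int) -> float:
    """Random bits carried by an ID of `size` characters."""
    return size * math.log2(alphabet_len)

print(short_id())
print(entropy_bits(21, len(ALPHABET)))  # 126.0
```

Each character from a 64-symbol alphabet carries 6 bits, so 21 characters give the 126 bits listed in the comparison, slightly more than the 122 random bits of a UUID v4 in barely more than half the string length.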

Recommendation: Use UUID v7 for server-side identifiers, especially database primary keys. Use NanoID when you need short, URL-friendly IDs for client-facing use cases. ULID was a great innovation, but UUID v7 has largely superseded it by bringing the same benefits into the official UUID standard with native database support.

UUIDs as Database Primary Keys

Using UUIDs as primary keys is a common practice, but the choice of UUID version has a dramatic impact on database performance. The issue comes down to how B-tree indexes work.

The v4 fragmentation problem: Most relational databases (PostgreSQL, MySQL, SQL Server) use B-tree indexes for primary keys. A B-tree keeps data sorted, and new entries are inserted into the correct sorted position. With auto-incrementing integers, new rows always go to the end of the index — the tree grows in one direction, and page splits are rare.

UUID v4, however, is completely random. Each new row is inserted at a random position in the index. This causes frequent page splits, where the database must split a full index page into two half-full pages to make room. Over millions of rows, this leads to a larger, fragmented index, poor cache locality because successive inserts touch unrelated pages, and more write I/O per insert.

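Published benchmarks commonly report insert throughput 2-5x lower for UUID v4 primary keys than for sequential keys on tables with millions of rows, although the exact gap depends on the database engine, the hardware, and whether the index fits in memory.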

How v7 solves it: UUID v7's timestamp prefix means new UUIDs are almost always greater than existing ones. In a B-tree, this behaves much like an auto-incrementing key: new rows land at or near the end of the index, page splits are minimal, and the buffer pool cache stays hot.
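The difference in insert pattern can be illustrated without a database. In the Python sketch below, a sorted list stands in for the leaf order of a B-tree index, which is a simplification, and the code counts how many inserts land at the very end. `tail_insert_ratio` and `v7_like_key` are illustrative names; the v7-like keys omit the version and variant bits and assume one key per millisecond.

```python
import bisect
import os
import uuid

def tail_insert_ratio(keys) -> float:
    """Fraction of inserts that land at the end of a sorted list of keys."""
    index = []
    tail_inserts = 0
    for key in keys:
        position = bisect.bisect_right(index, key)
        if position == len(index):
            tail_inserts += 1
        index.insert(position, key)
    return tail_inserts / len(index)

def v7_like_key(unix_ms: int) -> bytes:
    """48-bit timestamp followed by 10 random bytes (version bits omitted)."""
    return unix_ms.to_bytes(6, "big") + os.urandom(10)

random_keys = [uuid.uuid4().bytes for _ in range(5000)]
ordered_keys = [v7_like_key(1_700_000_000_000 + i) for i in range(5000)]

print(tail_insert_ratio(random_keys))   # close to zero: inserts scatter across the index
print(tail_insert_ratio(ordered_keys))  # 1.0: every insert appends at the end
```

In a real index the "end" is a single hot page that stays in memory, which is the effect described above; the scattered case touches a different page on nearly every insert.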

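-- PostgreSQL: UUID primary key (v4 default here, v7 options below)
-- gen_random_uuid() is built in since PostgreSQL 13; older versions need pgcrypto
CREATE EXTENSION IF NOT EXISTS pgcrypto;

CREATE TABLE orders (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(), -- generates v4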
    customer_id UUID NOT NULL,
    total DECIMAL(10,2),
    created_at TIMESTAMPTZ DEFAULT now()
);

-- For UUID v7, use the pg_uuidv7 extension:
-- CREATE EXTENSION IF NOT EXISTS pg_uuidv7;
-- id UUID PRIMARY KEY DEFAULT uuid_generate_v7()

-- Or generate v7 UUIDs in your application layer:
-- INSERT INTO orders (id, customer_id, total)
-- VALUES ('018f3c4a-1b2e-7f00-8000-abcdef123456', ...)

Practical advice: