The Invisible Tax on Every Network Request
Every time your browser loads a page, your phone syncs data, or your backend calls an API, there is a delay. The request has to travel across cables, through routers, up into data centers, get processed, and come back. That delay is latency — and it is the single biggest factor determining whether your application feels fast or sluggish.
Latency is measured in milliseconds, but those milliseconds add up. A page that makes 40 sequential requests to a server 150ms away spends six full seconds just waiting for data to travel back and forth — before the server even starts processing anything. For real-time applications like video calls, online games, or financial trading, every millisecond of latency directly impacts the user experience or the bottom line.
Unlike bandwidth, which you can throw money at to increase, latency is constrained by physics. Light in a fiber optic cable travels at roughly 200,000 kilometers per second — about two-thirds the speed of light in a vacuum. A round trip from New York to London (roughly 5,500 km each way, 11,000 km total) takes at least 55ms just for the signal to traverse the cable, and that is the absolute theoretical minimum with zero processing overhead.
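This physical floor is easy to compute. A quick sketch, assuming the ~200,000 km/s fiber figure above (the function name and city distances are illustrative approximations):

```javascript
// Minimum round-trip time imposed by the speed of light in fiber.
// Ignores routing, processing, and queuing; real RTTs are always higher.
const FIBER_SPEED_KM_PER_S = 200000;

function minRttMs(oneWayDistanceKm) {
  // A round trip covers the distance twice; convert seconds to ms.
  return (2 * oneWayDistanceKm / FIBER_SPEED_KM_PER_S) * 1000;
}

console.log(minRttMs(5500)); // New York to London: ≈ 55 ms
console.log(minRttMs(8300)); // San Francisco to Tokyo: ≈ 83 ms
```

No amount of server tuning gets you below this number; only moving the endpoints closer together does.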
Latency vs Bandwidth — The Common Confusion
Latency and bandwidth are the two fundamental dimensions of network performance, but people conflate them constantly. When someone says their internet is "slow," they almost always mean low bandwidth — but the actual problem is often high latency.
Bandwidth is the capacity of the pipe. It measures how much data can flow through a connection per second, expressed in megabits per second (Mbps) or gigabits per second (Gbps). Think of bandwidth as the number of lanes on a highway. A six-lane highway can move more cars per hour than a two-lane road.
Latency is the speed of each individual trip. It measures how long it takes for a single packet to get from point A to point B. Think of latency as the speed limit on that highway. Even if you have a 12-lane highway, if the speed limit is 20 mph, each individual car still takes a long time to arrive.
Here is a concrete example: a satellite internet connection might offer 100 Mbps of bandwidth — plenty for streaming 4K video. But geostationary satellites orbit 35,786 km above Earth. A round trip to the satellite and back covers over 143,000 km, which works out to nearly 500ms of propagation delay even at the speed of light; in practice, round trips of roughly 600ms are typical once routing and processing are added. That means every click on a webpage waits around 600ms before the first byte of the response arrives. Downloading a large file works fine (high bandwidth), but browsing the web feels agonizingly slow (high latency).
This distinction matters for developers. If your API response is 2 KB, upgrading from 10 Mbps to 100 Mbps makes virtually no difference — the payload transfers in a millisecond or two either way. But reducing latency from 200ms to 20ms makes the API feel ten times faster. For small, frequent requests — which is what most web applications make — latency dominates the user experience.
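The arithmetic behind that claim can be sketched in a few lines (the function names are mine; the model ignores handshakes and treats one request as one round trip plus the transfer):

```javascript
// Time to push a payload through the pipe, ignoring latency entirely.
function transferMs(bytes, mbps) {
  return (bytes * 8) / (mbps * 1e6) * 1000;
}

// Total time for one request: one round trip plus the transfer phase.
function requestMs(bytes, mbps, rttMs) {
  return rttMs + transferMs(bytes, mbps);
}

// A 2 KB API response:
console.log(requestMs(2048, 10, 200));  // ≈ 201.6 ms at 10 Mbps
console.log(requestMs(2048, 100, 200)); // ≈ 200.2 ms at 100 Mbps; 10x the bandwidth saved ~1.5 ms
console.log(requestMs(2048, 10, 20));   // ≈ 21.6 ms; cutting latency is the real win
```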
What Ping Actually Measures
The ping command is the most basic tool for measuring network latency. It sends an ICMP (Internet Control Message Protocol) Echo Request packet to a target host and waits for an ICMP Echo Reply. The time between sending the request and receiving the reply is the round-trip time (RTT).
```
$ ping google.com
PING google.com (142.250.80.46): 56 data bytes
64 bytes from 142.250.80.46: icmp_seq=0 ttl=116 time=11.4 ms
64 bytes from 142.250.80.46: icmp_seq=1 ttl=116 time=12.1 ms
64 bytes from 142.250.80.46: icmp_seq=2 ttl=116 time=10.8 ms
64 bytes from 142.250.80.46: icmp_seq=3 ttl=116 time=11.6 ms

--- google.com ping statistics ---
4 packets transmitted, 4 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 10.8/11.5/12.1/0.5 ms
```
RTT includes the time for the packet to travel to the server, any processing time on the server side, and the time for the reply to travel back. It is the complete round trip — not one-way latency. One-way latency is roughly half the RTT, but not exactly, because network paths are often asymmetric (the route from A to B may differ from B to A).
However, ICMP ping is not always representative of real application performance. Many servers handle ICMP packets differently from HTTP traffic — they may prioritize or deprioritize them. Some firewalls block ICMP entirely. What your application actually experiences is HTTP latency, which includes additional overhead:
- DNS resolution: Translating the hostname to an IP address (typically 10-50ms for a cold lookup).
- TCP handshake: Establishing a TCP connection requires a three-way handshake — one additional round trip.
- TLS handshake: For HTTPS, the TLS negotiation adds one to two more round trips (depending on TLS version).
- Server processing: The time the server spends generating the response.
The metric that captures all of this is Time to First Byte (TTFB) — the time from when the browser sends the HTTP request to when it receives the first byte of the response. TTFB is a much more realistic measure of the latency your users will experience than ICMP ping. A server with a 10ms ping might have a 250ms TTFB if it takes 200ms to query a database and render a response.
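In the browser, TTFB can be read from the Navigation Timing API. A minimal sketch (the helper name is mine) that matches the definition above — request sent to first byte received:

```javascript
// TTFB from a PerformanceNavigationTiming entry: time from the start of
// the HTTP request to the first byte of the response. Measuring from
// navEntry.startTime instead would also include DNS, TCP, and TLS setup.
function ttfbMs(navEntry) {
  return navEntry.responseStart - navEntry.requestStart;
}

// In a browser console:
// const [nav] = performance.getEntriesByType('navigation');
// console.log(`TTFB: ${ttfbMs(nav).toFixed(1)} ms`);
```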
What Causes Latency?
Latency is not a single number — it is the sum of delays at every stage of the network path. Understanding where latency comes from is the first step to reducing it.
- Propagation delay (speed of light): The physical speed limit. Light travels through fiber optic cable at about 200,000 km/s. For a server 5,000 km away, propagation alone adds 50ms of round-trip latency. This is the one source of latency you cannot optimize away — you can only reduce the distance.
- Router hops: Every router along the path adds processing delay. Each hop inspects the packet header, looks up the routing table, and forwards the packet to the next hop. A typical internet path has 10-20 hops, each adding 0.5-2ms. Use `traceroute` to see the hops and their individual latencies.
- TLS handshakes: Establishing an encrypted connection requires cryptographic key exchange. TLS 1.2 requires two round trips; TLS 1.3 reduces this to one. On a 50ms RTT connection, TLS 1.2 adds 100ms before any application data can flow. TLS session resumption and 0-RTT can eliminate this overhead on subsequent connections.
- DNS resolution: Before any connection can be made, the browser must resolve the hostname to an IP address. A cold DNS lookup can take 20-120ms depending on the resolver and cache state. Subsequent lookups are cached, but the first request to a new domain always pays this cost.
- Server processing time: The time the server spends executing your request — querying databases, running business logic, serializing the response. This is entirely within your control and often the largest source of latency for API endpoints. A slow database query can add hundreds of milliseconds.
- Queuing delay: When a network link or server is congested, packets wait in buffers (queues) before being processed. This is especially noticeable on shared connections and during traffic spikes. Bufferbloat — excessively large network buffers — can add hundreds of milliseconds of queuing delay that does not show up until the link is under load.
Latency Benchmarks — Good, OK, Bad
What counts as "good" latency depends entirely on the use case. A 200ms round trip that is perfectly fine for loading a blog post would be unacceptable for a competitive online game. Here are practical benchmarks for common scenarios:
| Use Case | Good | OK | Bad | Notes |
|---|---|---|---|---|
| Web browsing | < 100 ms | 100 - 300 ms | > 300 ms | TTFB for the initial HTML document |
| Video calls | < 150 ms | 150 - 300 ms | > 300 ms | One-way; above 300ms conversation feels broken |
| Online gaming | < 30 ms | 30 - 80 ms | > 100 ms | RTT; FPS and fighting games are most sensitive |
| Payment / checkout API | < 200 ms | 200 - 500 ms | > 500 ms | Total server response time including processing |
| Internal microservice call | < 5 ms | 5 - 20 ms | > 50 ms | Same data center; higher suggests network issues |
| CDN-served static asset | < 20 ms | 20 - 50 ms | > 100 ms | Cache hit from nearest edge node |
| Database query (application layer) | < 10 ms | 10 - 100 ms | > 200 ms | Simple indexed queries; complex joins will be higher |
These benchmarks assume modern infrastructure. If you are measuring significantly worse numbers, the cause is usually one of: geographic distance to the server, network congestion, an unoptimized server, or a missing CDN layer.
How CDNs Reduce Latency
A Content Delivery Network (CDN) is the most effective way to reduce latency for end users. The principle is simple: instead of serving content from a single origin server that might be 10,000 km from the user, you cache copies of that content on edge servers distributed around the world so that every user is close to at least one copy.
Major CDN providers like Cloudflare, AWS CloudFront, and Fastly operate hundreds of edge locations (Points of Presence, or PoPs) globally. When a user requests a resource, the CDN's DNS routes them to the nearest edge node. If that node has the resource cached (a cache hit), it serves the response directly — no round trip to the origin server at all. The latency is just the distance from the user to the nearest edge node, which is typically under 20ms in most metropolitan areas.
When the edge node does not have the resource (a cache miss), it fetches it from the origin server, serves it to the user, and caches it for future requests. The first user pays the full origin latency, but all subsequent users in that region get the cached version. This is why cache hit ratio is the most important CDN metric — a well-configured CDN should achieve 90-99% cache hit rates for static assets.
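The effect of hit ratio on what users actually experience is a weighted average. A sketch with illustrative numbers (15ms to the edge, 250ms to a distant origin):

```javascript
// Expected latency seen by users: hits are served from the nearby edge,
// misses pay the full trip to the origin.
function effectiveLatencyMs(hitRatio, edgeMs, originMs) {
  return hitRatio * edgeMs + (1 - hitRatio) * originMs;
}

console.log(effectiveLatencyMs(0.95, 15, 250)); // ≈ 26.8 ms at a 95% hit rate
console.log(effectiveLatencyMs(0.50, 15, 250)); // ≈ 132.5 ms if caching is misconfigured
```

This is why a few percentage points of hit ratio matter so much: every miss costs an order of magnitude more than a hit.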
CDNs reduce latency in three ways:
- Geographic proximity: The edge node is physically closer to the user, reducing propagation delay. A user in Tokyo hitting a CDN edge in Tokyo gets single-digit millisecond latency, compared to 150ms+ if the origin server is in Virginia.
- Connection reuse: CDN edge nodes maintain persistent connections to your origin server. Instead of a cold TCP + TLS handshake for every request, the edge reuses an existing connection, saving 2-3 round trips.
- Protocol optimization: Modern CDNs use HTTP/2 or HTTP/3 (QUIC) between the edge and the user, enabling multiplexing, header compression, and 0-RTT connection resumption — all of which reduce effective latency.
Browser-Based Ping Limitations
If you have ever tried to build a browser-based ping tool, you have run into a fundamental limitation: browsers cannot send ICMP packets. The ICMP protocol operates at the network layer, and browser JavaScript is sandboxed at the application layer. There is no API — not even a low-level one — for sending raw ICMP packets from a web page.
The workaround is to measure HTTP latency instead. You can use the Fetch API to send a request to a server and time how long it takes:
```javascript
async function measureLatency(url) {
  const start = performance.now();
  try {
    await fetch(url, { mode: 'no-cors', cache: 'no-store' });
  } catch (e) {
    // Even failed requests tell us the RTT
  }
  const end = performance.now();
  return Math.round(end - start);
}

// Usage
const latency = await measureLatency('https://example.com/favicon.ico');
console.log(`HTTP latency: ${latency}ms`);
```
But this approach has several caveats:
- CORS restrictions: By default, browsers enforce the Same-Origin Policy. If the target server does not include the `Access-Control-Allow-Origin` header, the browser blocks the response. You can use `mode: 'no-cors'` to send the request anyway, but you get an opaque response — you cannot read the status code, headers, or body.
- Opaque responses: With `no-cors` mode, the fetch succeeds (the request is sent and the server responds), but JavaScript cannot inspect the response. You can still measure timing, which is all you need for a latency measurement, but you cannot verify that the server actually responded with a 200 status.
- HTTP overhead: HTTP latency includes TCP handshake, TLS negotiation, and HTTP header processing — all of which ICMP ping skips. The first request to a new host will be significantly slower than subsequent ones because of connection setup. Use `keep-alive` connections and warm up with a throwaway request for more accurate measurements.
- Mixed content: If your page is served over HTTPS, you cannot make HTTP (non-secure) requests to measure latency to HTTP-only servers. The browser will block the request entirely.
Despite these limitations, browser-based HTTP latency measurements are practical and useful. They measure the latency that real web applications actually experience, which is arguably more relevant than ICMP ping for most web development use cases.
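To blunt the connection-setup caveat in practice, one approach is to discard a warm-up request and report the median of several samples. A sketch building on the `measureLatency` function shown earlier (the helper names are mine):

```javascript
function median(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  // Average the two middle values for even-length arrays.
  return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

async function measureMedianLatency(url, samples = 5) {
  await measureLatency(url); // throwaway request: pays DNS + TCP + TLS once
  const timings = [];
  for (let i = 0; i < samples; i++) {
    timings.push(await measureLatency(url));
  }
  // Median resists outliers (GC pauses, transient congestion) better than mean.
  return median(timings);
}
```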
Reducing API Latency
For backend developers, API latency is the metric your users feel most directly. Here are the most impactful techniques for reducing it:
Keep-alive connections: By default, HTTP/1.1 connections use keep-alive, but many applications accidentally disable it or set timeouts too low. Each new TCP connection requires a three-way handshake (one RTT), and each new TLS session requires a TLS handshake (one to two more RTTs). On a 50ms RTT connection, connection setup alone adds 100-150ms. Reusing connections eliminates this entirely.
```
# Check if your server supports keep-alive
$ curl -v -o /dev/null https://api.example.com/health 2>&1 | grep -i "keep-alive"
< Connection: keep-alive
< Keep-Alive: timeout=30
```
HTTP/2 multiplexing: HTTP/1.1 allows only one request per connection at a time (without pipelining, which is effectively dead). If your page needs 20 resources, the browser opens 6 parallel connections and serializes requests across them. HTTP/2 multiplexes all requests over a single connection, eliminating head-of-line blocking and the overhead of multiple connection setups. Ensure your server and CDN support HTTP/2 — most do by default in 2026.
Prefetching and preconnecting: If you know which domains your page will contact, tell the browser to start the connection early:
```html
<!-- DNS prefetch — resolve the domain name ahead of time -->
<link rel="dns-prefetch" href="https://api.example.com" />

<!-- Preconnect — do DNS + TCP + TLS handshake ahead of time -->
<link rel="preconnect" href="https://api.example.com" />

<!-- Prefetch — download a resource the user is likely to need next -->
<link rel="prefetch" href="/api/user/profile" />
```
Edge deployment: Deploy your API servers or serverless functions at the edge — close to your users. Platforms like Cloudflare Workers, Deno Deploy, and AWS Lambda@Edge run your code in data centers worldwide. Instead of all API requests routing to a single region, each request is handled by the nearest edge node. This can reduce API latency from 200ms to under 30ms for globally distributed users.
Response compression: While compression does not reduce network RTT, it reduces the amount of data transmitted, which shortens the transfer phase. Enable gzip or Brotli compression for JSON API responses. A typical JSON payload compresses to 20-30% of its original size, which can shave significant time off large responses on slower connections.
Database proximity: Your API is only as fast as your slowest dependency. If your API server is in US-East but your database is in EU-West, every database query adds 80ms+ of cross-Atlantic latency. Co-locate your application servers and databases in the same region, or use read replicas in regions where you have edge deployments.