The Invisible Tax on Every Network Request

Every time your browser loads a page, your phone syncs data, or your backend calls an API, there is a delay. The request has to travel across cables, through routers, up into data centers, get processed, and come back. That delay is latency — and it is the single biggest factor determining whether your application feels fast or sluggish.

Latency is measured in milliseconds, but those milliseconds add up. A page that makes 40 sequential requests to a server with a 150ms round-trip time spends six full seconds just waiting for data to travel back and forth, before your server even starts processing anything. For real-time applications like video calls, online games, or financial trading, every millisecond of latency directly impacts the user experience or the bottom line.

Unlike bandwidth, which you can throw money at to increase, latency is constrained by physics. Light in a fiber optic cable travels at roughly 200,000 kilometers per second, about two-thirds the speed of light in a vacuum. A round trip between New York and London (roughly 5,600 km each way, about 11,000 km round trip) takes at least 55ms just for the signal to traverse the cable, and that is the absolute theoretical minimum with zero processing overhead.
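The floor that physics sets is simple arithmetic, distance divided by signal speed. A small sketch using the figures above (distances are approximate):

```javascript
// Minimum round-trip time imposed by propagation alone.
// speedKmPerSec: ~200,000 km/s in fiber, ~300,000 km/s for radio in a vacuum.
function minRttMs(roundTripKm, speedKmPerSec = 200_000) {
  return (roundTripKm / speedKmPerSec) * 1000;
}

// New York -> London -> New York over fiber: ~11,000 km round trip.
console.log(minRttMs(11_000)); // ≈ 55 ms

// Geostationary satellite: four legs of ~35,786 km each, radio in a vacuum.
console.log(minRttMs(4 * 35_786, 300_000)); // ≈ 477 ms
```

Real paths are always worse: cables do not follow great circles, and every router on the way adds its own delay.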

Latency vs Bandwidth — The Common Confusion

Latency and bandwidth are the two fundamental dimensions of network performance, but people conflate them constantly. When someone says their internet is "slow," they almost always mean low bandwidth — but the actual problem is often high latency.

Bandwidth is the capacity of the pipe. It measures how much data can flow through a connection per second, expressed in megabits per second (Mbps) or gigabits per second (Gbps). Think of bandwidth as the number of lanes on a highway. A six-lane highway can move more cars per hour than a two-lane road.

Latency is the speed of each individual trip. It measures how long it takes for a single packet to get from point A to point B. Think of latency as the speed limit on that highway. Even if you have a 12-lane highway, if the speed limit is 20 mph, each individual car still takes a long time to arrive.

Here is a concrete example: a satellite internet connection might offer 100 Mbps of bandwidth, plenty for streaming 4K video. But geostationary satellites orbit at 35,786 km above Earth. A round trip to the satellite and back covers over 143,000 km, which puts the propagation delay alone near 480ms; with routing and processing overhead, real-world latency is typically around 600ms. That means every click on a webpage waits over half a second before the first byte of the response arrives. Downloading a large file works fine (high bandwidth), but browsing the web feels agonizingly slow (high latency).

This distinction matters for developers. If your API response is 2 KB, upgrading from 10 Mbps to 100 Mbps makes virtually no difference: the payload transfers in a millisecond or two either way. But reducing latency from 200ms to 20ms makes the API feel ten times faster. For small, frequent requests, which is what most web applications make, latency dominates the user experience.
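One way to see this is to model a request's total time as RTT plus payload size over bandwidth (ignoring connection setup); the numbers below match the example above:

```javascript
// Total time for one request: round trip plus transfer of the payload.
function requestTimeMs(rttMs, payloadBytes, bandwidthMbps) {
  // bandwidthMbps * 1000 = bits per millisecond
  const transferMs = (payloadBytes * 8) / (bandwidthMbps * 1000);
  return rttMs + transferMs;
}

const payload = 2 * 1024; // 2 KB API response

// Tenfold bandwidth upgrade: barely moves the needle.
console.log(requestTimeMs(200, payload, 10));  // ≈ 201.6 ms
console.log(requestTimeMs(200, payload, 100)); // ≈ 200.2 ms

// Tenfold latency reduction: the whole request is ~10x faster.
console.log(requestTimeMs(20, payload, 10));   // ≈ 21.6 ms
```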

What Ping Actually Measures

The ping command is the most basic tool for measuring network latency. It sends an ICMP (Internet Control Message Protocol) Echo Request packet to a target host and waits for an ICMP Echo Reply. The time between sending the request and receiving the reply is the round-trip time (RTT).

$ ping google.com
PING google.com (142.250.80.46): 56 data bytes
64 bytes from 142.250.80.46: icmp_seq=0 ttl=116 time=11.4 ms
64 bytes from 142.250.80.46: icmp_seq=1 ttl=116 time=12.1 ms
64 bytes from 142.250.80.46: icmp_seq=2 ttl=116 time=10.8 ms
64 bytes from 142.250.80.46: icmp_seq=3 ttl=116 time=11.6 ms

--- google.com ping statistics ---
4 packets transmitted, 4 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 10.8/11.5/12.1/0.5 ms

RTT includes the time for the packet to travel to the server, any processing time on the server side, and the time for the reply to travel back. It is the complete round trip — not one-way latency. One-way latency is roughly half the RTT, but not exactly, because network paths are often asymmetric (the route from A to B may differ from B to A).

However, ICMP ping is not always representative of real application performance. Many servers handle ICMP packets differently from HTTP traffic; they may prioritize or deprioritize them. Some firewalls block ICMP entirely. What your application actually experiences is HTTP latency, which includes additional overhead: DNS resolution, the TCP handshake, the TLS handshake, and server-side processing time, all on top of the raw network round trip.

The metric that captures all of this is Time to First Byte (TTFB) — the time from when the browser sends the HTTP request to when it receives the first byte of the response. TTFB is a much more realistic measure of the latency your users will experience than ICMP ping. A server with a 10ms ping might have a 250ms TTFB if it takes 200ms to query a database and render a response.
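In the browser, the Resource Timing API exposes this directly: responseStart minus requestStart approximates TTFB for any resource the page has loaded. A sketch (for cross-origin resources these fields may be zeroed unless the server sends a Timing-Allow-Origin header):

```javascript
// TTFB for already-loaded resources, via the Resource Timing API.
// responseStart - requestStart = time from sending the HTTP request to
// receiving the first byte of the response.
function ttfbMs(entry) {
  return entry.responseStart - entry.requestStart;
}

for (const entry of performance.getEntriesByType("resource")) {
  console.log(entry.name, ttfbMs(entry).toFixed(1), "ms");
}
```

For the page itself, the same calculation on the `performance.getEntriesByType("navigation")[0]` entry gives the TTFB of the initial HTML document.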

What Causes Latency?

Latency is not a single number; it is the sum of delays at every stage of the network path. Four components contribute at each hop: propagation delay (the physical travel time of the signal), transmission delay (the time to push the packet's bits onto the link), processing delay (routers inspecting and forwarding each packet), and queuing delay (packets waiting in router buffers under congestion). Understanding where latency comes from is the first step to reducing it.
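As a toy model, a hop's delay can be written as the sum of propagation, transmission, processing, and queuing delay; all numbers below are illustrative:

```javascript
// Per-hop latency model: the four classic delay components, summed.
function hopDelayMs({ distanceKm, packetBits, linkMbps, processingMs, queuingMs }) {
  const propagationMs = (distanceKm / 200_000) * 1000;   // ~200,000 km/s in fiber
  const transmissionMs = packetBits / (linkMbps * 1000); // bits / (bits per ms)
  return propagationMs + transmissionMs + processingMs + queuingMs;
}

// One illustrative hop: 1,000 km of fiber, a 12,000-bit packet on a 100 Mbps link.
console.log(hopDelayMs({
  distanceKm: 1000, packetBits: 12_000, linkMbps: 100,
  processingMs: 0.05, queuingMs: 0.5,
})); // ≈ 5.67 ms
```

Note how propagation dominates this hop; on a congested link, queuing delay can easily take over, which is why latency spikes under load.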

Latency Benchmarks — Good, OK, Bad

What counts as "good" latency depends entirely on the use case. A 200ms round trip that is perfectly fine for loading a blog post would be unacceptable for a competitive online game. Here are practical benchmarks for common scenarios:

Use Case                           | Good     | OK         | Bad      | Notes
Web browsing                       | < 100 ms | 100-300 ms | > 300 ms | TTFB for the initial HTML document
Video calls                        | < 150 ms | 150-300 ms | > 300 ms | One-way; above 300ms conversation feels broken
Online gaming                      | < 30 ms  | 30-80 ms   | > 100 ms | RTT; FPS and fighting games are most sensitive
Payment / checkout API             | < 200 ms | 200-500 ms | > 500 ms | Total server response time including processing
Internal microservice call         | < 5 ms   | 5-20 ms    | > 50 ms  | Same data center; higher suggests network issues
CDN-served static asset            | < 20 ms  | 20-50 ms   | > 100 ms | Cache hit from nearest edge node
Database query (application layer) | < 10 ms  | 10-100 ms  | > 200 ms | Simple indexed queries; complex joins will be higher

These benchmarks assume modern infrastructure. If you are measuring significantly worse numbers, the cause is usually one of: geographic distance to the server, network congestion, an unoptimized server, or a missing CDN layer.

How CDNs Reduce Latency

A Content Delivery Network (CDN) is the most effective way to reduce latency for end users. The principle is simple: instead of serving content from a single origin server that might be 10,000 km from the user, you cache copies of that content on edge servers distributed around the world so that every user is close to at least one copy.

Major CDN providers like Cloudflare, AWS CloudFront, and Fastly operate hundreds of edge locations (Points of Presence, or PoPs) globally. When a user requests a resource, the CDN's DNS routes them to the nearest edge node. If that node has the resource cached (a cache hit), it serves the response directly — no round trip to the origin server at all. The latency is just the distance from the user to the nearest edge node, which is typically under 20ms in most metropolitan areas.

When the edge node does not have the resource (a cache miss), it fetches it from the origin server, serves it to the user, and caches it for future requests. The first user pays the full origin latency, but all subsequent users in that region get the cached version. This is why cache hit ratio is the most important CDN metric — a well-configured CDN should achieve 90-99% cache hit rates for static assets.
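You can check whether a given response was a hit or a miss from its headers; the header name varies by provider (cf-cache-status on Cloudflare, x-cache on CloudFront and Fastly), so this sketch only checks the common ones:

```javascript
// Inspect a response for CDN cache status. Header names are provider-specific;
// cf-cache-status (Cloudflare) and x-cache (CloudFront, Fastly) are common.
async function cacheStatus(url) {
  const res = await fetch(url, { method: "HEAD" });
  return (
    res.headers.get("cf-cache-status") ??
    res.headers.get("x-cache") ??
    "unknown"
  );
}

// Usage (any CDN-fronted asset):
// cacheStatus("https://example.com/logo.png").then(console.log); // e.g. "HIT"
```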

CDNs reduce latency in three ways:

- Proximity: the edge node is tens or hundreds of kilometers from the user instead of thousands, shrinking propagation delay.
- Cache hits: a cached response skips the round trip to the origin entirely.
- Warm origin connections: on a cache miss, the edge fetches over persistent, already-negotiated connections to the origin, so even uncached requests avoid fresh DNS, TCP, and TLS setup.

Browser-Based Ping Limitations

If you have ever tried to build a browser-based ping tool, you have run into a fundamental limitation: browsers cannot send ICMP packets. The ICMP protocol operates at the network layer, and browser JavaScript is sandboxed at the application layer. There is no API — not even a low-level one — for sending raw ICMP packets from a web page.

The workaround is to measure HTTP latency instead. You can use the Fetch API to send a request to a server and time how long it takes:

async function measureLatency(url) {
  const start = performance.now();
  try {
    await fetch(url, { mode: 'no-cors', cache: 'no-store' });
  } catch (e) {
    // Even failed requests tell us the RTT
  }
  const end = performance.now();
  return Math.round(end - start);
}

// Usage
const latency = await measureLatency('https://example.com/favicon.ico');
console.log(`HTTP latency: ${latency}ms`);
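A single sample is noisy, and the very first request also pays DNS, TCP, and TLS setup. A common refinement, building on measureLatency above, is to take several samples, discard the first, and report the median:

```javascript
// Run measureLatency several times; discard the first sample (connection
// setup: DNS + TCP + TLS), then report the median of the rest.
async function medianLatency(url, samples = 5) {
  const times = [];
  for (let i = 0; i < samples; i++) {
    times.push(await measureLatency(url));
  }
  const warm = times.slice(1).sort((a, b) => a - b);
  return warm[Math.floor(warm.length / 2)];
}
```

The median is preferred over the mean here because a single congested sample would otherwise skew the result.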

But this approach has several caveats:

- It measures HTTP latency, not ICMP: DNS, TCP, TLS, and server processing are all included, so the numbers will run higher than ping.
- The first request pays connection setup costs that subsequent requests to the same host do not.
- With mode: 'no-cors' the response is opaque, so you cannot distinguish a fast error from a fast success.
- Browsers deliberately reduce the resolution of performance.now() to mitigate timing side-channel attacks, adding jitter to individual measurements.

Despite these limitations, browser-based HTTP latency measurements are practical and useful. They measure the latency that real web applications actually experience, which is arguably more relevant than ICMP ping for most web development use cases.

Reducing API Latency

For backend developers, API latency is the metric your users feel most directly. Here are the most impactful techniques for reducing it:

Keep-alive connections: By default, HTTP/1.1 connections use keep-alive, but many applications accidentally disable it or set timeouts too low. Each new TCP connection requires a three-way handshake (one RTT), and each new TLS session requires a TLS handshake (one to two more RTTs). On a 50ms RTT connection, connection setup alone adds 100-150ms. Reusing connections eliminates this entirely.

# Check if your server supports keep-alive
$ curl -v -o /dev/null https://api.example.com/health 2>&1 | grep -i "keep-alive"
< Connection: keep-alive
< Keep-Alive: timeout=30

HTTP/2 multiplexing: HTTP/1.1 allows only one request per connection at a time (without pipelining, which is effectively dead). If your page needs 20 resources, the browser opens 6 parallel connections and serializes requests across them. HTTP/2 multiplexes all requests over a single connection, eliminating head-of-line blocking and the overhead of multiple connection setups. Ensure your server and CDN support HTTP/2 — most do by default in 2026.
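In the browser, you can verify which protocol was actually negotiated per resource via the Resource Timing API's nextHopProtocol field ("h2" for HTTP/2, "h3" for HTTP/3; exact strings vary slightly by browser). A sketch:

```javascript
// Group the page's resources by negotiated protocol ("http/1.1", "h2", "h3").
function protocolBreakdown(entries) {
  const counts = {};
  for (const e of entries) {
    const proto = e.nextHopProtocol || "unknown";
    counts[proto] = (counts[proto] ?? 0) + 1;
  }
  return counts;
}

// Run in the browser console on any page:
console.log(protocolBreakdown(performance.getEntriesByType("resource")));
```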

Prefetching and preconnecting: If you know which domains your page will contact, tell the browser to start the connection early:

<!-- DNS prefetch — resolve the domain name ahead of time -->
<link rel="dns-prefetch" href="https://api.example.com" />

<!-- Preconnect — do DNS + TCP + TLS handshake ahead of time -->
<link rel="preconnect" href="https://api.example.com" />

<!-- Prefetch — download a resource the user is likely to need next -->
<link rel="prefetch" href="/api/user/profile" />

Edge deployment: Deploy your API servers or serverless functions at the edge — close to your users. Platforms like Cloudflare Workers, Deno Deploy, and AWS Lambda@Edge run your code in data centers worldwide. Instead of all API requests routing to a single region, each request is handled by the nearest edge node. This can reduce API latency from 200ms to under 30ms for globally distributed users.
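As a sketch of the shape of such code, here is a minimal Cloudflare Workers-style fetch handler; the /api/ping route and its response are illustrative, not from any real service:

```javascript
// Workers-style fetch handler. In a real project this object would be the
// module's default export; the same handler runs in every edge location,
// so whichever PoP is nearest the user answers.
const worker = {
  async fetch(request) {
    const url = new URL(request.url);
    if (url.pathname === "/api/ping") {
      // Answered entirely at the edge: no origin round trip.
      return new Response(JSON.stringify({ pong: true }), {
        headers: { "content-type": "application/json" },
      });
    }
    // Anything else could fall through to the origin; here we just 404.
    return new Response("not found", { status: 404 });
  },
};
```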

Response compression: While compression does not reduce network RTT, it reduces the amount of data transmitted, which shortens the transfer phase. Enable gzip or Brotli compression for JSON API responses. A typical JSON payload compresses to 20-30% of its original size, which can shave significant time off large responses on slower connections.

Database proximity: Your API is only as fast as your slowest dependency. If your API server is in US-East but your database is in EU-West, every database query adds 80ms+ of cross-Atlantic latency. Co-locate your application servers and databases in the same region, or use read replicas in regions where you have edge deployments.