
Latency, Throughput & Performance

The numbers every engineer should know - from nanoseconds to network round trips

Foundation knowledge | 30 min read


Summary

Latency is how long one operation takes. Throughput is how many operations complete per unit time. These metrics are related but not the same—you can optimize one at the expense of the other. Understanding latency percentiles (P50, P95, P99) reveals what users actually experience, not just averages. Every system has latency budgets dictated by physics (speed of light) and practical limits (disk seeks, network hops). Knowing these numbers lets you reason about system design before writing code.

Key Takeaways

Averages Lie, Percentiles Tell Truth

If average latency is 50ms but P99 is 2 seconds, 1 in 100 users waits 40x longer. Averages hide outliers. Always measure P50 (median), P95, P99, and P99.9 to understand real user experience.
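A minimal sketch of the difference, using only the Python standard library. The sample distribution is made up purely to illustrate the shape of the problem: mostly fast requests plus a small fraction of slow outliers.

```python
import random
import statistics

# Made-up latency sample: ~98% of requests around 50 ms, ~2% slow outliers.
# The exact numbers are illustrative; only the shape of the distribution matters.
random.seed(1)
latencies = [random.gauss(50, 10) for _ in range(980)] + \
            [random.uniform(1500, 2500) for _ in range(20)]

cuts = statistics.quantiles(latencies, n=100)   # 99 cut points at 1%, 2%, ..., 99%

print(f"mean: {statistics.mean(latencies):7.1f} ms")   # looks tolerable on a dashboard
print(f"P50 : {cuts[49]:7.1f} ms")                     # the typical request
print(f"P95 : {cuts[94]:7.1f} ms")
print(f"P99 : {cuts[98]:7.1f} ms")                     # what 1 in 100 users actually gets
```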

Latency Compounds in Distributed Systems

If Service A calls B, C, and D each with 100ms P99, the combined P99 isn't 100ms—it's much higher. With fan-out, the slowest dependency dominates. This is why microservices often have worse tail latency than monoliths.
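A quick Monte Carlo sketch of this effect, under an assumed latency distribution tuned so that each dependency has a P99 of roughly 100 ms, as in the example above. The numbers are invented; the point is that the slowest of the three parallel calls sets the overall latency.

```python
import random

# Each dependency usually answers in ~60 ms, but ~0.7% of calls hit a slow
# path (300-600 ms), which puts its P99 at roughly 100 ms.
random.seed(3)

def one_dependency_ms():
    if random.random() < 0.007:
        return random.uniform(300, 600)     # occasional slow call
    return max(1.0, random.gauss(60, 15))   # normal fast path

def fan_out_ms(n):
    # A parallel fan-out finishes only when the slowest dependency responds.
    return max(one_dependency_ms() for _ in range(n))

def p99(samples):
    return sorted(samples)[int(len(samples) * 0.99)]

single = [one_dependency_ms() for _ in range(200_000)]
fanned = [fan_out_ms(3) for _ in range(200_000)]

print(f"P99 of one dependency      : {p99(single):6.1f} ms")   # ~100 ms
print(f"P99 when calling B, C and D: {p99(fanned):6.1f} ms")   # several times worse
```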

Memory is 1000x Faster Than SSD, SSD is 1000x Faster Than Network

L1 cache: 1ns. RAM: 100ns. SSD: 100μs. Network round-trip: 500μs-150ms. These ratios don't change—they're physics. Design decisions should respect this hierarchy.
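To make the ratios concrete, here are the same order-of-magnitude figures expressed in nanoseconds; the 150 ms round trip is the upper end of the range quoted above.

```python
# Order-of-magnitude figures from the takeaway above, in nanoseconds.
L1_NS  = 1
RAM_NS = 100
SSD_NS = 100_000            # 100 microseconds
RTT_NS = 150_000_000        # 150 ms cross-region round trip

print(f"RAM reads possible during one SSD read        : {SSD_NS // RAM_NS:,}")   # 1,000
print(f"SSD reads possible during one cross-region RTT: {RTT_NS // SSD_NS:,}")   # 1,500
print(f"RAM reads possible during one cross-region RTT: {RTT_NS // RAM_NS:,}")   # 1,500,000
```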

Little's Law Connects Latency and Throughput

Concurrent requests = Throughput × Latency. If your service handles 1000 req/s with 100ms latency, you have 100 concurrent requests at any moment. This formula is essential for capacity planning.
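The same arithmetic as a couple of lines of Python; the 5,000 req/s capacity-planning figures at the end are hypothetical, added only to show how the formula is used.

```python
# Little's Law: L = lambda * W  (in-flight requests = throughput * latency).
throughput_rps = 1000    # requests per second (figure from the takeaway above)
latency_s = 0.100        # 100 ms spent in the system per request

in_flight = throughput_rps * latency_s
print(f"Concurrent requests at any instant: {in_flight:.0f}")   # 100

# Capacity-planning use (hypothetical numbers): worker slots needed to sustain
# 5,000 req/s when each request holds a worker for 80 ms.
print(f"Worker slots required: {5000 * 0.080:.0f}")             # 400
```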

Tail Latency Amplification is Multiplicative

A request fanning out to 100 backends has a high chance of hitting at least one slow backend. If P99 is 100ms and you call 100 servers, ~63% of requests will experience the slow path. This is unavoidable math.
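The arithmetic behind the ~63% figure, spelled out for a few fan-out sizes:

```python
# If each backend exceeds its own P99 on 1% of calls, the chance that a fan-out
# of N backends sees at least one such slow call is 1 - 0.99^N.
for n in (1, 10, 100):
    print(f"{n:4d} backends -> {1 - 0.99 ** n:5.1%} of requests hit at least one >P99 backend")
```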

Latency Has Hard Limits

Speed of light in fiber: 200km/ms. Cross-US round trip: minimum 40ms. Cross-Pacific: 100ms. No amount of engineering overcomes physics—you must design around it with caching, CDNs, and regional deployment.
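A back-of-envelope check of those limits, assuming the 200 km/ms fiber figure quoted above; the distances are rough great-circle estimates, and real fiber paths are longer and add routing and queuing delay on top.

```python
# Lower bound on round-trip time from the speed of light in fiber (~200 km per ms).
FIBER_KM_PER_MS = 200

def min_rtt_ms(one_way_km):
    return 2 * one_way_km / FIBER_KM_PER_MS

print(f"New York <-> San Francisco (~4,000 km): >= {min_rtt_ms(4000):.0f} ms")   # ~40 ms
print(f"California <-> Tokyo       (~8,300 km): >= {min_rtt_ms(8300):.0f} ms")   # ~83 ms
```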

Deep Dive

Latency measures the time for a single operation to complete—from request sent to response received.

Throughput measures how many operations complete in a given time period.

Think of a highway:
- Latency: How long to drive from A to B (time per car)
- Throughput: How many cars pass a point per hour (cars per unit time)

A highway can have low latency (fast speed limit) but low throughput (one lane). Or high throughput (six lanes) but high latency (traffic congestion). The two are related but not the same.

Latency vs Throughput Visualization

Units:
- Latency: milliseconds (ms), microseconds (μs), nanoseconds (ns)
- Throughput: requests per second (RPS/QPS), transactions per second (TPS), megabytes per second (MB/s)

The relationship:

Throughput = Concurrency / Latency

With 100ms latency and 10 concurrent workers:

Throughput = 10 / 0.1s = 100 requests/second

To increase throughput, you can either reduce latency OR increase concurrency.
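The same relationship as a small function; the worker counts and latencies are illustrative, and the formula ignores queuing and coordination overhead.

```python
def throughput_rps(concurrency, latency_s):
    # Ideal steady-state throughput when `concurrency` workers each spend
    # `latency_s` seconds per request.
    return concurrency / latency_s

print(throughput_rps(10, 0.100))   # 10 workers at 100 ms each -> 100 req/s
print(throughput_rps(20, 0.100))   # more concurrency          -> 200 req/s
print(throughput_rps(10, 0.050))   # lower latency             -> 200 req/s
```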

Trade-offs

Aspect | Advantage | Disadvantage
Percentiles vs Averages | Percentiles reveal real user experience, catch outliers, meaningful for SLAs | Harder to aggregate across servers, more expensive to compute, requires histogram storage
Hedged Requests | Dramatically reduces tail latency, simple to implement, works with existing infrastructure | Increases backend load by 2-5%, requires idempotent operations, complicates debugging
Aggressive Timeouts | Prevents cascade failures, enforces latency budgets, improves user experience | May abort operations that would have succeeded, requires good fallback logic, tuning is tricky
Multi-Region Deployment | Reduces latency for global users, provides disaster recovery, required for <100ms global latency | Dramatically increases operational complexity, consistency challenges, higher costs
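The hedged-requests row lends itself to a short sketch. Here is a minimal asyncio version, assuming a hypothetical, idempotent backend_call and a hedge delay of roughly the P95 latency; a production client would also need retries, budgets, and cancellation plumbed through to the backend.

```python
import asyncio
import random

async def backend_call(request_id: str) -> str:
    # Hypothetical backend: usually ~50 ms, occasionally very slow.
    delay = 2.0 if random.random() < 0.05 else 0.05
    await asyncio.sleep(delay)
    return f"response for {request_id}"

async def hedged_call(request_id: str, hedge_after: float = 0.1) -> str:
    # Fire the primary request; if it has not answered within `hedge_after`
    # seconds (roughly the P95 latency), fire one identical hedge request and
    # take whichever finishes first. The operation must be idempotent.
    primary = asyncio.ensure_future(backend_call(request_id))
    done, _ = await asyncio.wait({primary}, timeout=hedge_after)
    if done:
        return primary.result()

    hedge = asyncio.ensure_future(backend_call(request_id))
    done, pending = await asyncio.wait({primary, hedge}, return_when=asyncio.FIRST_COMPLETED)
    for task in pending:
        task.cancel()   # best effort: stop wasting backend capacity
    return done.pop().result()

print(asyncio.run(hedged_call("req-42")))
```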
