
Latency, Throughput & Performance

The numbers every engineer should know - from nanoseconds to network round trips

Foundation knowledge | 30 min read


Summary

Latency is how long one operation takes. Throughput is how many operations complete per unit time. These metrics are related but not the same—you can optimize one at the expense of the other. Understanding latency percentiles (P50, P95, P99) reveals what users actually experience, not just averages. Every system has latency budgets dictated by physics (speed of light) and practical limits (disk seeks, network hops). Knowing these numbers lets you reason about system design before writing code.

Key Takeaways

Averages Lie, Percentiles Tell Truth

If average latency is 50ms but P99 is 2 seconds, 1 in 100 users waits 40x longer. Averages hide outliers. Always measure P50 (median), P95, P99, and P99.9 to understand real user experience.
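A minimal sketch of the difference, using only the Python standard library. The sample distribution is made up purely to illustrate the shape of the problem: mostly fast requests plus a small fraction of slow outliers.

```python
import random
import statistics

# Made-up latency sample: ~98% of requests around 50 ms, ~2% slow outliers.
# The exact numbers are illustrative; only the shape of the distribution matters.
random.seed(1)
latencies = [random.gauss(50, 10) for _ in range(980)] + \
            [random.uniform(1500, 2500) for _ in range(20)]

cuts = statistics.quantiles(latencies, n=100)   # 99 cut points at 1%, 2%, ..., 99%

print(f"mean: {statistics.mean(latencies):7.1f} ms")   # looks tolerable on a dashboard
print(f"P50 : {cuts[49]:7.1f} ms")                     # the typical request
print(f"P95 : {cuts[94]:7.1f} ms")
print(f"P99 : {cuts[98]:7.1f} ms")                     # what 1 in 100 users actually gets
```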

Latency Compounds in Distributed Systems

If Service A calls B, C, and D each with 100ms P99, the combined P99 isn't 100ms—it's much higher. With fan-out, the slowest dependency dominates. This is why microservices often have worse tail latency than monoliths.
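A quick Monte Carlo sketch of this effect, under an assumed latency distribution tuned so that each dependency has a P99 of roughly 100 ms, as in the example above. The numbers are invented; the point is that the slowest of the three parallel calls sets the overall latency.

```python
import random

# Each dependency usually answers in ~60 ms, but ~0.7% of calls hit a slow
# path (300-600 ms), which puts its P99 at roughly 100 ms.
random.seed(3)

def one_dependency_ms():
    if random.random() < 0.007:
        return random.uniform(300, 600)     # occasional slow call
    return max(1.0, random.gauss(60, 15))   # normal fast path

def fan_out_ms(n):
    # A parallel fan-out finishes only when the slowest dependency responds.
    return max(one_dependency_ms() for _ in range(n))

def p99(samples):
    return sorted(samples)[int(len(samples) * 0.99)]

single = [one_dependency_ms() for _ in range(200_000)]
fanned = [fan_out_ms(3) for _ in range(200_000)]

print(f"P99 of one dependency      : {p99(single):6.1f} ms")   # ~100 ms
print(f"P99 when calling B, C and D: {p99(fanned):6.1f} ms")   # several times worse
```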

Memory is 1000x Faster Than SSD, SSD is 1000x Faster Than Network

L1 cache: 1ns. RAM: 100ns. SSD: 100μs. Network round-trip: 500μs-150ms. These ratios don't change—they're physics. Design decisions should respect this hierarchy.
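To make the ratios concrete, here are the same order-of-magnitude figures expressed in nanoseconds; the 150 ms round trip is the upper end of the range quoted above.

```python
# Order-of-magnitude figures from the takeaway above, in nanoseconds.
L1_NS  = 1
RAM_NS = 100
SSD_NS = 100_000            # 100 microseconds
RTT_NS = 150_000_000        # 150 ms cross-region round trip

print(f"RAM reads possible during one SSD read        : {SSD_NS // RAM_NS:,}")   # 1,000
print(f"SSD reads possible during one cross-region RTT: {RTT_NS // SSD_NS:,}")   # 1,500
print(f"RAM reads possible during one cross-region RTT: {RTT_NS // RAM_NS:,}")   # 1,500,000
```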

Little's Law Connects Latency and Throughput

Concurrent requests = Throughput × Latency. If your service handles 1000 req/s with 100ms latency, you have 100 concurrent requests at any moment. This formula is essential for capacity planning.
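The same arithmetic as a couple of lines of Python; the 5,000 req/s capacity-planning figures at the end are hypothetical, added only to show how the formula is used.

```python
# Little's Law: L = lambda * W  (in-flight requests = throughput * latency).
throughput_rps = 1000    # requests per second (figure from the takeaway above)
latency_s = 0.100        # 100 ms spent in the system per request

in_flight = throughput_rps * latency_s
print(f"Concurrent requests at any instant: {in_flight:.0f}")   # 100

# Capacity-planning use (hypothetical numbers): worker slots needed to sustain
# 5,000 req/s when each request holds a worker for 80 ms.
print(f"Worker slots required: {5000 * 0.080:.0f}")             # 400
```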

Tail Latency Amplification is Multiplicative

A request fanning out to 100 backends has a high chance of hitting at least one slow backend. If P99 is 100ms and you call 100 servers, ~63% of requests will experience the slow path. This is unavoidable math.
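The arithmetic behind the ~63% figure, spelled out for a few fan-out sizes:

```python
# If each backend exceeds its own P99 on 1% of calls, the chance that a fan-out
# of N backends sees at least one such slow call is 1 - 0.99^N.
for n in (1, 10, 100):
    print(f"{n:4d} backends -> {1 - 0.99 ** n:5.1%} of requests hit at least one >P99 backend")
```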

Latency Has Hard Limits

Speed of light in fiber: 200km/ms. Cross-US round trip: minimum 40ms. Cross-Pacific: 100ms. No amount of engineering overcomes physics—you must design around it with caching, CDNs, and regional deployment.
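A back-of-envelope check of those limits, assuming the 200 km/ms fiber figure quoted above; the distances are rough great-circle estimates, and real fiber paths are longer and add routing and queuing delay on top.

```python
# Lower bound on round-trip time from the speed of light in fiber (~200 km per ms).
FIBER_KM_PER_MS = 200

def min_rtt_ms(one_way_km):
    return 2 * one_way_km / FIBER_KM_PER_MS

print(f"New York <-> San Francisco (~4,000 km): >= {min_rtt_ms(4000):.0f} ms")   # ~40 ms
print(f"California <-> Tokyo       (~8,300 km): >= {min_rtt_ms(8300):.0f} ms")   # ~83 ms
```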

Deep Dive

Latency measures the time for a single operation to complete—from request sent to response received.

Throughput measures how many operations complete in a given time period.

Think of a highway:
- Latency: How long to drive from A to B (time per car)
- Throughput: How many cars pass a point per hour (cars per unit time)

A highway can have low latency (fast speed limit) but low throughput (one lane). Or high throughput (six lanes) but high latency (traffic congestion). The two are related but not the same.

Latency vs Throughput Visualization

Units:
- Latency: milliseconds (ms), microseconds (μs), nanoseconds (ns)
- Throughput: requests per second (RPS/QPS), transactions per second (TPS), megabytes per second (MB/s)

The relationship:

Throughput = Concurrency / Latency

With 100ms latency and 10 concurrent workers:

Throughput = 10 / 0.1s = 100 requests/second

To increase throughput, you can either reduce latency OR increase concurrency.
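The same relationship as a small function; the worker counts and latencies are illustrative, and the formula ignores queuing and coordination overhead.

```python
def throughput_rps(concurrency, latency_s):
    # Ideal steady-state throughput when `concurrency` workers each spend
    # `latency_s` seconds per request.
    return concurrency / latency_s

print(throughput_rps(10, 0.100))   # 10 workers at 100 ms each -> 100 req/s
print(throughput_rps(20, 0.100))   # more concurrency          -> 200 req/s
print(throughput_rps(10, 0.050))   # lower latency             -> 200 req/s
```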

Trade-offs

Aspect | Advantage | Disadvantage
Percentiles vs Averages | Percentiles reveal real user experience, catch outliers, meaningful for SLAs | Harder to aggregate across servers, more expensive to compute, requires histogram storage
Hedged Requests | Dramatically reduces tail latency, simple to implement, works with existing infrastructure | Increases backend load by 2-5%, requires idempotent operations, complicates debugging
Aggressive Timeouts | Prevents cascade failures, enforces latency budgets, improves user experience | May abort operations that would have succeeded, requires good fallback logic, tuning is tricky
Multi-Region Deployment | Reduces latency for global users, provides disaster recovery, required for <100ms global latency | Dramatically increases operational complexity, consistency challenges, higher costs
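The hedged-requests row lends itself to a short sketch. Here is a minimal asyncio version, assuming a hypothetical, idempotent backend_call and a hedge delay of roughly the P95 latency; a production client would also need retries, budgets, and cancellation plumbed through to the backend.

```python
import asyncio
import random

async def backend_call(request_id: str) -> str:
    # Hypothetical backend: usually ~50 ms, occasionally very slow.
    delay = 2.0 if random.random() < 0.05 else 0.05
    await asyncio.sleep(delay)
    return f"response for {request_id}"

async def hedged_call(request_id: str, hedge_after: float = 0.1) -> str:
    # Fire the primary request; if it has not answered within `hedge_after`
    # seconds (roughly the P95 latency), fire one identical hedge request and
    # take whichever finishes first. The operation must be idempotent.
    primary = asyncio.ensure_future(backend_call(request_id))
    done, _ = await asyncio.wait({primary}, timeout=hedge_after)
    if done:
        return primary.result()

    hedge = asyncio.ensure_future(backend_call(request_id))
    done, pending = await asyncio.wait({primary, hedge}, return_when=asyncio.FIRST_COMPLETED)
    for task in pending:
        task.cancel()   # best effort: stop wasting backend capacity
    return done.pop().result()

print(asyncio.run(hedged_call("req-42")))
```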
