System Design Fundamentals
The numbers every engineer should know - from nanoseconds to network round trips
Latency is how long one operation takes. Throughput is how many operations complete per unit of time. The two are related but distinct: you can optimize one at the expense of the other. Understanding latency percentiles (P50, P95, P99) reveals what users actually experience, not just averages. Every system has latency budgets dictated by physics (the speed of light) and practical limits (disk seeks, network hops). Knowing these numbers lets you reason about a system's design before writing code.
If average latency is 50ms but P99 is 2 seconds, 1 in 100 users waits 40x longer. Averages hide outliers. Always measure P50 (median), P95, P99, and P99.9 to understand real user experience.
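A quick way to see this is to compute percentiles over raw latency samples instead of the mean. Here is a minimal Python sketch; the workload distribution (98% fast requests, 2% on a slow path) is invented for illustration:

```python
import random

def percentile(samples, p):
    """Approximate nearest-rank percentile: the value below which
    roughly p% of the samples fall."""
    ordered = sorted(samples)
    rank = max(0, min(len(ordered) - 1, round(p / 100 * len(ordered)) - 1))
    return ordered[rank]

# Invented workload: 98% of requests are fast, 2% hit a slow path.
random.seed(42)
latencies_ms = ([random.gauss(50, 10) for _ in range(980)]
                + [random.gauss(2000, 300) for _ in range(20)])

mean = sum(latencies_ms) / len(latencies_ms)
print(f"mean:  {mean:7.0f} ms")  # ~90 ms -- looks healthy
for p in (50, 95, 99, 99.9):
    # P50/P95 look fine; P99 and P99.9 expose the slow path.
    print(f"P{p}: {percentile(latencies_ms, p):7.0f} ms")
```

The mean reports roughly 90ms while P99 is around 2 seconds: exactly the outlier-hiding effect described above.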
If Service A calls B, C, and D each with 100ms P99, the combined P99 isn't 100ms—it's much higher. With fan-out, the slowest dependency dominates. This is why microservices often have worse tail latency than monoliths.
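The effect is easy to demonstrate with a Monte Carlo sketch. The latency distribution below is an assumption, chosen so each downstream call has a P99 of roughly 100ms:

```python
import random

random.seed(7)

def call_ms():
    """One downstream call. Assumed distribution: ~99% fast (~40 ms),
    ~1% slow (~120 ms), giving a per-call P99 of roughly 100 ms."""
    if random.random() < 0.99:
        return random.gauss(40, 10)
    return random.gauss(120, 20)

N = 100_000
single = sorted(call_ms() for _ in range(N))
# Parallel fan-out to B, C, and D: the caller waits for the slowest reply.
fanout = sorted(max(call_ms() for _ in range(3)) for _ in range(N))

p99 = lambda xs: xs[int(0.99 * len(xs))]
print(f"single call P99:   {p99(single):6.1f} ms")  # ~100 ms
print(f"3-way fan-out P99: {p99(fanout):6.1f} ms")  # well above 100 ms
```

The arithmetic behind it: if each call independently stays under its P99 99% of the time, all three do so only 0.99³ ≈ 97% of the time, so roughly 3% of requests land in some dependency's slow tail.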
L1 cache: 1ns. RAM: 100ns. SSD: 100μs. Network round-trip: 500μs-150ms. These ratios don't change—they're physics. Design decisions should respect this hierarchy.
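One way to internalize the ratios is to rescale them to human time. A small sketch using the figures above; the "L1 hit = one second" rescaling is just an illustrative convention:

```python
# Rough figures from the hierarchy above, in nanoseconds.
LATENCIES_NS = {
    "L1 cache reference":            1,
    "RAM access":                    100,
    "SSD read":                      100_000,      # 100 us
    "network RTT (same region)":     500_000,      # 500 us
    "network RTT (cross-continent)": 150_000_000,  # 150 ms
}

# Rescale so an L1 hit takes one second; the ratios become visceral.
for name, ns in LATENCIES_NS.items():
    seconds = ns  # 1 ns -> 1 s under this rescaling
    label = (f"{seconds:,.0f} s" if seconds < 86_400
             else f"{seconds / 86_400:,.1f} days")
    print(f"{name:31s} -> {label}")
```

On this scale a RAM access takes about a minute and a half, an SSD read over a day, and a cross-continent round trip several years.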
Latency measures the time for a single operation to complete—from request sent to response received.
Throughput measures how many operations complete in a given time period.
Think of a highway:
- Latency: how long to drive from A to B (time per car)
- Throughput: how many cars pass a point per hour (cars per time)
A highway can have low latency (a high speed limit) but low throughput (a single lane). Or high throughput (six lanes) but high latency (traffic congestion). The two are related but not the same.
Units:
- Latency: milliseconds (ms), microseconds (μs), nanoseconds (ns)
- Throughput: requests per second (RPS/QPS), transactions per second (TPS), megabytes per second (MB/s)
The relationship:

Throughput = Concurrency / Latency

With 100ms latency and 10 concurrent workers:

Throughput = 10 / 0.1s = 100 requests/second

To increase throughput, you can either reduce latency OR increase concurrency.