
Load Balancing Deep Dive

From round-robin to consistent hashing - distributing traffic at every layer

Foundation knowledge | 35 min read


Summary

Load balancing distributes incoming traffic across multiple servers to improve availability, throughput, and response time. It operates at different layers: L4 (transport) makes decisions based on IP/port, while L7 (application) can route based on HTTP headers, URLs, or content. The choice of algorithm matters—round-robin is simple but ignores server load, while least-connections adapts to actual capacity. For stateful applications, sticky sessions or external session stores solve the affinity problem. At global scale, GeoDNS and Anycast route users to nearby data centers.

Key Takeaways

L4 is Fast, L7 is Smart

Layer 4 load balancers see only IP addresses and ports—they're fast (millions of connections/second) but blind to application logic. Layer 7 balancers understand HTTP, can route by URL or header, terminate SSL, and compress responses—but with higher CPU cost per request.
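The kind of decision only an L7 balancer can make can be sketched as a path-prefix router. This is a minimal illustration, not any real balancer's API; the pool names and prefixes are hypothetical.

```python
# Hypothetical L7 routing table: URL path prefix -> backend pool.
# An L4 balancer never sees the path, so it cannot make this choice.
ROUTES = {
    "/api/": "api-pool",
    "/static/": "cdn-pool",
}
DEFAULT_POOL = "web-pool"

def route(path: str) -> str:
    """Return the backend pool for a request path (longest prefix wins)."""
    for prefix in sorted(ROUTES, key=len, reverse=True):
        if path.startswith(prefix):
            return ROUTES[prefix]
    return DEFAULT_POOL
```

Real L7 balancers (Nginx, Envoy, ALB) express the same idea declaratively in their route configuration, and can additionally match on headers, methods, or hostnames.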

Least-Connections Beats Round-Robin for Variable Workloads

Round-robin assumes all requests are equal. If one request takes 10ms and another takes 1000ms, round-robin creates imbalance. Least-connections sends each new request to the server with the fewest active connections, naturally adapting to request complexity.
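The two selection policies can be sketched in a few lines, assuming the balancer tracks an active-connection count per server:

```python
import itertools

def round_robin(servers):
    """Yield servers in a fixed rotation, ignoring their current load."""
    return itertools.cycle(servers)

def least_connections(active):
    """Pick the server with the fewest active connections.

    `active` maps server name -> current open connection count,
    which the balancer updates as connections open and close.
    """
    return min(active, key=active.get)
```

With round-robin, a server stuck on a 1000ms request keeps receiving its full share of new work; `least_connections` steers traffic away from it until its count drops.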

Consistent Hashing Minimizes Redistribution

When adding or removing servers, simple hash-based routing (hash % N) reshuffles most requests. Consistent hashing moves only about K/N keys (where K is the total number of keys and N the number of servers). This is essential for caches and stateful services.
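A minimal consistent-hash ring with virtual nodes makes the K/N property concrete. This is a sketch for illustration (MD5 and 100 replicas per server are arbitrary choices), not a production implementation:

```python
import bisect
import hashlib

class HashRing:
    """Minimal consistent-hash ring with virtual nodes ("replicas")."""

    def __init__(self, servers, replicas=100):
        # Each server is hashed onto the ring at `replicas` points,
        # which smooths out the key distribution.
        self._ring = sorted(
            (self._hash(f"{server}#{i}"), server)
            for server in servers
            for i in range(replicas)
        )
        self._hashes = [h for h, _ in self._ring]

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def node(self, key):
        """Map a key to the first server clockwise from its hash."""
        idx = bisect.bisect(self._hashes, self._hash(key)) % len(self._ring)
        return self._ring[idx][1]
```

Adding a fourth server to a three-server ring remaps only the keys whose clockwise successor is now one of the new server's virtual nodes, roughly a quarter of them, and every remapped key lands on the new server. Under `hash % N` the same change would remap about three quarters of all keys.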

Sticky Sessions are a Crutch, Not a Solution

Sticky sessions route users to the same server, working around stateful applications. But they create hotspots, complicate failover, and don't survive server restarts. The real fix is externalizing state to a shared store.

Health Checks Must Test What Matters

A TCP port check confirms the process is running, not that it's functional. HTTP health checks should test database connectivity, cache availability, and downstream dependencies—or you'll route traffic to broken servers.
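A deep health-check endpoint can be sketched as a probe aggregator. The dependency names here are hypothetical; in practice each probe would run a cheap real operation, such as `SELECT 1` against the database or a `PING` to the cache:

```python
def deep_health_check(dependencies):
    """Return (healthy, failed_names).

    `dependencies` maps a name to a zero-argument probe that returns
    True when that dependency is actually usable. A probe that raises
    is also counted as a failure.
    """
    failed = []
    for name, probe in dependencies.items():
        try:
            ok = probe()
        except Exception:
            ok = False
        if not ok:
            failed.append(name)
    return (not failed, failed)
```

The load balancer's HTTP health check would call this and return 200 only when `healthy` is true, so a server with a dead database connection is pulled from rotation even though its process is still accepting TCP connections.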

Global Load Balancing is About Latency, Not Just Failover

GeoDNS and Anycast don't just provide disaster recovery—they reduce latency by routing users to nearby data centers. A 100ms latency improvement (US user hitting US server vs Europe) directly impacts user experience and conversion.

Deep Dive

Load balancing distributes incoming requests across multiple servers. It serves three purposes:

  1. Availability: If one server fails, others handle traffic
  2. Scalability: Add servers to handle more load
  3. Performance: Prevent any single server from being overwhelmed

Load balancers act as a reverse proxy—clients talk to the balancer, which forwards requests to backend servers. Clients are unaware of the backend topology.

Basic Load Balancer Architecture

Where load balancers sit:

| Layer | Between | Examples |
|-------|---------|----------|
| Edge | Internet → Data center | Cloudflare, AWS ALB |
| Internal | Service → Service | HAProxy, Envoy |
| Database | App → DB replicas | ProxySQL, PgBouncer |
| DNS | User → Data center | Route53, Cloudflare DNS |

Load balancers can be hardware appliances (F5, Citrix), software (HAProxy, Nginx), or cloud services (AWS ALB/NLB, GCP Load Balancer).

Trade-offs

| Aspect | Advantage | Disadvantage |
|--------|-----------|--------------|
| L4 vs L7 | L4 is faster (millions of conn/s) and simpler; L7 provides content-based routing, SSL termination, and application intelligence | L4 cannot route by URL/header or inspect traffic; L7 has higher CPU overhead and latency |
| Sticky sessions | Enables stateful applications without an external session store; simple to configure | Creates hotspots, complicates failover, prevents free scaling; better to externalize state |
| SSL termination at LB | Offloads CPU from backends, centralizes certificate management, enables L7 features | Internal traffic is unencrypted (unless re-encrypted); potential compliance issues |
| Managed vs self-hosted LB | Managed (ALB/NLB): no ops overhead, built-in HA, auto-scaling. Self-hosted: full control, no vendor lock-in | Managed: less customization, cloud-specific. Self-hosted: operational burden, must build HA yourself |
