
Load Balancing Deep Dive

From round-robin to consistent hashing - distributing traffic at every layer

Foundation knowledge | 35 min read


Summary

Load balancing distributes incoming traffic across multiple servers to improve availability, throughput, and response time. It operates at different layers: L4 (transport) makes decisions based on IP/port, while L7 (application) can route based on HTTP headers, URLs, or content. The choice of algorithm matters—round-robin is simple but ignores server load, while least-connections adapts to actual capacity. For stateful applications, sticky sessions or external session stores solve the affinity problem. At global scale, GeoDNS and Anycast route users to nearby data centers.

Key Takeaways

L4 is Fast, L7 is Smart

Layer 4 load balancers see only IP addresses and ports—they're fast (millions of connections/second) but blind to application logic. Layer 7 balancers understand HTTP, can route by URL or header, terminate SSL, and compress responses—but with higher CPU cost per request.
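The kind of decision only an L7 balancer can make can be sketched as a path-prefix router. This is a minimal illustration, not any real balancer's API; the pool names and prefixes are hypothetical.

```python
# Hypothetical L7 routing table: URL path prefix -> backend pool.
# An L4 balancer never sees the path, so it cannot make this choice.
ROUTES = {
    "/api/": "api-pool",
    "/static/": "cdn-pool",
}
DEFAULT_POOL = "web-pool"

def route(path: str) -> str:
    """Return the backend pool for a request path (longest prefix wins)."""
    for prefix in sorted(ROUTES, key=len, reverse=True):
        if path.startswith(prefix):
            return ROUTES[prefix]
    return DEFAULT_POOL
```

Real L7 balancers (Nginx, Envoy, ALB) express the same idea declaratively in their route configuration, and can additionally match on headers, methods, or hostnames.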

Least-Connections Beats Round-Robin for Variable Workloads

Round-robin assumes all requests are equal. If one request takes 10ms and another takes 1000ms, round-robin creates imbalance. Least-connections sends each new request to the server with the fewest active connections, naturally adapting to request complexity.
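The two selection policies can be sketched in a few lines, assuming the balancer tracks an active-connection count per server:

```python
import itertools

def round_robin(servers):
    """Yield servers in a fixed rotation, ignoring their current load."""
    return itertools.cycle(servers)

def least_connections(active):
    """Pick the server with the fewest active connections.

    `active` maps server name -> current open connection count,
    which the balancer updates as connections open and close.
    """
    return min(active, key=active.get)
```

With round-robin, a server stuck on a 1000ms request keeps receiving its full share of new work; `least_connections` steers traffic away from it until its count drops.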

Consistent Hashing Minimizes Redistribution

When adding or removing servers, simple hash-based routing (hash % N) reshuffles most requests. Consistent hashing moves only about K/N keys (where K is the total number of keys and N the number of servers). This is essential for caches and stateful services.
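A minimal consistent-hash ring with virtual nodes makes the K/N property concrete. This is a sketch for illustration (MD5 and 100 replicas per server are arbitrary choices), not a production implementation:

```python
import bisect
import hashlib

class HashRing:
    """Minimal consistent-hash ring with virtual nodes ("replicas")."""

    def __init__(self, servers, replicas=100):
        # Each server is hashed onto the ring at `replicas` points,
        # which smooths out the key distribution.
        self._ring = sorted(
            (self._hash(f"{server}#{i}"), server)
            for server in servers
            for i in range(replicas)
        )
        self._hashes = [h for h, _ in self._ring]

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def node(self, key):
        """Map a key to the first server clockwise from its hash."""
        idx = bisect.bisect(self._hashes, self._hash(key)) % len(self._ring)
        return self._ring[idx][1]
```

Adding a fourth server to a three-server ring remaps only the keys whose clockwise successor is now one of the new server's virtual nodes, roughly a quarter of them, and every remapped key lands on the new server. Under `hash % N` the same change would remap about three quarters of all keys.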

Sticky Sessions are a Crutch, Not a Solution

Sticky sessions route users to the same server, working around stateful applications. But they create hotspots, complicate failover, and don't survive server restarts. The real fix is externalizing state to a shared store.

Health Checks Must Test What Matters

A TCP port check confirms the process is running, not that it's functional. HTTP health checks should test database connectivity, cache availability, and downstream dependencies—or you'll route traffic to broken servers.
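A deep health-check endpoint can be sketched as a probe aggregator. The dependency names here are hypothetical; in practice each probe would run a cheap real operation, such as `SELECT 1` against the database or a `PING` to the cache:

```python
def deep_health_check(dependencies):
    """Return (healthy, failed_names).

    `dependencies` maps a name to a zero-argument probe that returns
    True when that dependency is actually usable. A probe that raises
    is also counted as a failure.
    """
    failed = []
    for name, probe in dependencies.items():
        try:
            ok = probe()
        except Exception:
            ok = False
        if not ok:
            failed.append(name)
    return (not failed, failed)
```

The load balancer's HTTP health check would call this and return 200 only when `healthy` is true, so a server with a dead database connection is pulled from rotation even though its process is still accepting TCP connections.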

Global Load Balancing is About Latency, Not Just Failover

GeoDNS and Anycast don't just provide disaster recovery—they reduce latency by routing users to nearby data centers. A 100ms latency improvement (US user hitting US server vs Europe) directly impacts user experience and conversion.

Deep Dive

Load balancing distributes incoming requests across multiple servers. It serves three purposes:

  1. Availability: If one server fails, others handle traffic
  2. Scalability: Add servers to handle more load
  3. Performance: Prevent any single server from being overwhelmed

Load balancers act as a reverse proxy—clients talk to the balancer, which forwards requests to backend servers. Clients are unaware of the backend topology.

Basic Load Balancer Architecture

Where load balancers sit:

| Layer | Between | Examples |
|-------|---------|----------|
| Edge | Internet → Data center | Cloudflare, AWS ALB |
| Internal | Service → Service | HAProxy, Envoy |
| Database | App → DB replicas | ProxySQL, PgBouncer |
| DNS | User → Data center | Route53, Cloudflare DNS |

Load balancers can be hardware appliances (F5, Citrix), software (HAProxy, Nginx), or cloud services (AWS ALB/NLB, GCP Load Balancer).

Trade-offs

| Aspect | Advantage | Disadvantage |
|--------|-----------|--------------|
| L4 vs L7 | L4 is faster (millions of conn/s) and simpler; L7 provides content-based routing, SSL termination, and application intelligence | L4 cannot route by URL/header or inspect traffic; L7 has higher CPU overhead and latency |
| Sticky sessions | Enables stateful applications without an external session store; simple to configure | Creates hotspots, complicates failover, prevents free scaling; better to externalize state |
| SSL termination at LB | Offloads CPU from backends, centralizes certificate management, enables L7 features | Internal traffic is unencrypted (unless re-encrypted); potential compliance issues |
| Managed vs self-hosted LB | Managed (ALB/NLB): no ops overhead, built-in HA, auto-scaling. Self-hosted: full control, no vendor lock-in | Managed: less customization, cloud-specific. Self-hosted: operational burden, must build HA yourself |
