Whitepapers
How Facebook transformed a simple cache into a distributed system serving billions of requests per second across global datacenters
Facebook scaled memcached from a single-server cache to a globally distributed system handling billions of requests per second. The paper reveals hard-won lessons: how to reduce latency through batching and parallelism, how to handle thundering herds with leases, how to maintain consistency across regions with invalidation protocols, and how to operate at a scale where even rare edge cases happen constantly. This isn't theoretical—it's battle-tested engineering that powers one of the world's largest websites.
Memcache acts as a demand-filled look-aside cache: the application checks memcache first, falls back to the database on a miss, then populates the cache with the result. This simple pattern scales because reads vastly outnumber writes and cache hits avoid expensive database queries.
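A minimal sketch of the read and write paths under this pattern, assuming hypothetical `cache` and `db` client objects (the helper names are illustrative, not Facebook's actual API):

```python
import json

def get_user(cache, db, user_id, ttl=300):
    """Look-aside read: check memcache, fall back to the database on a miss,
    then demand-fill the cache so subsequent reads hit."""
    key = f"user:{user_id}"
    value = cache.get(key)
    if value is not None:
        return json.loads(value)                      # cache hit: no DB work
    row = db.fetch_one("SELECT * FROM users WHERE id = %s", (user_id,))
    cache.set(key, json.dumps(row), expire=ttl)       # populate on miss
    return row

def set_user_name(cache, db, user_id, name):
    """Write path: update the database, then delete (not update) the cached key.
    The next read repopulates a fresh value on demand."""
    db.execute("UPDATE users SET name = %s WHERE id = %s", (name, user_id))
    cache.delete(f"user:{user_id}")
```

The paper's write path deletes keys rather than updating them because deletes are idempotent and safe to replay, which matters once invalidations flow across many servers.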
When a popular cached item expires, hundreds of concurrent requests can miss simultaneously and all fall through to the database (a thundering herd). Leases give one client the exclusive right to repopulate the cache while the others briefly wait, reducing the database load for that key from N concurrent queries to one.
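A sketch of the lease flow from the client's point of view; `gets_with_lease` and `set_with_lease` are assumed wrappers around memcached's lease mechanism, not standard client calls:

```python
import json
import time

def get_with_lease(cache, db, key, query, params, ttl=300, retry_interval=0.05):
    """On a miss, only the client that receives the lease token queries the
    database; every other client briefly waits and re-reads the cache."""
    while True:
        value, lease_token = cache.gets_with_lease(key)     # assumed API
        if value is not None:
            return json.loads(value)                        # normal cache hit
        if lease_token is not None:
            # This client won the lease: it alone refills the key.
            row = db.fetch_one(query, params)
            cache.set_with_lease(key, json.dumps(row), lease_token, expire=ttl)
            return row
        # Miss without a token: another client holds the lease. Back off
        # briefly and re-check the cache instead of hammering the database.
        time.sleep(retry_interval)
```

The lease token also guards against stale sets: if the key is invalidated while the lease holder is still reading from the database, the token is invalidated and the slow writer cannot overwrite newer data.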
Web servers don't talk directly to memcache servers. Mcrouter, a proxy layer, batches requests, handles routing, and manages connection pooling—reducing millions of TCP connections to thousands and enabling efficient fan-out.
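The real mcrouter is a C++ proxy; the Python sketch below only illustrates the core idea of grouping keys by destination server and issuing one batched multi-get per server over pooled connections (`pool.connection_to` and `multi_get` are assumed helpers, and simple modulo hashing stands in for the consistent hashing used in practice):

```python
from collections import defaultdict
from hashlib import md5

def pick_server(key, servers):
    """Hash each key to one memcache server (illustrative placement only)."""
    return servers[int(md5(key.encode()).hexdigest(), 16) % len(servers)]

def batched_multi_get(pool, servers, keys):
    """Group keys by destination, then send one batched request per server
    instead of one request (and one TCP connection) per key."""
    by_server = defaultdict(list)
    for key in keys:
        by_server[pick_server(key, servers)].append(key)

    results = {}
    for server, server_keys in by_server.items():
        conn = pool.connection_to(server)            # reused, pooled connection
        results.update(conn.multi_get(server_keys))  # one round trip per server
    return results
```

Because the proxy owns the connections, thousands of web server processes share a small number of long-lived connections per memcache host rather than each opening its own.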
By 2013, Facebook served over a billion users with a read-heavy workload. A single page load could trigger hundreds of data fetches—user profile, friend list, news feed items, photos, comments, likes. Hitting the database for every request was impossible.
The numbers:
- Billions of requests per second to memcache
- Trillions of items cached
- Thousands of memcache servers
- Multiple datacenters across continents
Memcached, a simple in-memory key-value store, became the backbone. But scaling it required solving problems the original developers never imagined: reducing latency at massive fan-out, handling cache invalidation across regions, and operating reliably when "rare" events happen thousands of times per second.