Whitepapers
How Facebook transformed a simple cache into a distributed system serving billions of requests per second across global datacenters
Facebook scaled memcached from a single-server cache to a globally distributed system handling billions of requests per second. The paper reveals hard-won lessons: how to reduce latency through batching and parallelism, how to handle thundering herds with leases, how to maintain consistency across regions with invalidation protocols, and how to operate at a scale where even rare edge cases happen constantly. This isn't theoretical—it's battle-tested engineering that powers one of the world's largest websites.
Memcache acts as a demand-filled look-aside cache: the application checks memcache first, falls back to the database on a miss, then populates the cache with the result. This simple pattern scales because reads vastly outnumber writes and cache hits avoid expensive database queries.
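A minimal sketch of the read and write paths under this pattern, assuming hypothetical `cache` and `db` client objects (the helper names are illustrative, not Facebook's actual API):

```python
import json

def get_user(cache, db, user_id, ttl=300):
    """Look-aside read: check memcache, fall back to the database on a miss,
    then demand-fill the cache so subsequent reads hit."""
    key = f"user:{user_id}"
    value = cache.get(key)
    if value is not None:
        return json.loads(value)                      # cache hit: no DB work
    row = db.fetch_one("SELECT * FROM users WHERE id = %s", (user_id,))
    cache.set(key, json.dumps(row), expire=ttl)       # populate on miss
    return row

def set_user_name(cache, db, user_id, name):
    """Write path: update the database, then delete (not update) the cached key.
    The next read repopulates a fresh value on demand."""
    db.execute("UPDATE users SET name = %s WHERE id = %s", (name, user_id))
    cache.delete(f"user:{user_id}")
```

The paper's write path deletes keys rather than updating them because deletes are idempotent and safe to replay, which matters once invalidations flow across many servers.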
When a popular cached item expires, hundreds of concurrent requests can miss simultaneously and all fall through to the database (a thundering herd). Leases give one client the exclusive right to repopulate the cache while the others briefly wait, reducing the database load for that key from N concurrent queries to one.
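A sketch of the lease flow from the client's point of view; `gets_with_lease` and `set_with_lease` are assumed wrappers around memcached's lease mechanism, not standard client calls:

```python
import json
import time

def get_with_lease(cache, db, key, query, params, ttl=300, retry_interval=0.05):
    """On a miss, only the client that receives the lease token queries the
    database; every other client briefly waits and re-reads the cache."""
    while True:
        value, lease_token = cache.gets_with_lease(key)     # assumed API
        if value is not None:
            return json.loads(value)                        # normal cache hit
        if lease_token is not None:
            # This client won the lease: it alone refills the key.
            row = db.fetch_one(query, params)
            cache.set_with_lease(key, json.dumps(row), lease_token, expire=ttl)
            return row
        # Miss without a token: another client holds the lease. Back off
        # briefly and re-check the cache instead of hammering the database.
        time.sleep(retry_interval)
```

The lease token also guards against stale sets: if the key is invalidated while the lease holder is still reading from the database, the token is invalidated and the slow writer cannot overwrite newer data.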
Web servers don't talk directly to memcache servers. Mcrouter, a proxy layer, batches requests, handles routing, and manages connection pooling—reducing millions of TCP connections to thousands and enabling efficient fan-out.
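The real mcrouter is a C++ proxy; the Python sketch below only illustrates the core idea of grouping keys by destination server and issuing one batched multi-get per server over pooled connections (`pool.connection_to` and `multi_get` are assumed helpers, and simple modulo hashing stands in for the consistent hashing used in practice):

```python
from collections import defaultdict
from hashlib import md5

def pick_server(key, servers):
    """Hash each key to one memcache server (illustrative placement only)."""
    return servers[int(md5(key.encode()).hexdigest(), 16) % len(servers)]

def batched_multi_get(pool, servers, keys):
    """Group keys by destination, then send one batched request per server
    instead of one request (and one TCP connection) per key."""
    by_server = defaultdict(list)
    for key in keys:
        by_server[pick_server(key, servers)].append(key)

    results = {}
    for server, server_keys in by_server.items():
        conn = pool.connection_to(server)            # reused, pooled connection
        results.update(conn.multi_get(server_keys))  # one round trip per server
    return results
```

Because the proxy owns the connections, thousands of web server processes share a small number of long-lived connections per memcache host rather than each opening its own.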
By 2013, Facebook served over a billion users with a read-heavy workload. A single page load could trigger hundreds of data fetches—user profile, friend list, news feed items, photos, comments, likes. Hitting the database for every request was impossible.
The numbers:
- Billions of requests per second to memcache
- Trillions of items cached
- Thousands of memcache servers
- Multiple datacenters across continents
Memcached, a simple in-memory key-value store, became the backbone. But scaling it required solving problems the original developers never imagined: reducing latency at massive fan-out, handling cache invalidation across regions, and operating reliably when "rare" events happen thousands of times per second.