How Facebook serves billions of reads per second with sub-millisecond latency by co-designing cache and storage for graph workloads
TAO is Facebook's geographically distributed data store optimized for the social graph. It replaces a memcache-based caching layer with a graph-aware cache that understands objects and associations. TAO serves billions of reads per second with sub-millisecond latency and handles the extremely read-heavy workload that characterizes social networking: roughly 500 reads for every write, meaning reads are 99.8% of all requests. The key insight: treat the cache as a first-class citizen with graph semantics, not as a generic key-value store in front of MySQL.
TAO models data as objects (nodes) and associations (edges) rather than generic key-value pairs. This graph abstraction maps naturally to social data—users, posts, photos, friendships, likes—and enables caching and query patterns tuned to graph shape.
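A minimal sketch of this data model, with illustrative names (`Object`, `Assoc`, and the `otype`/`atype` strings are stand-ins, not TAO's literal identifiers):

```python
from dataclasses import dataclass, field

@dataclass
class Object:
    """A typed node: a user, post, photo, or comment."""
    id: int                                    # globally unique id
    otype: str                                 # e.g. "user", "post", "photo"
    data: dict = field(default_factory=dict)   # schema-less key-value payload

@dataclass
class Assoc:
    """A typed, directed edge keyed by (source, association type, destination)."""
    id1: int                                   # source object id
    atype: str                                 # e.g. "friend", "authored", "likes"
    id2: int                                   # destination object id
    time: int                                  # creation time; lists sort newest-first
    data: dict = field(default_factory=dict)

# A user liking a post is one edge, typically paired with an inverse edge
# so the relationship can be traversed from either side:
like = Assoc(id1=42, atype="likes", id2=1001, time=1_700_000_000)
liked_by = Assoc(id1=1001, atype="liked_by", id2=42, time=1_700_000_000)
```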
Followers (leaf caches) handle the massive read load while Leaders maintain consistency with the database. This separation allows aggressive read scaling while keeping write coordination tractable.
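A sketch of the two tiers under these assumptions: `db` is any object with a `read(key)` method, and the class and method names are invented for illustration, not TAO's internals:

```python
class LeaderCache:
    """One leader per database shard; the only tier that talks to the database."""
    def __init__(self, db):
        self.db = db
        self.store = {}

    def get(self, key):
        if key not in self.store:              # leader miss -> read from the database
            self.store[key] = self.db.read(key)
        return self.store[key]

class FollowerCache:
    """Many followers absorb the read load; a miss goes to the leader, never the DB."""
    def __init__(self, leader):
        self.leader = leader
        self.store = {}

    def get(self, key):
        if key not in self.store:              # follower miss -> ask the leader
            self.store[key] = self.leader.get(key)
        return self.store[key]
```

Because followers never contact the database directly, read capacity scales by adding follower tiers, while each shard's leader remains the single point of write coordination.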
After a write, the client reads from the Leader cache until the Follower is updated. This provides read-your-writes consistency without strong consistency overhead across the entire system.
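The sketch below uses an invented per-key version number as a stand-in for whatever mechanism decides that "the Follower is updated"; `write` and `get_versioned` are hypothetical methods on the caches from the previous sketch:

```python
class Client:
    """After writing a key, read it from the leader until the local follower
    has caught up. Version numbers here are illustrative, not TAO's mechanism."""
    def __init__(self, follower, leader):
        self.follower = follower
        self.leader = leader
        self.pending = {}                       # key -> version written but maybe unseen

    def write(self, key, value):
        version = self.leader.write(key, value) # all writes go through the leader
        self.pending[key] = version

    def read(self, key):
        if key not in self.pending:
            return self.follower.get(key)       # common case: cheap local read
        value, version = self.follower.get_versioned(key)
        if version >= self.pending[key]:
            del self.pending[key]               # follower caught up; resume local reads
            return value
        return self.leader.get(key)             # follower still stale: ask the leader
```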
Facebook's data is fundamentally a graph: users connected by friendships, posts connected to authors, photos tagged with people, comments on content. Every page load traverses this graph: rendering a profile means fetching a user, their friends, their recent posts, and the likes and comments on each.
By 2013, Facebook had:

- 1 billion+ users
- Billions of reads/second at peak
- A 500:1 read-to-write ratio
- A sub-millisecond latency requirement
The original architecture used memcache as a generic look-aside cache in front of MySQL:

```
Client → Memcache → MySQL
```

Problems with this approach: the key-value cache had no notion of edges, so association lists were stored as opaque blobs that had to be re-fetched wholesale on any change; the control logic for filling and invalidating the cache lived in every client; and providing read-after-write consistency through a look-aside cache was difficult and error-prone.
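For contrast, here is the look-aside pattern TAO replaced, with the pain points noted inline (`cache` and `db` stand for a memcache client and a MySQL connection; the key format and queries are illustrative):

```python
def get_friend_list(user_id, cache, db):
    # Look-aside read: the client, not the cache, owns the fill logic.
    key = f"friends:{user_id}"
    friends = cache.get(key)
    if friends is None:
        friends = db.query("SELECT friend_id FROM friends WHERE user_id = %s", user_id)
        cache.set(key, friends)                 # every client must get this right
    return friends

def add_friend(user_id, friend_id, cache, db):
    db.execute("INSERT INTO friends (user_id, friend_id) VALUES (%s, %s)",
               user_id, friend_id)
    # The cached value is the *entire* list, so one new edge throws it all away,
    # and a forgotten delete here silently serves stale data.
    cache.delete(f"friends:{user_id}")
```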
TAO optimizes for the common 'get all edges from this node' query. Association lists are cached as sorted lists with cursor-based pagination, avoiding the need to fetch entire, potentially huge lists.
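TAO's read API exposes this as a range query over an association list, assoc_range(id1, atype, pos, limit). A toy version of the underlying structure, assuming a simple in-memory list (the paper's actual implementation differs):

```python
import bisect

class AssocList:
    """Association list for one (id1, atype), kept sorted newest-first so the
    common 'first N edges' query never materializes the whole list."""
    def __init__(self):
        self.edges = []                          # (negated time, id2): ascending == newest-first

    def add(self, time, id2):
        bisect.insort(self.edges, (-time, id2))

    def assoc_range(self, pos, limit):
        # Return one page of edges starting at offset `pos`, plus the next cursor.
        page = [(-neg_t, id2) for neg_t, id2 in self.edges[pos:pos + limit]]
        return page, pos + len(page)

friends = AssocList()
for t, fid in [(100, 7), (300, 9), (200, 8)]:
    friends.add(t, fid)
first_page, cursor = friends.assoc_range(0, 2)   # [(300, 9), (200, 8)], cursor 2
```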
All writes go through the cache to the database, then invalidate/update caches. This maintains cache consistency without complex invalidation protocols and ensures the cache always reflects database state.
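A minimal write-through sketch under the same invented interfaces as the earlier tiers (`db.write` and `invalidate` are assumed methods, not TAO's API):

```python
class Leader:
    """Write-through: apply the write to the database first, update the leader's
    own cache, then tell followers to drop their stale copies."""
    def __init__(self, db, followers):
        self.db = db
        self.followers = followers
        self.store = {}

    def write(self, key, value):
        self.db.write(key, value)       # the database remains the source of truth
        self.store[key] = value         # leader cache updated on the write path itself
        for follower in self.followers:
            follower.invalidate(key)    # followers refill from the leader on next read
        return value
```

The paper describes these follower invalidation messages as delivered asynchronously, which is why a follower can briefly serve stale data and why the read-your-writes fallback described above exists.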
| Aspect | Advantage | Disadvantage |
|---|---|---|
| Graph-Specialized Design | API and caching optimized for objects and associations; efficient graph traversals | Not suitable for arbitrary data; requires mapping data to graph model |
| Eventual Consistency | Enables massive read scalability; local reads in all regions | Users may see stale data; read-your-writes only for own content |
| Two-Tier Caching | Separates read scaling (Followers) from write consistency (Leaders) | Additional complexity; two cache layers to manage and monitor |
| Write-Through Caching | Cache always consistent with database; no stale reads after writes | Write latency includes cache update; single write path can bottleneck |
| Primary Region Architecture | Simplifies consistency; one source of truth for writes | Cross-region write latency; primary region is critical dependency |