Tags: distributed-systems, caching, social-graph, facebook, eventual-consistency, data-store, advanced

TAO: Facebook's Distributed Data Store for the Social Graph

How Facebook serves billions of reads per second with sub-millisecond latency by co-designing cache and storage for graph workloads

Nathan Bronson, Zach Amsden, George Cabrera, Prasad Chakka, Peter Dimov, Hui Ding, Jack Ferris, Anthony Giardullo, Sachin Kulkarni, Harry Li, Mark Marchukov, Dmitri Petrov, Lovro Puzar, Yee Jiun Song, Venkat Venkataramani | Facebook | 2013 | 35 min read

Summary

TAO is Facebook's geographically distributed data store optimized for the social graph. It replaces a memcache-based caching layer with a graph-aware cache that understands objects and associations. TAO achieves 99.8% cache hit rate, serves billions of reads per second with sub-millisecond latency, and handles the extreme read-heavy workload (500:1 read-to-write ratio) that characterizes social networking. The key insight: treating the cache as a first-class citizen with graph semantics, not just a generic key-value store in front of MySQL.

Key Takeaways

Graph-Aware Data Model

TAO models data as objects (nodes) and associations (edges) rather than generic key-value pairs. This graph abstraction maps perfectly to social data—users, posts, photos, friendships, likes—and enables optimized caching and query patterns.
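As a toy sketch of this data model (an in-memory stand-in, not Facebook's implementation; the `obj_add`/`assoc_add`/`assoc_count` names follow the API described in the paper, while `TaoObject`, `Assoc`, and `GraphStore` are hypothetical):

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class TaoObject:
    # A node in the graph: typed, with an id and key/value data.
    id: int
    otype: str                      # e.g. "user", "post", "photo"
    data: Dict[str, str] = field(default_factory=dict)

@dataclass
class Assoc:
    # A typed, directed edge (id1, atype, id2), timestamped for ordering.
    id1: int
    atype: str                      # e.g. "friend", "likes", "authored"
    id2: int
    time: int
    data: Dict[str, str] = field(default_factory=dict)

class GraphStore:
    """Toy in-memory store mirroring TAO's object/association split."""
    def __init__(self) -> None:
        self.objects: Dict[int, TaoObject] = {}
        self.assocs: Dict[Tuple[int, str], List[Assoc]] = {}

    def obj_add(self, obj: TaoObject) -> None:
        self.objects[obj.id] = obj

    def assoc_add(self, a: Assoc) -> None:
        # Association lists stay sorted by time, newest first,
        # matching the query pattern TAO optimizes for.
        lst = self.assocs.setdefault((a.id1, a.atype), [])
        lst.append(a)
        lst.sort(key=lambda e: e.time, reverse=True)

    def assoc_count(self, id1: int, atype: str) -> int:
        return len(self.assocs.get((id1, atype), []))

store = GraphStore()
store.obj_add(TaoObject(1, "user", {"name": "alice"}))
store.obj_add(TaoObject(2, "post", {"text": "hello"}))
store.assoc_add(Assoc(1, "authored", 2, time=100))
print(store.assoc_count(1, "authored"))  # 1
```

Keying association lists by `(id1, atype)` is what lets the cache answer "all edges of this type from this node" without touching individual edge records.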

Two-Level Caching Hierarchy

Followers (leaf caches) handle the massive read load while Leaders maintain consistency with the database. This separation allows aggressive read scaling while keeping write coordination tractable.
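The read path through the two tiers can be sketched as follows (a minimal simulation, assuming one leader per shard and misses that propagate follower → leader → database; class names are hypothetical):

```python
class Database:
    def __init__(self, rows):
        self.rows = dict(rows)
        self.reads = 0              # count how often the DB is actually hit
    def get(self, key):
        self.reads += 1
        return self.rows.get(key)

class Leader:
    # One leader per shard: the only cache tier that talks to the database.
    def __init__(self, db):
        self.db = db
        self.cache = {}
    def get(self, key):
        if key not in self.cache:
            self.cache[key] = self.db.get(key)
        return self.cache[key]

class Follower:
    # Many followers absorb the read load; their misses go to the
    # leader, never directly to the database.
    def __init__(self, leader):
        self.leader = leader
        self.cache = {}
    def get(self, key):
        if key not in self.cache:
            self.cache[key] = self.leader.get(key)
        return self.cache[key]

db = Database({"user:1": "alice"})
leader = Leader(db)
f1, f2 = Follower(leader), Follower(leader)
f1.get("user:1"); f2.get("user:1"); f1.get("user:1")
print(db.reads)  # 1 -- three reads across two followers, one DB query
```

Adding more followers multiplies read capacity without adding any load on the database, because the leader deduplicates their misses.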

Read-Your-Writes via Leaders

After a write, the client reads from the Leader cache until the Follower is updated. This provides read-your-writes consistency without strong consistency overhead across the entire system.
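A sketch of that routing decision (assumed mechanics: the leader applies the write immediately and queues an asynchronous invalidation for followers; until it lands, the writer's own reads go to the leader):

```python
class Leader:
    def __init__(self):
        self.cache = {}
        self.pending_invalidations = []        # async messages to followers
    def write(self, key, value):
        self.cache[key] = value                # leader is updated synchronously
        self.pending_invalidations.append(key) # followers learn later
    def read(self, key):
        return self.cache.get(key)

class Follower:
    def __init__(self):
        self.cache = {}
    def read(self, key):
        return self.cache.get(key)
    def apply_invalidation(self, key, leader):
        self.cache[key] = leader.read(key)

leader, follower = Leader(), Follower()
follower.cache["post:9"] = None        # follower has cached "no such post"
leader.write("post:9", "hello world")

# Stale follower vs. fresh leader: route the writer's own reads to the leader.
print(follower.read("post:9"))  # None -- invalidation not applied yet
print(leader.read("post:9"))    # 'hello world' -- read-your-writes

# Once the async invalidation lands, the follower converges.
for key in leader.pending_invalidations:
    follower.apply_invalidation(key, leader)
print(follower.read("post:9"))  # 'hello world'
```

Only the writing client pays the cost of the leader round-trip; everyone else keeps reading cheaply from followers and converges when the invalidation arrives.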

Eventual Consistency with Bounded Staleness

TAO accepts eventual consistency but bounds staleness through cache invalidation and version checks. For social workloads, reading a slightly stale like count is acceptable; reading your own deleted post is not.
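One way such bounds are enforced is version checks on cache fills, sketched below (a simplified illustration of the idea, not TAO's exact mechanism: each value carries a monotonically increasing version, and a refill is applied only if it is newer than what the cache holds, so a slow, stale fill can never clobber a fresher invalidation):

```python
class VersionedCache:
    def __init__(self):
        self.entries = {}                     # key -> (version, value)
    def apply(self, key, version, value):
        # Accept the update only if it is strictly newer than what we hold.
        cur = self.entries.get(key)
        if cur is None or version > cur[0]:
            self.entries[key] = (version, value)
    def read(self, key):
        entry = self.entries.get(key)
        return entry[1] if entry else None

cache = VersionedCache()
cache.apply("likes:42", version=7, value=100)  # fresh invalidation message
cache.apply("likes:42", version=5, value=98)   # stale refill arrives late
print(cache.read("likes:42"))  # 100 -- the out-of-order stale fill was rejected
```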

Association Lists as First-Class Citizens

TAO optimizes for the common 'get all edges from this node' query. Association lists are cached as sorted lists with cursor-based pagination, avoiding the need to fetch entire, potentially huge lists.
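Pagination over a newest-first association list can be sketched like this (the `assoc_range(pos, limit)` positional form follows the paper's API; the data here is synthetic):

```python
# An association list kept sorted newest-first: (time, target-id) pairs.
friends = [(t, f"user:{i}") for i, t in enumerate(range(1000, 0, -1))]

def assoc_range(assoc_list, pos, limit):
    """Return up to `limit` edges starting at position `pos` (TAO-style paging)."""
    return assoc_list[pos:pos + limit]

# Clients walk the list a page at a time instead of fetching all 1000 edges.
page1 = assoc_range(friends, 0, 3)   # three newest edges
page2 = assoc_range(friends, 3, 3)   # next three
print(page1[0])  # (1000, 'user:0')
```

Because a user can have thousands of friends or a post millions of likes, bounding every query with a position and limit is what keeps individual requests cheap and cacheable.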

Write-Through Caching

All writes go through the cache to the database, then invalidate/update caches. This maintains cache consistency without complex invalidation protocols and ensures the cache always reflects database state.
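A minimal write-through sketch (the class and its `write`/`read` methods are hypothetical names for the generic pattern, with a plain dict standing in for MySQL):

```python
class WriteThroughCache:
    # Writes hit the database first, then the cache, so the cache never
    # holds a value the database has not durably accepted.
    def __init__(self, db):
        self.db = db
        self.cache = {}
    def write(self, key, value):
        self.db[key] = value       # 1. synchronous write to the backing store
        self.cache[key] = value    # 2. update (not just invalidate) the cache
    def read(self, key):
        if key not in self.cache:
            self.cache[key] = self.db.get(key)
        return self.cache[key]

db = {}
c = WriteThroughCache(db)
c.write("user:1:name", "alice")
print(c.read("user:1:name"), db["user:1:name"])  # alice alice
```

Updating the cache on write (rather than deleting the entry) means the very next read is a hit, which matters enormously at a 500:1 read-to-write ratio.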

Deep Dive

Facebook's data is fundamentally a graph: users connected by friendships, posts connected to authors, photos tagged with people, comments on content. Every page load traverses this graph:

  • News Feed: Fetch posts from friends, sorted by relevance
  • Profile: Fetch user's posts, photos, friends, about info
  • Photo: Fetch photo, tags, likes, comments, who else liked it

By 2013, Facebook had:

  • 1 billion+ users
  • Billions of reads per second at peak
  • A 500:1 read-to-write ratio
  • A sub-millisecond latency requirement

Social Graph Structure

The original architecture used memcache as a generic cache in front of MySQL:

Client → Memcache → MySQL

Problems with this approach:

  1. Thundering herds: Cache miss causes thousands of concurrent MySQL queries for hot objects
  2. Inefficient invalidation: Deleting a user requires invalidating all their posts, comments, likes individually
  3. No graph semantics: Can't efficiently answer "get all friends who liked this post"
  4. Complex client logic: Application code managed cache consistency, leading to bugs
  5. Read-after-write issues: User creates post, refreshes, post not visible (stale cache)
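Problem 1 is worth seeing concretely. With generic cache-aside, every client that misses goes straight to the database, so when a hot key expires, all concurrent readers miss at once. The sketch below models this deterministically (the `snapshot` parameter stands in for what each client observed before any of them refilled the cache; all names are hypothetical):

```python
# Thundering herd under cache-aside: N concurrent readers of one hot key
# all observe a miss and all query the database.
db_queries = 0

def db_get(key):
    global db_queries
    db_queries += 1
    return f"value-of-{key}"

cache = {}

def read_cache_aside(key, snapshot):
    # `snapshot` models the cache state each client saw before any refill:
    # under real concurrency they all observe the miss simultaneously.
    if key in snapshot:
        return snapshot[key]
    value = db_get(key)
    cache[key] = value
    return value

empty_snapshot = {}
for _ in range(1000):          # 1000 concurrent readers of one hot key
    read_cache_aside("hot-post", empty_snapshot)
print(db_queries)  # 1000 -- one DB query per reader, not one per key
```

TAO's leader tier collapses this: all follower misses for a key funnel through one leader, which issues a single database query.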

Trade-offs

Graph-Specialized Design
  • Advantage: API and caching optimized for objects and associations; efficient graph traversals
  • Disadvantage: Not suitable for arbitrary data; requires mapping data to the graph model

Eventual Consistency
  • Advantage: Enables massive read scalability; local reads in all regions
  • Disadvantage: Users may see stale data; read-your-writes holds only for their own content

Two-Tier Caching
  • Advantage: Separates read scaling (Followers) from write consistency (Leaders)
  • Disadvantage: Additional complexity; two cache layers to manage and monitor

Write-Through Caching
  • Advantage: Cache always consistent with the database; no stale reads after writes
  • Disadvantage: Write latency includes the cache update; a single write path can bottleneck

Primary Region Architecture
  • Advantage: Simplifies consistency; one source of truth for writes
  • Disadvantage: Cross-region write latency; the primary region is a critical dependency