The consensus algorithm designed for understandability that powers etcd, Consul, and CockroachDB
Raft solves the distributed consensus problem—getting multiple servers to agree on a sequence of values even when some servers fail. Unlike Paxos, Raft was designed from the ground up for understandability by decomposing consensus into three relatively independent subproblems: leader election, log replication, and safety. The result is an algorithm that engineers can actually implement correctly.
All writes flow through a single leader, simplifying reasoning about consistency. The leader has complete authority over log replication—it never overwrites its own entries and all entries flow from leader to followers.
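The leader-append path can be sketched in a few lines. This is a minimal illustration, not a full Raft implementation; the `LogEntry` and `Leader` types and the `Propose` method are names invented here for the sketch (the paper itself only specifies the RPC arguments):

```go
package main

import "fmt"

// LogEntry pairs a client command with the term in which the leader
// first received it.
type LogEntry struct {
	Term    int
	Command string
}

// Leader holds the leader's view of the replicated log. All client
// writes enter the cluster here; the leader only ever appends to its
// own log, never overwrites it.
type Leader struct {
	currentTerm int
	log         []LogEntry
}

// Propose appends a client command to the leader's log and returns the
// index at which it was stored (1-based, as in the Raft paper). The
// entry then flows outward to followers via AppendEntries RPCs.
func (l *Leader) Propose(cmd string) int {
	l.log = append(l.log, LogEntry{Term: l.currentTerm, Command: cmd})
	return len(l.log)
}

func main() {
	ldr := &Leader{currentTerm: 3}
	idx := ldr.Propose("x=5")
	fmt.Println(idx, ldr.log[idx-1].Term) // 1 3
}
```

Because entries only move leader→follower and the leader never rewrites its own log, reasoning about what a committed entry contains reduces to reasoning about a single append-only sequence.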
Split votes are resolved elegantly using randomized timeouts (150-300ms typically). This simple mechanism avoids the complexity of ranking or priority schemes while ensuring elections complete quickly.
If two logs contain an entry with the same index and term, then the logs are identical in all preceding entries. This invariant, enforced by a simple consistency check during AppendEntries, is the foundation of Raft's safety.
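The follower-side check can be expressed directly. This is a minimal sketch of the AppendEntries precondition; `consistencyCheck` and `LogEntry` are names invented here, and the RPC's `prevLogIndex`/`prevLogTerm` arguments are taken from the paper:

```go
package main

import "fmt"

// LogEntry pairs a command with the term in which it was created.
type LogEntry struct {
	Term    int
	Command string
}

// consistencyCheck is the test a follower applies before accepting an
// AppendEntries RPC: the follower's log must contain an entry at
// prevLogIndex whose term equals prevLogTerm. Indices are 1-based, as
// in the paper; prevLogIndex 0 means the leader is appending from the
// very start, which always passes.
func consistencyCheck(log []LogEntry, prevLogIndex, prevLogTerm int) bool {
	if prevLogIndex == 0 {
		return true
	}
	if prevLogIndex > len(log) {
		return false // follower's log is too short to contain the entry
	}
	return log[prevLogIndex-1].Term == prevLogTerm
}

func main() {
	log := []LogEntry{{Term: 1, Command: "x=1"}, {Term: 2, Command: "x=5"}}
	fmt.Println(consistencyCheck(log, 2, 2)) // true: logs match up to index 2
	fmt.Println(consistencyCheck(log, 2, 3)) // false: term mismatch, leader must back up
}
```

When the check fails, the leader retries with a smaller prevLogIndex until it finds the point where the two logs agree; by induction, everything before a matching (index, term) pair is identical on both sides.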
Distributed systems need consensus to maintain consistency across replicas. Consider a replicated key-value store: when a client writes `x=5`, all replicas must eventually agree on this value and the order of all writes. Without consensus, replicas diverge and clients see inconsistent data.
The fundamental challenge: servers can fail at any time, network partitions can isolate groups of servers, and messages can be delayed or reordered. Despite these failures, the system must never return an incorrect result, and it must continue to make progress as long as a majority of servers are up and can communicate with each other.
Paxos solved this problem when Leslie Lamport first described it in 1989, but its specification is notoriously difficult to understand. Real implementations like Google's Chubby required significant extensions not covered in the original paper. Raft was created specifically to be understandable while providing the same guarantees.