Patterns
35 items
Agreement among distributed nodes
Consensus protocols enable distributed nodes to agree on a single value or sequence of values despite failures. They solve the fundamental problem of achieving agreement when nodes can crash, messages can be lost, and timing is uncertain. Paxos (theoretical foundation) and Raft (practical implementation) are the dominant protocols. Consensus underlies leader election, distributed locks, and replicated state machines. Understanding consensus is essential for building or operating any strongly consistent distributed system.
Consensus requires a majority quorum: ⌊N/2⌋ + 1 nodes. With 5 nodes, 3 must agree. Any two majorities share at least one node, which prevents two conflicting values from both being chosen. This is why clusters use odd node counts: a 6th node raises the quorum to 4 without tolerating any additional failures.
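A quick sketch of the quorum arithmetic (the `quorum` helper is illustrative, not from any particular library):

```python
def quorum(n: int) -> int:
    """Smallest majority of an n-node cluster: floor(n/2) + 1."""
    return n // 2 + 1

for n in (3, 4, 5, 6, 7):
    q = quorum(n)
    # Overlap guarantee: 2q > n, so any two quorums share >= 1 node.
    assert 2 * q > n
    print(f"{n} nodes: quorum {q}, tolerates {n - q} crashes")
```

Note that 3 and 4 nodes both tolerate only one crash, and 5 and 6 both tolerate two, which is the arithmetic behind odd-sized clusters.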
Safety: never agree on conflicting values. Liveness: eventually make progress. FLP impossibility: in an asynchronous network, no protocol can guarantee both. Practical systems sacrifice liveness under partition, never safety.
Raft designed for understandability. Strong leader, log replication, safety proofs. Used by etcd, CockroachDB, TiKV. Easier to implement correctly than Paxos.
Single leader coordinates. Followers replicate. Avoids conflicts. Leader election when leader fails. Most practical consensus systems are leader-based.
Leader appends to log, replicates to followers. Commit when majority acknowledges. Apply committed entries in order. This is replicated state machine pattern.
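The commit rule above can be sketched in a few lines. This is a toy model under heavy assumptions (synchronous calls, no terms, no retries, no log-matching checks); the `Leader`/`Follower` classes are hypothetical, not an API from any real system:

```python
class Follower:
    def __init__(self, up=True):
        self.up = up
        self.log = []

    def append(self, entry):
        """Store the entry and ack, unless this node is down."""
        if not self.up:
            return False
        self.log.append(entry)
        return True

class Leader:
    def __init__(self, followers):
        self.followers = followers
        self.log = []
        self.commit_index = -1          # index of last committed entry

    def replicate(self, entry):
        """Append locally, push to followers, commit on majority ack."""
        self.log.append(entry)
        acks = 1                        # leader's own copy counts
        acks += sum(f.append(entry) for f in self.followers)
        cluster_size = len(self.followers) + 1
        if acks >= cluster_size // 2 + 1:   # majority has stored it
            self.commit_index = len(self.log) - 1
        return self.commit_index
```

With one of two followers down, an entry still commits: the leader plus one follower is 2 of 3 copies, a majority.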
Consensus costs round trips: basic Paxos needs 2 RTTs per decision (prepare, then accept); stable-leader protocols such as Raft and Multi-Paxos commit in 1 RTT between the leader and a quorum. Geographic distribution adds that latency to every write. Use consensus only when strong consistency is required.
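The geographic cost is easy to estimate. Assuming a stable leader that counts its own copy, commit waits for the slowest of the fastest (quorum − 1) follower acks (the function and RTT figures below are illustrative):

```python
def commit_latency(leader_to_follower_rtt_ms):
    """Stable-leader commit latency for one entry: the leader counts
    itself, so it needs quorum-1 follower acks and the latency is the
    (quorum-1)-th fastest follower round trip."""
    n = len(leader_to_follower_rtt_ms) + 1   # cluster size incl. leader
    q = n // 2 + 1                           # majority quorum
    return sorted(leader_to_follower_rtt_ms)[q - 2]

# 5-node cluster: leader needs 2 follower acks, so the commit
# latency is the 2nd-fastest follower's round trip.
print(commit_latency([2, 40, 80, 120]))  # -> 40
```

This is why placing a quorum of replicas close to the leader matters far more than the RTT to the farthest replica.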
The Two Generals Problem: Two generals must agree on attack time. Messages might be lost. No finite number of messages can guarantee agreement.
FLP Impossibility: in an asynchronous system where even one process may crash, no deterministic protocol can guarantee consensus is ever reached.
Practical implication: Consensus systems use timeouts to detect failures. May block during partitions. Trade liveness for safety.
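The timeout mechanism can be sketched as a follower-side election timer, in the style of Raft's randomized election timeout (class name, range, and the fake-clock interface are illustrative assumptions):

```python
import random

ELECTION_TIMEOUT_S = (0.15, 0.30)   # randomized to avoid split votes

class ElectionTimer:
    """Follower-side failure detector: suspect the leader once no
    heartbeat has arrived within a randomized timeout."""

    def __init__(self, now_s: float):
        self.reset(now_s)

    def reset(self, now_s: float):
        """Called on every heartbeat from the current leader."""
        self.deadline = now_s + random.uniform(*ELECTION_TIMEOUT_S)

    def leader_suspected(self, now_s: float) -> bool:
        return now_s >= self.deadline
```

A slow network is indistinguishable from a crashed leader, so this detector can be wrong; that is exactly the FLP trade-off, since a false suspicion only costs liveness (a spurious election), never safety.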
| Aspect | Advantage | Disadvantage |
|---|---|---|