Patterns
35 items
Agreement among distributed nodes
Consensus protocols enable distributed nodes to agree on a single value or sequence of values despite failures. They solve the fundamental problem of achieving agreement when nodes can crash, messages can be lost, and timing is uncertain. Paxos (theoretical foundation) and Raft (practical implementation) are the dominant protocols. Consensus underlies leader election, distributed locks, and replicated state machines. Understanding consensus is essential for building or operating any strongly consistent distributed system.
Consensus requires a majority quorum: ⌊N/2⌋ + 1 nodes. With 5 nodes, 3 must agree. Any two majorities share at least one node, which prevents two conflicting values from both being chosen. This is why clusters use odd node counts: a 6th node raises the quorum to 4 without tolerating any additional failures.
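A quick sketch of the quorum arithmetic (the `quorum` helper is illustrative, not from any particular library):

```python
def quorum(n: int) -> int:
    """Smallest majority of an n-node cluster: floor(n/2) + 1."""
    return n // 2 + 1

for n in (3, 4, 5, 6, 7):
    q = quorum(n)
    # Overlap guarantee: 2q > n, so any two quorums share >= 1 node.
    assert 2 * q > n
    print(f"{n} nodes: quorum {q}, tolerates {n - q} crashes")
```

Note that 3 and 4 nodes both tolerate only one crash, and 5 and 6 both tolerate two, which is the arithmetic behind odd-sized clusters.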
Safety: never agree on conflicting values. Liveness: eventually make progress. FLP impossibility: in an asynchronous network, no protocol can guarantee both. Practical systems sacrifice liveness under partition, never safety.
Raft designed for understandability. Strong leader, log replication, safety proofs. Used by etcd, CockroachDB, TiKV. Easier to implement correctly than Paxos.
Single leader coordinates. Followers replicate. Avoids conflicts. Leader election when leader fails. Most practical consensus systems are leader-based.
Leader appends to log, replicates to followers. Commit when majority acknowledges. Apply committed entries in order. This is replicated state machine pattern.
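The commit rule above can be sketched in a few lines. This is a toy model under heavy assumptions (synchronous calls, no terms, no retries, no log-matching checks); the `Leader`/`Follower` classes are hypothetical, not an API from any real system:

```python
class Follower:
    def __init__(self, up=True):
        self.up = up
        self.log = []

    def append(self, entry):
        """Store the entry and ack, unless this node is down."""
        if not self.up:
            return False
        self.log.append(entry)
        return True

class Leader:
    def __init__(self, followers):
        self.followers = followers
        self.log = []
        self.commit_index = -1          # index of last committed entry

    def replicate(self, entry):
        """Append locally, push to followers, commit on majority ack."""
        self.log.append(entry)
        acks = 1                        # leader's own copy counts
        acks += sum(f.append(entry) for f in self.followers)
        cluster_size = len(self.followers) + 1
        if acks >= cluster_size // 2 + 1:   # majority has stored it
            self.commit_index = len(self.log) - 1
        return self.commit_index
```

With one of two followers down, an entry still commits: the leader plus one follower is 2 of 3 copies, a majority.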
Consensus costs round trips: basic Paxos needs 2 RTTs per decision (prepare, then accept); stable-leader protocols such as Raft and Multi-Paxos commit in 1 RTT between the leader and a quorum. Geographic distribution adds that latency to every write. Use consensus only when strong consistency is required.
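The geographic cost is easy to estimate. Assuming a stable leader that counts its own copy, commit waits for the slowest of the fastest (quorum − 1) follower acks (the function and RTT figures below are illustrative):

```python
def commit_latency(leader_to_follower_rtt_ms):
    """Stable-leader commit latency for one entry: the leader counts
    itself, so it needs quorum-1 follower acks and the latency is the
    (quorum-1)-th fastest follower round trip."""
    n = len(leader_to_follower_rtt_ms) + 1   # cluster size incl. leader
    q = n // 2 + 1                           # majority quorum
    return sorted(leader_to_follower_rtt_ms)[q - 2]

# 5-node cluster: leader needs 2 follower acks, so the commit
# latency is the 2nd-fastest follower's round trip.
print(commit_latency([2, 40, 80, 120]))  # -> 40
```

This is why placing a quorum of replicas close to the leader matters far more than the RTT to the farthest replica.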
The Two Generals Problem: Two generals must agree on attack time. Messages might be lost. No finite number of messages can guarantee agreement.
FLP Impossibility: in an asynchronous system where even one process may crash, no deterministic protocol can guarantee consensus is ever reached.
Practical implication: Consensus systems use timeouts to detect failures. May block during partitions. Trade liveness for safety.
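The timeout mechanism can be sketched as a follower-side election timer, in the style of Raft's randomized election timeout (class name, range, and the fake-clock interface are illustrative assumptions):

```python
import random

ELECTION_TIMEOUT_S = (0.15, 0.30)   # randomized to avoid split votes

class ElectionTimer:
    """Follower-side failure detector: suspect the leader once no
    heartbeat has arrived within a randomized timeout."""

    def __init__(self, now_s: float):
        self.reset(now_s)

    def reset(self, now_s: float):
        """Called on every heartbeat from the current leader."""
        self.deadline = now_s + random.uniform(*ELECTION_TIMEOUT_S)

    def leader_suspected(self, now_s: float) -> bool:
        return now_s >= self.deadline
```

A slow network is indistinguishable from a crashed leader, so this detector can be wrong; that is exactly the FLP trade-off, since a false suspicion only costs liveness (a spurious election), never safety.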
| Aspect | Advantage | Disadvantage |
|---|---|---|