SystemExpertsSystemExperts
Pricing

Whitepapers

15 items

MapReduce: Simplified Data Processing on Large Clusters

30mintermediate

Kafka: A Distributed Messaging System for Log Processing

30mintermediate

Brewer's Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services

25mintermediate

Bitcoin: A Peer-to-Peer Electronic Cash System

30mintermediate

In Search of an Understandable Consensus Algorithm (Raft)

35madvanced

TAO: Facebook's Distributed Data Store for the Social Graph

35madvanced

The Google File System

35madvanced

The Log-Structured Merge-Tree (LSM-Tree)

35madvanced

The Chubby Lock Service for Loosely-Coupled Distributed Systems

30madvanced

Spanner: Google's Globally Distributed Database

40madvanced

Bigtable: A Distributed Storage System for Structured Data

35madvanced

Scaling Memcache at Facebook

35madvanced

Large-scale cluster management at Google with Borg

35madvanced

The Next 700 Programming Languages

30madvanced

The Part-Time Parliament

40madvanced
distributed-systemsconsensuspaxosfault-tolerancelamportreplicationtheoryadvanced

The Part-Time Parliament

Lamport's legendary consensus algorithm, presented as the tale of an ancient Greek parliament—the foundation of all modern distributed consensus

Leslie Lamport|Microsoft Research|1998|40 min read
View Original Paper

Summary

Paxos is the foundational algorithm for achieving consensus in a distributed system where nodes may fail. Presented through an elaborate parable about the parliament of the ancient Greek island of Paxos, the paper describes how legislators who only attended part-time (nodes that may fail) could still pass consistent decrees (agree on values). The algorithm guarantees that if any value is chosen, all nodes that learn a value will learn the same one—even if nodes fail and recover, messages are lost or delayed, and there is no bound on timing. Paxos and its variants underlie virtually all modern distributed systems requiring consensus.

Key Takeaways

Safety Despite Asynchrony

Paxos never violates safety (agreement on a single value) regardless of timing. Messages can be arbitrarily delayed, nodes can fail and recover, and the algorithm still guarantees that once a value is chosen, no other value will ever be chosen. Liveness requires eventual synchrony, but safety is unconditional.

Proposal Numbers as Logical Ordering

Each proposal has a unique number that establishes ordering. Higher-numbered proposals can override lower-numbered ones, but only if the lower-numbered value hasn't been chosen. This mechanism allows progress when leaders fail while preserving any previously chosen values.

Two-Phase Protocol: Prepare and Accept

Phase 1 (Prepare) establishes the right to propose and learns of any previously accepted values. Phase 2 (Accept) proposes a value—either the proposer's choice or a previously accepted value. This two-phase structure ensures that chosen values are never overwritten.

Lamport presents Paxos through an elaborate parable about the ancient Greek island of Paxos, whose parliament had an unusual problem: legislators were merchants and olive traders who couldn't attend full-time. They would wander in and out of the parliamentary chamber, yet the parliament needed to pass consistent decrees.

The constraints: - Legislators could enter or leave the chamber at any time - Messages between legislators could be lost - No central authority to coordinate - Yet every decree ever passed must be recorded consistently

The real problem this models: - Legislators = distributed processes/nodes - Entering/leaving = node failures and recoveries - Lost messages = network unreliability - Passing decrees = achieving consensus on values

The parable, while whimsical, precisely captures the constraints of distributed consensus. The parliament's protocol—later called Paxos—became the foundation of distributed systems theory.

Summary

Paxos is the foundational algorithm for achieving consensus in a distributed system where nodes may fail. Presented through an elaborate parable about the parliament of the ancient Greek island of Paxos, the paper describes how legislators who only attended part-time (nodes that may fail) could still pass consistent decrees (agree on values). The algorithm guarantees that if any value is chosen, all nodes that learn a value will learn the same one—even if nodes fail and recover, messages are lost or delayed, and there is no bound on timing. Paxos and its variants underlie virtually all modern distributed systems requiring consensus.

Key Takeaways

Safety Despite Asynchrony

Paxos never violates safety (agreement on a single value) regardless of timing. Messages can be arbitrarily delayed, nodes can fail and recover, and the algorithm still guarantees that once a value is chosen, no other value will ever be chosen. Liveness requires eventual synchrony, but safety is unconditional.

Proposal Numbers as Logical Ordering

Each proposal has a unique number that establishes ordering. Higher-numbered proposals can override lower-numbered ones, but only if the lower-numbered value hasn't been chosen. This mechanism allows progress when leaders fail while preserving any previously chosen values.

Two-Phase Protocol: Prepare and Accept

Phase 1 (Prepare) establishes the right to propose and learns of any previously accepted values. Phase 2 (Accept) proposes a value—either the proposer's choice or a previously accepted value. This two-phase structure ensures that chosen values are never overwritten.

Premium Content

Sign in to access this content or upgrade for full access.

Majority Quorums for Overlap

Both phases require responses from a majority of acceptors. Since any two majorities overlap, a value accepted by a majority is guaranteed to be seen by any future prepare phase. This overlap is the key to consistency without coordination.

Learning Through Acceptor Reports

Learners discover chosen values when acceptors report their accepts. In the basic protocol, each acceptor sends accepts to all learners. Optimizations exist for reducing message complexity while maintaining correctness.

Multi-Paxos for Sequences of Values

Single-decree Paxos chooses one value. Real systems need to choose sequences of values (log entries). Multi-Paxos runs separate Paxos instances for each log position, with optimizations to reduce overhead for consecutive entries.

Deep Dive

Lamport presents Paxos through an elaborate parable about the ancient Greek island of Paxos, whose parliament had an unusual problem: legislators were merchants and olive traders who couldn't attend full-time. They would wander in and out of the parliamentary chamber, yet the parliament needed to pass consistent decrees.

The constraints: - Legislators could enter or leave the chamber at any time - Messages between legislators could be lost - No central authority to coordinate - Yet every decree ever passed must be recorded consistently

The real problem this models: - Legislators = distributed processes/nodes - Entering/leaving = node failures and recoveries - Lost messages = network unreliability - Passing decrees = achieving consensus on values

The parable, while whimsical, precisely captures the constraints of distributed consensus. The parliament's protocol—later called Paxos—became the foundation of distributed systems theory.

Trade-offs

AspectAdvantageDisadvantage
Safety vs Liveness SeparationSafety always guaranteed regardless of timing; system never produces wrong answerLiveness not guaranteed; competing proposers can livelock; needs additional mechanisms
Majority QuorumsAny two majorities overlap, ensuring consistency; simple to reason aboutRequires majority availability; can't make progress with exactly half nodes; inflexible
Two-Phase ProtocolDiscover any existing chosen values; enable safe preemption; correct despite concurrent proposersTwo round-trips adds latency; Phase 1 can be skipped only with stable leader
No Distinguished LeaderNo single point of failure for correctness; any node can proposeConcurrent proposers can conflict; leader optimization needed for performance
Single-Decree PrimitiveClean theoretical foundation; correctness proof is tractableGap to practical Multi-Paxos is significant; many details left to implementer