The Consensus Problem
Before diving into specific algorithms, let’s clearly define the consensus problem. In a distributed system with multiple nodes, consensus is the process of agreeing on a single value or state among all non-faulty nodes. A correct consensus algorithm must satisfy the following properties:
- Agreement: All non-faulty nodes decide on the same value.
- Validity: If all nodes propose the same value, then all non-faulty nodes decide on that value.
- Termination: All non-faulty nodes eventually decide on some value.
The challenge is achieving these properties in the presence of:
- Node failures (crash failures or Byzantine failures)
- Network partitions and message delays
- Asynchronous communication
The FLP impossibility result (named after Fischer, Lynch, and Paterson) proved that in an asynchronous system where even one node might fail, no deterministic algorithm can guarantee consensus. However, by relaxing some assumptions or adding timing constraints, practical consensus algorithms can be developed.
Paxos: The Foundation of Modern Consensus
Paxos, introduced by Leslie Lamport in 1989, is the foundation upon which many modern consensus algorithms are built. Despite its theoretical elegance, Paxos is notoriously difficult to understand and implement correctly.
How Paxos Works
Paxos operates in two phases:
Phase 1: Prepare
- A proposer selects a proposal number
n
and sends aprepare(n)
message to a majority of acceptors. - Each acceptor promises not to accept proposals numbered less than
n
and returns the highest-numbered proposal (if any) that it has accepted.
Phase 2: Accept
- If the proposer receives responses from a majority of acceptors, it sends an
accept(n, v)
message to a majority of acceptors, wherev
is the value of the highest-numbered proposal among the responses, or any value if no previous proposals were returned. - An acceptor accepts the proposal unless it has already responded to a
prepare
request with a number greater thann
.
Multi-Paxos Optimization
Basic Paxos (also called Single-Decree Paxos) reaches consensus on a single value. In practice, systems need to agree on a sequence of values, which is where Multi-Paxos comes in:
- Elect a stable leader (proposer) that runs multiple instances of Paxos.
- The leader can skip Phase 1 for subsequent values after the first successful round.
Paxos Implementation Example
Here’s a simplified implementation of the Paxos acceptor role:
class PaxosAcceptor:
def __init__(self):
self.promised_id = None
self.accepted_id = None
self.accepted_value = None
def prepare(self, proposal_id):
if self.promised_id is None or proposal_id > self.promised_id:
self.promised_id = proposal_id
return True, self.accepted_id, self.accepted_value
else:
return False, None, None
def accept(self, proposal_id, value):
if self.promised_id is None or proposal_id >= self.promised_id:
self.promised_id = proposal_id
self.accepted_id = proposal_id
self.accepted_value = value
return True
else:
return False
And the proposer role:
class PaxosProposer:
def __init__(self, acceptors, proposal_id, proposed_value):
self.acceptors = acceptors
self.proposal_id = proposal_id
self.proposed_value = proposed_value
def propose(self):
# Phase 1: Prepare
prepare_count = 0
highest_accepted_id = None
highest_accepted_value = None
for acceptor in self.acceptors:
success, accepted_id, accepted_value = acceptor.prepare(self.proposal_id)
if success:
prepare_count += 1
if accepted_id is not None and (highest_accepted_id is None or accepted_id > highest_accepted_id):
highest_accepted_id = accepted_id
highest_accepted_value = accepted_value
# Check if we have majority
if prepare_count <= len(self.acceptors) // 2:
return False, None
# Phase 2: Accept
# Use the highest accepted value if there is one, otherwise use our proposed value
value_to_propose = highest_accepted_value if highest_accepted_value is not None else self.proposed_value
accept_count = 0
for acceptor in self.acceptors:
if acceptor.accept(self.proposal_id, value_to_propose):
accept_count += 1
# Check if we have majority
if accept_count <= len(self.acceptors) // 2:
return False, None
return True, value_to_propose
When to Use Paxos
Paxos is suitable when:
- You need a proven, battle-tested consensus algorithm
- You’re building a system that requires strong consistency guarantees
- You can tolerate the complexity of implementation
Challenges with Paxos
- Difficult to understand and implement correctly
- Requires careful handling of edge cases
- Performance can degrade under contention
- Multi-Paxos leader election is not part of the core algorithm
Raft: Consensus for Humans
Raft was designed by Diego Ongaro and John Ousterhout as an alternative to Paxos, with a focus on understandability and practical implementation. It has gained significant adoption in systems like etcd, Consul, and InfluxDB.