The Consensus Problem

Before diving into specific algorithms, let’s clearly define the consensus problem. In a distributed system with multiple nodes, consensus is the process of agreeing on a single value or state among all non-faulty nodes. A correct consensus algorithm must satisfy the following properties:

  1. Agreement: All non-faulty nodes decide on the same value.
  2. Validity: If all nodes propose the same value, then all non-faulty nodes decide on that value.
  3. Termination: All non-faulty nodes eventually decide on some value.

The challenge is achieving these properties in the presence of:

  • Node failures (crash failures or Byzantine failures)
  • Network partitions and message delays
  • Asynchronous communication

The FLP impossibility result (named after Fischer, Lynch, and Paterson) proved that in an asynchronous system where even one node might fail, no deterministic algorithm can guarantee consensus. However, by relaxing some assumptions or adding timing constraints, practical consensus algorithms can be developed.


Paxos: The Foundation of Modern Consensus

Paxos, introduced by Leslie Lamport in 1989, is the foundation upon which many modern consensus algorithms are built. Despite its theoretical elegance, Paxos is notoriously difficult to understand and implement correctly.

How Paxos Works

Paxos operates in two phases:

Phase 1: Prepare

  1. A proposer selects a proposal number n and sends a prepare(n) message to a majority of acceptors.
  2. Each acceptor promises not to accept proposals numbered less than n and returns the highest-numbered proposal (if any) that it has accepted.

Phase 2: Accept

  1. If the proposer receives responses from a majority of acceptors, it sends an accept(n, v) message to a majority of acceptors, where v is the value of the highest-numbered proposal among the responses, or any value if no previous proposals were returned.
  2. An acceptor accepts the proposal unless it has already responded to a prepare request with a number greater than n.

Multi-Paxos Optimization

Basic Paxos (also called Single-Decree Paxos) reaches consensus on a single value. In practice, systems need to agree on a sequence of values, which is where Multi-Paxos comes in:

  1. Elect a stable leader (proposer) that runs multiple instances of Paxos.
  2. The leader can skip Phase 1 for subsequent values after the first successful round.

Paxos Implementation Example

Here’s a simplified implementation of the Paxos acceptor role:

class PaxosAcceptor:
    def __init__(self):
        self.promised_id = None
        self.accepted_id = None
        self.accepted_value = None
    
    def prepare(self, proposal_id):
        if self.promised_id is None or proposal_id > self.promised_id:
            self.promised_id = proposal_id
            return True, self.accepted_id, self.accepted_value
        else:
            return False, None, None
    
    def accept(self, proposal_id, value):
        if self.promised_id is None or proposal_id >= self.promised_id:
            self.promised_id = proposal_id
            self.accepted_id = proposal_id
            self.accepted_value = value
            return True
        else:
            return False

And the proposer role:

class PaxosProposer:
    def __init__(self, acceptors, proposal_id, proposed_value):
        self.acceptors = acceptors
        self.proposal_id = proposal_id
        self.proposed_value = proposed_value
        
    def propose(self):
        # Phase 1: Prepare
        prepare_count = 0
        highest_accepted_id = None
        highest_accepted_value = None
        
        for acceptor in self.acceptors:
            success, accepted_id, accepted_value = acceptor.prepare(self.proposal_id)
            if success:
                prepare_count += 1
                if accepted_id is not None and (highest_accepted_id is None or accepted_id > highest_accepted_id):
                    highest_accepted_id = accepted_id
                    highest_accepted_value = accepted_value
        
        # Check if we have majority
        if prepare_count <= len(self.acceptors) // 2:
            return False, None
        
        # Phase 2: Accept
        # Use the highest accepted value if there is one, otherwise use our proposed value
        value_to_propose = highest_accepted_value if highest_accepted_value is not None else self.proposed_value
        
        accept_count = 0
        for acceptor in self.acceptors:
            if acceptor.accept(self.proposal_id, value_to_propose):
                accept_count += 1
        
        # Check if we have majority
        if accept_count <= len(self.acceptors) // 2:
            return False, None
        
        return True, value_to_propose

When to Use Paxos

Paxos is suitable when:

  • You need a proven, battle-tested consensus algorithm
  • You’re building a system that requires strong consistency guarantees
  • You can tolerate the complexity of implementation

Challenges with Paxos

  • Difficult to understand and implement correctly
  • Requires careful handling of edge cases
  • Performance can degrade under contention
  • Multi-Paxos leader election is not part of the core algorithm

Raft: Consensus for Humans

Raft was designed by Diego Ongaro and John Ousterhout as an alternative to Paxos, with a focus on understandability and practical implementation. It has gained significant adoption in systems like etcd, Consul, and InfluxDB.