2. Discovery
- The new leader synchronizes its state with followers
- Followers acknowledge the leader and confirm synchronization
3. Broadcast
- The leader receives client requests and creates proposals
- Each proposal is assigned a monotonically increasing identifier (zxid)
- Proposals are broadcast to followers
- When a majority of followers acknowledge a proposal, the leader commits it
- The commit decision is broadcast to followers
ZAB Implementation Example
Here’s a simplified implementation of ZAB’s core components (helper types such as Transaction and Vote, and the networking methods, are elided):
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ZabServer {

    enum ServerState {
        LOOKING, FOLLOWING, LEADING
    }

    private ServerState state;
    private long currentEpoch;
    private long lastZxid;
    private final List<Transaction> history = new ArrayList<>();
    private final Map<Long, Transaction> pendingTransactions = new HashMap<>();
    private final Set<ZabServer> quorum = new HashSet<>(); // the full ensemble, including this server

    // Leader election
    public void startLeaderElection() {
        state = ServerState.LOOKING;

        // Send out a vote carrying our last zxid
        Vote myVote = new Vote(getServerId(), currentEpoch, lastZxid);
        broadcastVote(myVote);

        // Collect votes and determine the winner
        // (Simplified - the actual implementation uses multiple rounds)
        Map<Integer, Vote> receivedVotes = collectVotes();
        Vote electedLeader = selectLeader(receivedVotes);

        if (electedLeader.getServerId() == getServerId()) {
            becomeLeader();
        } else {
            becomeFollower(electedLeader.getServerId());
        }
    }

    // Leader functionality
    private void becomeLeader() {
        state = ServerState.LEADING;
        currentEpoch++;

        // Discovery phase: learn each follower's state
        List<FollowerInfo> followers = collectFollowerInfo();
        long newEpochZxid = createNewEpochZxid(); // new epoch in high bits, counter reset to 0

        // Find the highest zxid among followers
        long highestZxid = lastZxid;
        for (FollowerInfo info : followers) {
            if (info.getLastZxid() > highestZxid) {
                highestZxid = info.getLastZxid();
            }
        }

        // Synchronization phase: bring every follower up to the highest zxid
        for (FollowerInfo follower : followers) {
            synchronizeFollower(follower, highestZxid);
        }

        // Begin the new epoch's zxid sequence, then start the broadcast phase
        lastZxid = newEpochZxid;
        startProcessingRequests();
    }

    // Process a client request as leader
    public void processRequest(Request request) {
        // Create a transaction with the next zxid
        Transaction txn = new Transaction(
            createZxid(),
            request.getOperation(),
            request.getData()
        );

        // Hold the transaction until it commits
        pendingTransactions.put(txn.getZxid(), txn);

        // Broadcast to followers (simplified: blocks until acks arrive)
        List<ZabServer> acks = broadcastTransaction(txn);

        // Commit once a majority acknowledges; the +1 counts the leader's own implicit ack
        if (acks.size() + 1 > quorum.size() / 2) {
            commit(txn);
            broadcastCommit(txn);
        }
    }

    // Follower functionality
    private void becomeFollower(int leaderId) {
        state = ServerState.FOLLOWING;

        // Connect to the leader and report our last zxid
        connectToLeader(leaderId);
        sendFollowerInfo(lastZxid);

        // Process messages from the leader
        while (state == ServerState.FOLLOWING) {
            Message message = receiveFromLeader();

            if (message instanceof Transaction) {
                Transaction txn = (Transaction) message;
                if (isValidTransaction(txn)) {
                    // Log the proposal, hold it until the commit arrives, and acknowledge
                    logTransaction(txn);
                    pendingTransactions.put(txn.getZxid(), txn);
                    sendAck(txn.getZxid());
                }
            } else if (message instanceof Commit) {
                // Apply the committed transaction
                Commit commit = (Commit) message;
                applyTransaction(commit.getZxid());
            }
        }
    }

    // Create a new zxid
    private long createZxid() {
        // A zxid is a 64-bit number:
        // - high 32 bits: epoch
        // - low 32 bits: counter
        long counter = (lastZxid & 0xFFFFFFFFL) + 1;
        lastZxid = (currentEpoch << 32) | counter;
        return lastZxid;
    }

    // Apply a transaction to the state machine
    private void applyTransaction(long zxid) {
        Transaction txn = pendingTransactions.get(zxid);
        if (txn != null) {
            applyToStateMachine(txn); // apply to the replicated state machine
            history.add(txn);         // record in committed history
            pendingTransactions.remove(zxid);
        }
    }
}
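As a rough sketch of how this class might be driven (the constructor and cluster wiring are elided above), each server simply starts an election at boot and ends up either leading or following:

    ZabServer server = new ZabServer();
    server.startLeaderElection(); // leaves the server in the LEADING or FOLLOWING state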
When to Use ZAB
ZAB is appropriate when:
- You’re using ZooKeeper as your coordination service
- You need a protocol optimized for primary-backup replication
- You require strong ordering guarantees for operations
Challenges with ZAB
- Designed specifically for ZooKeeper, so it is less general-purpose
- Less documentation and fewer implementations compared to Paxos and Raft
- Leader-based design can be a bottleneck
Byzantine Fault Tolerance (BFT) Algorithms
The algorithms discussed so far assume the crash-failure model: nodes either work correctly or stop working entirely. Byzantine fault tolerance addresses a more challenging scenario in which nodes can behave arbitrarily, including sending conflicting information to different parts of the system.
Practical Byzantine Fault Tolerance (PBFT)
PBFT, introduced by Castro and Liskov in 1999, was the first practical Byzantine consensus algorithm. It can tolerate up to f Byzantine failures with 3f+1 total nodes.
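For example, tolerating a single Byzantine node (f = 1) requires 3(1) + 1 = 4 replicas, and tolerating two (f = 2) requires 7.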
How PBFT Works
PBFT operates in three phases:
- Pre-prepare: The primary (leader) assigns a sequence number to a client request and multicasts a pre-prepare message to all replicas
- Prepare: Replicas validate the pre-prepare and broadcast prepare messages to one another
- Commit: Once a replica has collected 2f prepare messages matching the pre-prepare, it broadcasts a commit message
A replica executes the request once it has received 2f+1 matching commit messages.
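To make these quorum thresholds concrete, here is a minimal sketch of the counting logic a single replica might run for one request. This is not the actual PBFT wire protocol; the class, its handler methods, and the elided broadcast/execute calls are hypothetical simplifications:

import java.util.HashSet;
import java.util.Set;

// One replica's phase tracking for a single request. Assumes all
// messages are authenticated and refer to the same view, sequence
// number, and request digest.
public class PbftReplicaSketch {
    private final int f; // maximum number of Byzantine faults tolerated
    private final Set<Integer> prepareSenders = new HashSet<>();
    private final Set<Integer> commitSenders = new HashSet<>();
    private boolean prePrepared;
    private boolean prepared;
    private boolean executed;

    public PbftReplicaSketch(int f) {
        this.f = f;
    }

    // Pre-prepare received from the primary
    public void onPrePrepare() {
        prePrepared = true;
    }

    // Prepare received from another replica
    public void onPrepare(int senderId) {
        prepareSenders.add(senderId);
        // Prepared: the pre-prepare plus 2f matching prepares
        if (prePrepared && prepareSenders.size() >= 2 * f && !prepared) {
            prepared = true;
            broadcastCommit(); // network send, elided
        }
    }

    // Commit received from any replica (including this one)
    public void onCommit(int senderId) {
        commitSenders.add(senderId);
        // Committed: 2f + 1 matching commits
        if (prepared && commitSenders.size() >= 2 * f + 1 && !executed) {
            executed = true;
            executeRequest(); // apply to the state machine, elided
        }
    }

    private void broadcastCommit() { /* elided */ }
    private void executeRequest()  { /* elided */ }
}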
When to Use BFT Algorithms
BFT algorithms are necessary when:
- You cannot trust all nodes in your system
- Nodes might be compromised or behave maliciously
- You’re building systems like blockchains or critical infrastructure
Challenges with BFT
- Higher message complexity (O(n²) where n is the number of nodes)
- Requires more nodes to tolerate the same number of failures
- Significantly more complex to implement correctly
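The quadratic message complexity comes from the prepare and commit phases, in which every replica broadcasts to every other replica for each request.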