Handling Cross-Partition Operations

One of the biggest challenges in partitioned systems is handling operations that span multiple partitions.

Querying Across Partitions

When a query needs data from multiple partitions, several approaches can be used:

1. Scatter-Gather

Query all partitions in parallel and combine the results.

// Pseudocode for scatter-gather query
List<Partition> partitions = getPartitions();
List<Future<Result>> futures = new ArrayList<>();

// Scatter phase: query all partitions
for (Partition partition : partitions) {
    futures.add(executorService.submit(() -> {
        return partition.executeQuery(query);
    }));
}

// Gather phase: collect and combine results
List<Result> results = new ArrayList<>();
for (Future<Result> future : futures) {
    results.add(future.get());
}

return combineResults(results);

Pros: Complete results, parallelizable Cons: Performance limited by slowest partition, resource intensive

2. Partition Pruning

Analyze the query to determine which partitions need to be accessed.

-- Example of a query that benefits from partition pruning
SELECT * FROM sales 
WHERE sale_date BETWEEN '2025-01-01' AND '2025-01-31';

-- If sales is partitioned by month, only the January partition is accessed

Pros: Improved performance by reducing partitions accessed Cons: Requires query analysis capabilities, not all queries can be pruned

3. Global Indexes

Maintain secondary indexes that span all partitions.

// DynamoDB Global Secondary Index example
CreateTableRequest createTableRequest = new CreateTableRequest()
    .withTableName("Orders")
    .withKeySchema(
        new KeySchemaElement("customerId", KeyType.HASH),
        new KeySchemaElement("orderId", KeyType.RANGE))
    .withAttributeDefinitions(
        new AttributeDefinition("customerId", ScalarAttributeType.S),
        new AttributeDefinition("orderId", ScalarAttributeType.S),
        new AttributeDefinition("orderStatus", ScalarAttributeType.S),
        new AttributeDefinition("orderDate", ScalarAttributeType.S))
    .withGlobalSecondaryIndexes(
        new GlobalSecondaryIndex()
            .withIndexName("OrderStatusIndex")
            .withKeySchema(
                new KeySchemaElement("orderStatus", KeyType.HASH),
                new KeySchemaElement("orderDate", KeyType.RANGE))
            .withProjection(new Projection().withProjectionType(ProjectionType.ALL))
            .withProvisionedThroughput(new ProvisionedThroughput(5L, 5L)));

Pros: Efficient queries on non-partition keys Cons: Index maintenance overhead, eventual consistency challenges

Transactions Across Partitions

Implementing transactions that span multiple partitions is challenging but essential for many applications.

1. Two-Phase Commit (2PC)

A protocol that ensures all partitions either commit or abort a transaction.

Coordinator                 Partition 1                Partition 2
    │                           │                           │
    ├───── Prepare ─────────────┼───── Prepare ─────────────┤
    │                           │                           │
    │                           │                           │
    │◄─── Vote (Yes/No) ────────┼◄─── Vote (Yes/No) ────────┤
    │                           │                           │
    ├───── Commit/Abort ────────┼───── Commit/Abort ────────┤
    │                           │                           │

Pros: Strong consistency guarantees Cons: Blocking protocol, performance impact, vulnerable to coordinator failures

2. Saga Pattern

A sequence of local transactions where each transaction updates data within a single partition.

// Pseudocode for Saga pattern
public void createOrderSaga(Order order) {
    try {
        // Step 1: Create order in Orders partition
        String orderId = orderService.createOrder(order);
        
        try {
            // Step 2: Reserve inventory in Inventory partition
            inventoryService.reserveItems(orderId, order.getItems());
            
            try {
                // Step 3: Process payment in Payments partition
                paymentService.processPayment(orderId, order.getPaymentDetails());
                
                // All steps succeeded
                orderService.completeOrder(orderId);
            } catch (Exception e) {
                // Compensating transaction for Step 2
                inventoryService.releaseItems(orderId);
                // Compensating transaction for Step 1
                orderService.cancelOrder(orderId);
                throw e;
            }
        } catch (Exception e) {
            // Compensating transaction for Step 1
            orderService.cancelOrder(orderId);
            throw e;
        }
    } catch (Exception e) {
        // Handle saga failure
        throw new SagaFailedException("Failed to create order", e);
    }
}

Pros: No distributed locking, better performance Cons: Eventually consistent, complex compensation logic

3. Distributed Consensus

Use consensus algorithms like Paxos or Raft to agree on transaction outcomes.

Pros: Strong consistency without blocking Cons: Complex implementation, performance overhead


Rebalancing Partitions

As data grows or access patterns change, you may need to rebalance partitions to maintain performance.

When to Rebalance

  • When partitions become unbalanced in size or load
  • When adding or removing nodes
  • When access patterns change significantly

Rebalancing Strategies

1. Hash-Based Rebalancing

Rehash data using a new partition count.

Before (3 partitions):
Partition = hash(key) % 3

After (5 partitions):
Partition = hash(key) % 5

Pros: Even distribution Cons: Massive data movement (up to 80% of data may move)