How It Works

Data is first partitioned using one strategy, and each resulting partition is then sub-partitioned using another. A common combination, shown below, is range partitioning by time at the top level with hash sub-partitioning by user ID.

Implementation Example (PostgreSQL)

-- Range-hash composite partitioning
CREATE TABLE user_events (
    user_id INT NOT NULL,
    event_time TIMESTAMP NOT NULL,
    event_type VARCHAR(50) NOT NULL,
    event_data JSONB
) PARTITION BY RANGE (event_time);

-- Create monthly partitions
CREATE TABLE user_events_2025_01 PARTITION OF user_events
    FOR VALUES FROM ('2025-01-01') TO ('2025-02-01')
    PARTITION BY HASH (user_id);

-- Create sub-partitions for January
CREATE TABLE user_events_2025_01_p1 PARTITION OF user_events_2025_01
    FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE user_events_2025_01_p2 PARTITION OF user_events_2025_01
    FOR VALUES WITH (MODULUS 4, REMAINDER 1);
CREATE TABLE user_events_2025_01_p3 PARTITION OF user_events_2025_01
    FOR VALUES WITH (MODULUS 4, REMAINDER 2);
CREATE TABLE user_events_2025_01_p4 PARTITION OF user_events_2025_01
    FOR VALUES WITH (MODULUS 4, REMAINDER 3);

When to Use Composite Partitioning

  • When you need the benefits of two partitioning strategies at once, such as time-based pruning combined with even hash distribution
  • For large tables that require both time-based organization and an even spread of data across sub-partitions
  • When queries filter on both the partition and sub-partition keys, so pruning can apply at both levels

Challenges

  • Increased complexity in management and maintenance
  • More complex query planning and optimization
  • Potential for over-partitioning, leading to administrative overhead

5. Consistent Hashing

Consistent hashing is a special form of hash partitioning that minimizes data redistribution when adding or removing nodes.

How It Works

  1. Both data items and nodes are mapped to positions on a conceptual ring using a hash function
  2. Each data item is assigned to the first node encountered when moving clockwise from the item’s position
  3. When a node is added or removed, only a fraction of the data needs to be redistributed (see the sketch after the diagram below)

┌───────────────────────────────────────────────┐
│                                               │
│      ┌─────┐                                  │
│      │Node1│                                  │
│      └─────┘                                  │
│         │                                     │
│         ▼                                     │
│    ●────────●                                 │
│   /          \                                │
│  /            \                               │
│ ●              ●                              │
│ │              │                              │
│ │              │ ◄── Data item (assigned to   │
│ │              │      closest node clockwise) │
│ ●              ●                              │
│  \            /                               │
│   \    ●     /                                │
│    ●───┼────●                                 │
│        │                                      │
│        ▼                                      │
│     ┌─────┐                                   │
│     │Node2│                                   │
│     └─────┘                                   │
│                                               │
└───────────────────────────────────────────────┘
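
To make the lookup concrete, here is a minimal Python sketch of a hash ring with virtual nodes. The class name, node labels, and choice of MD5 are illustrative assumptions rather than any particular library's API.

import bisect
import hashlib

class HashRing:
    def __init__(self, nodes, vnodes=100):
        # Place each physical node at `vnodes` positions on the ring
        # ("virtual nodes") to smooth out the key distribution.
        self._ring = {}        # ring position -> physical node
        self._positions = []   # sorted ring positions
        for node in nodes:
            for i in range(vnodes):
                pos = self._hash(f"{node}#{i}")
                self._ring[pos] = node
                self._positions.append(pos)
        self._positions.sort()

    @staticmethod
    def _hash(key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def get_node(self, key: str) -> str:
        # Walk clockwise from the key's position to the first node position,
        # wrapping around to the start of the ring if necessary.
        idx = bisect.bisect(self._positions, self._hash(key)) % len(self._positions)
        return self._ring[self._positions[idx]]

ring = HashRing(["node1", "node2", "node3"])
print(ring.get_node("user:42"))  # one of the three nodes

# Adding a fourth node only remaps the keys whose positions now fall
# between the new node's ring positions and their former successors.
ring = HashRing(["node1", "node2", "node3", "node4"])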

Implementation Example (Redis Cluster)

Redis Cluster uses a slot-based variant of this idea: every key maps to one of 16384 fixed hash slots, and slots are assigned to nodes, so resharding moves only the affected slots rather than rehashing every key:

# Configure Redis Cluster nodes
redis-cli --cluster create 127.0.0.1:7000 127.0.0.1:7001 127.0.0.1:7002 \
  127.0.0.1:7003 127.0.0.1:7004 127.0.0.1:7005 \
  --cluster-replicas 1
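
A rough Python sketch of the key-to-slot mapping (CRC16 of the key modulo 16384, assuming the CRC16/XMODEM variant described in the cluster specification) is shown below; it is illustrative only, not the client library's routing code.

def crc16_xmodem(data: bytes) -> int:
    # CRC16 with polynomial 0x1021, initial value 0, no reflection.
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) if crc & 0x8000 else (crc << 1)
            crc &= 0xFFFF
    return crc

def hash_slot(key: str) -> int:
    # If the key contains a non-empty {hash tag}, only the tag is hashed,
    # which lets related keys be placed in the same slot deliberately.
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end > start + 1:
            key = key[start + 1:end]
    return crc16_xmodem(key.encode()) % 16384

print(hash_slot("user:12345"))        # a slot in the range 0..16383
print(hash_slot("{user:1}:profile"))  # same slot as every other {user:1}:* key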

When to Use Consistent Hashing

  • In dynamic environments where nodes are frequently added or removed
  • For distributed caches and NoSQL databases
  • When minimizing data movement during scaling is critical

Challenges

  • More complex to implement than traditional hash partitioning
  • May still lead to uneven distribution without virtual nodes
  • Requires careful key design to avoid hotspots

Choosing a Partition Key

The partition key is perhaps the most critical decision in your partitioning strategy. It determines how data is distributed and directly impacts query performance.

Characteristics of a Good Partition Key

  1. High Cardinality: Many distinct values to ensure even distribution
  2. Immutability: Values that don’t change, avoiding the need to move data between partitions
  3. Query Relevance: Frequently used in queries to enable partition pruning
  4. Even Distribution: Leads to balanced partitions without hotspots
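
A quick way to sanity-check a candidate key against the cardinality and distribution criteria is to hash a sample of its values into a fixed number of buckets and compare bucket sizes. The sketch below is illustrative; the sample data, bucket count, and use of MD5 are arbitrary assumptions.

from collections import Counter
import hashlib

def partition_counts(values, num_partitions=8):
    # Count how many rows would land in each hash partition.
    counts = Counter()
    for value in values:
        bucket = int(hashlib.md5(str(value).encode()).hexdigest(), 16) % num_partitions
        counts[bucket] += 1
    return counts

# Hypothetical sample of key values drawn from an existing table.
sample_user_ids = [f"user-{i}" for i in range(10_000)]
counts = partition_counts(sample_user_ids)
print(max(counts.values()) / min(counts.values()))  # close to 1.0 means an even spread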

Common Partition Keys

Partition Key       | Pros                                      | Cons                                       | Example Use Case
User ID             | Even distribution, natural for user data | Poor for cross-user analytics              | User profiles, preferences
Timestamp           | Natural for time-series data              | Potential for hotspots on recent data      | Event logs, metrics
Geographic Location | Data locality, regulatory compliance      | Uneven user distribution globally          | User content, regional services
Tenant ID           | Clean separation in multi-tenant systems  | Potential for tenant size variation        | SaaS applications
Product ID          | Natural for product data                  | Uneven access patterns (popular products)  | E-commerce catalogs

Implementation Example: Choosing a Partition Key in DynamoDB

// Good partition key (high cardinality, query relevant)
const AWS = require('aws-sdk');       // AWS SDK for JavaScript (v2)
const dynamodb = new AWS.DynamoDB();  // assumes region/credentials are configured in the environment

const params = {
    TableName: 'UserSessions',
    KeySchema: [
        { AttributeName: 'userId', KeyType: 'HASH' },  // Partition key
        { AttributeName: 'sessionId', KeyType: 'RANGE' }  // Sort key
    ],
    AttributeDefinitions: [
        { AttributeName: 'userId', AttributeType: 'S' },
        { AttributeName: 'sessionId', AttributeType: 'S' }
    ],
    ProvisionedThroughput: {
        ReadCapacityUnits: 10,
        WriteCapacityUnits: 10
    }
};

dynamodb.createTable(params, function(err, data) {
    if (err) console.log(err);
    else console.log(data);
});