How It Works
Data is first partitioned using one strategy (for example, by time range), then each partition is sub-partitioned using another (for example, by hash). Queries that filter on both partition keys can then be pruned all the way down to a single sub-partition.
Implementation Example (PostgreSQL)
-- Range-hash composite partitioning
CREATE TABLE user_events (
    user_id INT NOT NULL,
    event_time TIMESTAMP NOT NULL,
    event_type VARCHAR(50) NOT NULL,
    event_data JSONB
) PARTITION BY RANGE (event_time);

-- Create monthly partitions
CREATE TABLE user_events_2025_01 PARTITION OF user_events
    FOR VALUES FROM ('2025-01-01') TO ('2025-02-01')
    PARTITION BY HASH (user_id);

-- Create sub-partitions for January
CREATE TABLE user_events_2025_01_p1 PARTITION OF user_events_2025_01
    FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE user_events_2025_01_p2 PARTITION OF user_events_2025_01
    FOR VALUES WITH (MODULUS 4, REMAINDER 1);
CREATE TABLE user_events_2025_01_p3 PARTITION OF user_events_2025_01
    FOR VALUES WITH (MODULUS 4, REMAINDER 2);
CREATE TABLE user_events_2025_01_p4 PARTITION OF user_events_2025_01
    FOR VALUES WITH (MODULUS 4, REMAINDER 3);
When to Use Composite Partitioning
- When you need the benefits of multiple partitioning strategies
- For large tables that require both time-based organization and even distribution
- When you have complex query patterns that benefit from multiple partition schemes
Challenges
- Increased complexity in management and maintenance
- More complex query planning and optimization
- Potential for over-partitioning, leading to administrative overhead
5. Consistent Hashing
Consistent hashing is a special form of hash partitioning that minimizes data redistribution when adding or removing nodes.
How It Works
- Both data items and nodes are mapped to positions on a conceptual ring using a hash function
- Each data item is assigned to the first node encountered when moving clockwise from the item’s position
- When a node is added or removed, only the keys on the affected arc of the ring move; with K keys and N nodes, that is about K/N keys on average, rather than nearly everything as with modulo-based hashing (see the sketch after the diagram below)
            ┌─────┐
            │Node1│
            └──┬──┘
               ▼
            ●─────●
           /       \
          ●         ●
          │         │ ◄── Data item (assigned to the
          │         │     first node clockwise)
          ●         ●
           \       /
            ●─────●
               │
               ▼
            ┌─────┐
            │Node2│
            └─────┘
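The ring logic itself is compact. Here is a minimal JavaScript sketch, assuming MD5 for ring positions and 100 virtual nodes per physical node to smooth out the distribution; the names (HashRing, addNode, getNode) are illustrative rather than a real library API:
const crypto = require('crypto');

// Minimal consistent-hash ring with virtual nodes (a sketch, not production code)
class HashRing {
  constructor(vnodes = 100) {
    this.vnodes = vnodes;   // virtual nodes per physical node
    this.ring = new Map();  // ring position -> physical node
    this.positions = [];    // sorted ring positions
  }
  hash(key) {
    // First 8 hex chars of MD5 -> a 32-bit position on the ring
    return parseInt(crypto.createHash('md5').update(key).digest('hex').slice(0, 8), 16);
  }
  addNode(node) {
    for (let i = 0; i < this.vnodes; i++) {
      const pos = this.hash(`${node}#${i}`);
      this.ring.set(pos, node);
      this.positions.push(pos);
    }
    this.positions.sort((a, b) => a - b);
  }
  removeNode(node) {
    this.positions = this.positions.filter(p => this.ring.get(p) !== node);
    for (const [pos, n] of this.ring) if (n === node) this.ring.delete(pos);
  }
  getNode(key) {
    const h = this.hash(key);
    // Move clockwise: first position at or past the key's hash, wrapping around
    for (const pos of this.positions) if (pos >= h) return this.ring.get(pos);
    return this.ring.get(this.positions[0]);
  }
}

const ring = new HashRing();
['node1', 'node2', 'node3'].forEach(n => ring.addNode(n));
console.log(ring.getNode('user:1001'));
// Adding a fourth node only remaps keys that fall between its
// virtual nodes and their former clockwise successors:
ring.addNode('node4');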
Implementation Example (Redis Cluster)
Redis Cluster uses a form of consistent hashing with 16384 hash slots:
# Configure Redis Cluster nodes
redis-cli --cluster create 127.0.0.1:7000 127.0.0.1:7001 127.0.0.1:7002 \
  127.0.0.1:7003 127.0.0.1:7004 127.0.0.1:7005 \
  --cluster-replicas 1
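Routing works by mapping each key to a slot as CRC16(key) mod 16384; Redis uses the XModem variant of CRC16. A minimal JavaScript sketch of the slot calculation (function names are illustrative):
// CRC16-XModem (polynomial 0x1021, initial value 0x0000)
function crc16(buf) {
  let crc = 0;
  for (const byte of buf) {
    crc ^= byte << 8;
    for (let i = 0; i < 8; i++) {
      crc = crc & 0x8000 ? ((crc << 1) ^ 0x1021) & 0xffff : (crc << 1) & 0xffff;
    }
  }
  return crc;
}

function hashSlot(key) {
  return crc16(Buffer.from(key)) % 16384;
}

console.log(hashSlot('user:1001')); // a slot in [0, 16383]
When a key contains a hash tag such as {user:1001}, Redis hashes only the substring inside the braces, which lets related keys be pinned to the same slot.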
When to Use Consistent Hashing
- In dynamic environments where nodes are frequently added or removed
- For distributed caches and NoSQL databases
- When minimizing data movement during scaling is critical
Challenges
- More complex to implement than traditional hash partitioning
- May still lead to uneven distribution without virtual nodes
- Requires careful key design to avoid hotspots
Choosing a Partition Key
The partition key is perhaps the most critical decision in your partitioning strategy. It determines how data is distributed and directly impacts query performance.
Characteristics of a Good Partition Key
- High Cardinality: Many distinct values to ensure even distribution
- Immutability: Values that don’t change, avoiding the need to move data between partitions
- Query Relevance: Frequently used in queries to enable partition pruning
- Even Distribution: Leads to balanced partitions without hotspots (a quick way to check this is sketched below)
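A practical way to vet a candidate key before committing to it is to hash a sample of real values and measure how evenly they spread. A quick JavaScript sketch (the 16-partition count and the sample user IDs are arbitrary choices for illustration):
const crypto = require('crypto');

// Skew check: hash sample key values into N buckets and
// compare the largest bucket to the ideal even share.
function skewReport(keys, partitions = 16) {
  const counts = new Array(partitions).fill(0);
  for (const k of keys) {
    const h = parseInt(
      crypto.createHash('md5').update(String(k)).digest('hex').slice(0, 8), 16);
    counts[h % partitions]++;
  }
  const max = Math.max(...counts);
  const ideal = keys.length / partitions;
  return { max, ideal, skew: max / ideal };
}

// A skew near 1.0 means near-even spread; a low-cardinality key
// (e.g. a country code) would pile most rows into a few buckets.
const userIds = Array.from({ length: 100000 }, (_, i) => `user-${i}`);
console.log(skewReport(userIds));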
Common Partition Keys
| Partition Key | Pros | Cons | Example Use Case |
|---|---|---|---|
| User ID | Even distribution, natural for user data | Poor for cross-user analytics | User profiles, preferences |
| Timestamp | Natural for time-series data | Potential for hotspots on recent data | Event logs, metrics |
| Geographic Location | Data locality, regulatory compliance | Uneven user distribution globally | User content, regional services |
| Tenant ID | Clean separation in multi-tenant systems | Potential for tenant size variation | SaaS applications |
| Product ID | Natural for product data | Uneven access patterns (popular products) | E-commerce catalogs |
Implementation Example: Choosing a Partition Key in DynamoDB
// Good partition key (high cardinality, query relevant)
const AWS = require('aws-sdk');
const dynamodb = new AWS.DynamoDB();

const params = {
  TableName: 'UserSessions',
  KeySchema: [
    { AttributeName: 'userId', KeyType: 'HASH' },     // Partition key
    { AttributeName: 'sessionId', KeyType: 'RANGE' }  // Sort key
  ],
  AttributeDefinitions: [
    { AttributeName: 'userId', AttributeType: 'S' },
    { AttributeName: 'sessionId', AttributeType: 'S' }
  ],
  ProvisionedThroughput: {
    ReadCapacityUnits: 10,
    WriteCapacityUnits: 10
  }
};

dynamodb.createTable(params, function (err, data) {
  if (err) console.error(err);
  else console.log(data);
});
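With userId as the partition key, all of a user's sessions land in a single partition, so they can be fetched with one Query call instead of a table Scan. A short follow-up using the same table (the user ID value here is made up):
const queryParams = {
  TableName: 'UserSessions',
  KeyConditionExpression: 'userId = :uid',
  ExpressionAttributeValues: { ':uid': { S: 'user-1001' } } // hypothetical user ID
};

dynamodb.query(queryParams, function (err, data) {
  if (err) console.error(err);
  else console.log(data.Items); // every session for this one user
});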