How It Works
Data is first partitioned using one strategy (for example, by time range), then each partition is sub-partitioned using another (for example, by hash). Queries that filter on both partition keys can then be pruned all the way down to a single sub-partition.
Implementation Example (PostgreSQL)
-- Range-hash composite partitioning
CREATE TABLE user_events (
    user_id INT NOT NULL,
    event_time TIMESTAMP NOT NULL,
    event_type VARCHAR(50) NOT NULL,
    event_data JSONB
) PARTITION BY RANGE (event_time);

-- Create monthly partitions
CREATE TABLE user_events_2025_01 PARTITION OF user_events
    FOR VALUES FROM ('2025-01-01') TO ('2025-02-01')
    PARTITION BY HASH (user_id);

-- Create sub-partitions for January
CREATE TABLE user_events_2025_01_p1 PARTITION OF user_events_2025_01
    FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE user_events_2025_01_p2 PARTITION OF user_events_2025_01
    FOR VALUES WITH (MODULUS 4, REMAINDER 1);
CREATE TABLE user_events_2025_01_p3 PARTITION OF user_events_2025_01
    FOR VALUES WITH (MODULUS 4, REMAINDER 2);
CREATE TABLE user_events_2025_01_p4 PARTITION OF user_events_2025_01
    FOR VALUES WITH (MODULUS 4, REMAINDER 3);
When to Use Composite Partitioning
- When you need the benefits of multiple partitioning strategies
- For large tables that require both time-based organization and even distribution
- When you have complex query patterns that benefit from multiple partition schemes
Challenges
- Increased complexity in management and maintenance
- More complex query planning and optimization
- Potential for over-partitioning, leading to administrative overhead
5. Consistent Hashing
Consistent hashing is a special form of hash partitioning that minimizes data redistribution when adding or removing nodes.
How It Works
- Both data items and nodes are mapped to positions on a conceptual ring using a hash function
- Each data item is assigned to the first node encountered when moving clockwise from the item’s position
- When a node is added or removed, only the keys on the affected arc of the ring move; with K keys and N nodes, that is about K/N keys on average, rather than nearly everything as with modulo-based hashing (see the sketch after the diagram below)
            ┌─────┐
            │Node1│
            └──┬──┘
               ▼
            ●─────●
           /       \
          ●         ●
          │         │ ◄── Data item (assigned to the
          │         │     first node clockwise)
          ●         ●
           \       /
            ●─────●
               │
               ▼
            ┌─────┐
            │Node2│
            └─────┘
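The ring logic itself is compact. Here is a minimal JavaScript sketch, assuming MD5 for ring positions and 100 virtual nodes per physical node to smooth out the distribution; the names (HashRing, addNode, getNode) are illustrative rather than a real library API:
const crypto = require('crypto');

// Minimal consistent-hash ring with virtual nodes (a sketch, not production code)
class HashRing {
  constructor(vnodes = 100) {
    this.vnodes = vnodes;   // virtual nodes per physical node
    this.ring = new Map();  // ring position -> physical node
    this.positions = [];    // sorted ring positions
  }
  hash(key) {
    // First 8 hex chars of MD5 -> a 32-bit position on the ring
    return parseInt(crypto.createHash('md5').update(key).digest('hex').slice(0, 8), 16);
  }
  addNode(node) {
    for (let i = 0; i < this.vnodes; i++) {
      const pos = this.hash(`${node}#${i}`);
      this.ring.set(pos, node);
      this.positions.push(pos);
    }
    this.positions.sort((a, b) => a - b);
  }
  removeNode(node) {
    this.positions = this.positions.filter(p => this.ring.get(p) !== node);
    for (const [pos, n] of this.ring) if (n === node) this.ring.delete(pos);
  }
  getNode(key) {
    const h = this.hash(key);
    // Move clockwise: first position at or past the key's hash, wrapping around
    for (const pos of this.positions) if (pos >= h) return this.ring.get(pos);
    return this.ring.get(this.positions[0]);
  }
}

const ring = new HashRing();
['node1', 'node2', 'node3'].forEach(n => ring.addNode(n));
console.log(ring.getNode('user:1001'));
// Adding a fourth node only remaps keys that fall between its
// virtual nodes and their former clockwise successors:
ring.addNode('node4');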
Implementation Example (Redis Cluster)
Redis Cluster uses a form of consistent hashing with 16384 hash slots:
# Configure Redis Cluster nodes
redis-cli --cluster create 127.0.0.1:7000 127.0.0.1:7001 127.0.0.1:7002 \
  127.0.0.1:7003 127.0.0.1:7004 127.0.0.1:7005 \
  --cluster-replicas 1
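Routing works by mapping each key to a slot as CRC16(key) mod 16384; Redis uses the XModem variant of CRC16. A minimal JavaScript sketch of the slot calculation (function names are illustrative):
// CRC16-XModem (polynomial 0x1021, initial value 0x0000)
function crc16(buf) {
  let crc = 0;
  for (const byte of buf) {
    crc ^= byte << 8;
    for (let i = 0; i < 8; i++) {
      crc = crc & 0x8000 ? ((crc << 1) ^ 0x1021) & 0xffff : (crc << 1) & 0xffff;
    }
  }
  return crc;
}

function hashSlot(key) {
  return crc16(Buffer.from(key)) % 16384;
}

console.log(hashSlot('user:1001')); // a slot in [0, 16383]
When a key contains a hash tag such as {user:1001}, Redis hashes only the substring inside the braces, which lets related keys be pinned to the same slot.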
When to Use Consistent Hashing
- In dynamic environments where nodes are frequently added or removed
- For distributed caches and NoSQL databases
- When minimizing data movement during scaling is critical
Challenges
- More complex to implement than traditional hash partitioning
- May still lead to uneven distribution without virtual nodes
- Requires careful key design to avoid hotspots
Choosing a Partition Key
The partition key is perhaps the most critical decision in your partitioning strategy. It determines how data is distributed and directly impacts query performance.
Characteristics of a Good Partition Key
- High Cardinality: Many distinct values to ensure even distribution
- Immutability: Values that don’t change, avoiding the need to move data between partitions
- Query Relevance: Frequently used in queries to enable partition pruning
- Even Distribution: Leads to balanced partitions without hotspots (a quick way to check this is sketched below)
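A practical way to vet a candidate key before committing to it is to hash a sample of real values and measure how evenly they spread. A quick JavaScript sketch (the 16-partition count and the sample user IDs are arbitrary choices for illustration):
const crypto = require('crypto');

// Skew check: hash sample key values into N buckets and
// compare the largest bucket to the ideal even share.
function skewReport(keys, partitions = 16) {
  const counts = new Array(partitions).fill(0);
  for (const k of keys) {
    const h = parseInt(
      crypto.createHash('md5').update(String(k)).digest('hex').slice(0, 8), 16);
    counts[h % partitions]++;
  }
  const max = Math.max(...counts);
  const ideal = keys.length / partitions;
  return { max, ideal, skew: max / ideal };
}

// A skew near 1.0 means near-even spread; a low-cardinality key
// (e.g. a country code) would pile most rows into a few buckets.
const userIds = Array.from({ length: 100000 }, (_, i) => `user-${i}`);
console.log(skewReport(userIds));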
Common Partition Keys
| Partition Key | Pros | Cons | Example Use Case |
|---|---|---|---|
| User ID | Even distribution, natural for user data | Poor for cross-user analytics | User profiles, preferences |
| Timestamp | Natural for time-series data | Potential for hotspots on recent data | Event logs, metrics |
| Geographic Location | Data locality, regulatory compliance | Uneven user distribution globally | User content, regional services |
| Tenant ID | Clean separation in multi-tenant systems | Potential for tenant size variation | SaaS applications |
| Product ID | Natural for product data | Uneven access patterns (popular products) | E-commerce catalogs |
Implementation Example: Choosing a Partition Key in DynamoDB
// Good partition key (high cardinality, query relevant)
const AWS = require('aws-sdk');
const dynamodb = new AWS.DynamoDB();

const params = {
  TableName: 'UserSessions',
  KeySchema: [
    { AttributeName: 'userId', KeyType: 'HASH' },     // Partition key
    { AttributeName: 'sessionId', KeyType: 'RANGE' }  // Sort key
  ],
  AttributeDefinitions: [
    { AttributeName: 'userId', AttributeType: 'S' },
    { AttributeName: 'sessionId', AttributeType: 'S' }
  ],
  ProvisionedThroughput: {
    ReadCapacityUnits: 10,
    WriteCapacityUnits: 10
  }
};

dynamodb.createTable(params, function (err, data) {
  if (err) console.error(err);
  else console.log(data);
});
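With userId as the partition key, all of a user's sessions land in a single partition, so they can be fetched with one Query call instead of a table Scan. A short follow-up using the same table (the user ID value here is made up):
const queryParams = {
  TableName: 'UserSessions',
  KeyConditionExpression: 'userId = :uid',
  ExpressionAttributeValues: { ':uid': { S: 'user-1001' } } // hypothetical user ID
};

dynamodb.query(queryParams, function (err, data) {
  if (err) console.error(err);
  else console.log(data.Items); // every session for this one user
});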