Patterns

35 items

System Design Pattern

Data Distributionshardingpartitioninghorizontal-scalingdistributiondatabaseadvanced

Sharding Pattern

Horizontal partitioning of data across multiple nodes

Used in: Distributed Databases, Key-Value Stores, Search Indexes|25 min read

Summary

Sharding (horizontal partitioning) splits a large dataset across multiple database instances, where each shard holds a subset of the data. Unlike vertical partitioning (splitting by columns), sharding splits by rows using a shard key. This enables horizontal scaling beyond single-machine limits: if one database handles 10K queries/second, 10 shards handle 100K. Sharding is essential for large-scale systems - Instagram shards user data by user_id, Slack shards by workspace_id, and Discord shards by guild_id. The key challenge is choosing the right shard key to ensure even distribution and minimize cross-shard queries.

Key Takeaways

Horizontal Scaling Beyond Single Machine

Single database has CPU, memory, and I/O limits. Sharding distributes data across N machines, multiplying capacity by N. This is how systems scale to billions of rows and millions of queries per second.

Shard Key Selection is Critical

The shard key determines which shard holds each row. Bad keys cause hot spots (one shard overloaded) or excessive cross-shard queries. Good keys distribute evenly and keep related data together.

Query Routing Complexity

Application must know which shard to query. Single-shard queries are fast. Cross-shard queries (joins, aggregations) are slow and complex. Design schema to minimize cross-shard operations.

Why single databases don't scale infinitely:

CPU: Query processing is CPU-bound
Memory: Indexes and hot data must fit in RAM
Disk I/O: Read/write throughput has limits
Connections: Maximum concurrent connections (~10K)
Storage: Single disk/volume size limits

Real numbers: - PostgreSQL on large instance: ~50K simple queries/second - MySQL on large instance: ~100K simple queries/second - Single table with 1B rows: queries slow down significantly

Vertical scaling hits a ceiling: The biggest available machine still has limits.

Vertical vs Horizontal Scaling

Summary

Key Takeaways

Horizontal Scaling Beyond Single Machine

Shard Key Selection is Critical

The shard key determines which shard holds each row. Bad keys cause hot spots (one shard overloaded) or excessive cross-shard queries. Good keys distribute evenly and keep related data together.

Query Routing Complexity

Application must know which shard to query. Single-shard queries are fast. Cross-shard queries (joins, aggregations) are slow and complex. Design schema to minimize cross-shard operations.

Resharding is Painful

Adding shards requires redistributing data. With hash-based sharding, adding one shard moves ~50% of data. Consistent hashing reduces this to ~1/N. Plan for growth from the start.

No Cross-Shard Transactions

ACID transactions don't span shards without distributed transaction protocols (2PC). Design for eventual consistency or keep transactional data on same shard.

Operational Complexity

N shards means N databases to manage: backups, monitoring, failover, schema migrations. Use automation and treat shards uniformly to manage complexity.

Pattern Details

Why single databases don't scale infinitely:

CPU: Query processing is CPU-bound
Memory: Indexes and hot data must fit in RAM
Disk I/O: Read/write throughput has limits
Connections: Maximum concurrent connections (~10K)
Storage: Single disk/volume size limits

Real numbers: - PostgreSQL on large instance: ~50K simple queries/second - MySQL on large instance: ~100K simple queries/second - Single table with 1B rows: queries slow down significantly

Vertical scaling hits a ceiling: The biggest available machine still has limits.

Vertical vs Horizontal Scaling

Trade-offs

Aspect	Advantage	Disadvantage

Patterns

Horizontal Scaling Pattern

Retry with Backoff Pattern

Replication Pattern

Caching Strategies Pattern

Persistent Connections Pattern

Load Balancing Pattern

Fan-out Pattern

Fan-in Pattern

Circuit Breaker Pattern

Eventual Consistency Pattern

Queue-based Load Leveling Pattern

Bloom Filters Pattern

Time-Series Storage Pattern

Bulkhead Pattern

Batch Processing Pattern

Write-Ahead Log Pattern

API Gateway Pattern

Backend for Frontend Pattern

Sidecar Pattern

Idempotency Pattern

Rate Limiting Pattern

Backpressure Pattern

Pub/Sub Pattern

Strong Consistency Pattern

Conflict Resolution Pattern

Leader Election Pattern

Consensus Protocols Pattern

CQRS Pattern

LSM Trees Pattern