Horizontal partitioning of data across multiple nodes
Sharding (horizontal partitioning) splits a large dataset across multiple database instances, with each shard holding a subset of the rows. Unlike vertical partitioning, which splits a table by columns, sharding splits by rows according to a shard key. This enables horizontal scaling beyond single-machine limits: if one database handles 10K queries/second, 10 shards can handle roughly 100K, provided the load spreads evenly and most queries touch a single shard. Sharding is essential for large-scale systems: Instagram shards user data by user_id, Slack shards by workspace_id, and Discord shards by guild_id. The key challenge is choosing a shard key that distributes data evenly and minimizes cross-shard queries.
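To make the row-versus-column distinction concrete, here is a minimal sketch that uses in-memory lists in place of real tables. The `users` data, the column names, and the two-shard modulo split are all assumptions chosen for illustration.

```python
# Hypothetical table: each dict is one row.
users = [
    {"user_id": 1, "name": "ada", "bio": "mathematician"},
    {"user_id": 2, "name": "linus", "bio": "kernel hacker"},
    {"user_id": 3, "name": "grace", "bio": "compiler pioneer"},
    {"user_id": 4, "name": "alan", "bio": "logician"},
]

# Vertical partitioning: split by COLUMNS. Every partition has all rows,
# but only some of the columns (the key is repeated so rows can be rejoined).
core = [{"user_id": r["user_id"], "name": r["name"]} for r in users]
details = [{"user_id": r["user_id"], "bio": r["bio"]} for r in users]

# Horizontal partitioning (sharding): split by ROWS using a shard key.
# Every shard has all columns, but only some of the rows.
NUM_SHARDS = 2  # assumed shard count
shards = {i: [] for i in range(NUM_SHARDS)}
for row in users:
    shards[row["user_id"] % NUM_SHARDS].append(row)

for shard_id, rows in shards.items():
    print(shard_id, [r["user_id"] for r in rows])
```

Each shard keeps complete rows, so a query for one user needs exactly one shard, whereas the vertical split needs a join on user_id to rebuild a full row.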
A single database has CPU, memory, and I/O limits. Sharding distributes data across N machines, multiplying capacity by up to N. This is how systems scale to billions of rows and millions of queries per second.
The shard key determines which shard holds each row. Bad keys cause hot spots (one shard overloaded) or excessive cross-shard queries. Good keys distribute evenly and keep related data together.
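The effect of key choice can be seen in a small sketch. It assumes hash-based placement (one common strategy, not the only one); the function name `shard_for`, the shard count, and the sample keys are hypothetical.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 4  # assumed shard count for illustration


def shard_for(shard_key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a shard key to a shard index using a stable hash."""
    # hashlib is stable across processes, unlike the built-in hash() on str.
    digest = hashlib.sha256(shard_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# High-cardinality key (user IDs): rows spread roughly evenly.
by_user = Counter(shard_for(f"user:{uid}") for uid in range(10_000))

# Low-cardinality, skewed key (a hypothetical country code): 90% of rows
# share one key value, so one shard receives at least 90% of the data.
countries = ["US"] * 9_000 + ["DE"] * 1_000
by_country = Counter(shard_for(c) for c in countries)

print("by user_id:", sorted(by_user.items()))
print("by country:", sorted(by_country.items()))
```

Hashing only spreads distinct key values; it cannot split a single hot value, which is why the skewed key still produces a hot spot.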
The application must know which shard to query. Single-shard queries are fast. Cross-shard queries (joins, aggregations) are slow and complex because they must contact several shards and merge the results. Design the schema to minimize cross-shard operations.
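The difference between the two query types can be sketched with a toy router. Plain dicts stand in for separate database instances, and modulo placement on an integer user_id is assumed; the function names and sample rows are hypothetical.

```python
from typing import Dict, List, Optional

NUM_SHARDS = 3  # assumed shard count

# Each dict stands in for one database instance, keyed by user_id.
shards: List[Dict[int, dict]] = [{} for _ in range(NUM_SHARDS)]


def shard_for(user_id: int) -> int:
    """Route by shard key; modulo placement is an assumption of this sketch."""
    return user_id % NUM_SHARDS


def insert_user(user_id: int, name: str, country: str) -> None:
    shards[shard_for(user_id)][user_id] = {"name": name, "country": country}


def get_user(user_id: int) -> Optional[dict]:
    """Single-shard query: the shard key tells us exactly where to look."""
    return shards[shard_for(user_id)].get(user_id)


def count_by_country(country: str) -> int:
    """Cross-shard query: scatter to every shard, then gather partial counts."""
    partials = [
        sum(1 for row in shard.values() if row["country"] == country)
        for shard in shards
    ]
    return sum(partials)


insert_user(1, "ada", "UK")
insert_user(2, "linus", "FI")
insert_user(3, "grace", "US")
insert_user(4, "alan", "UK")

print(get_user(1))
print(count_by_country("UK"))
```

`get_user` touches one shard regardless of how many exist, while `count_by_country` does work proportional to the number of shards because the filter column is not the shard key.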
Why single databases don't scale infinitely:
Real numbers:
- PostgreSQL on large instance: ~50K simple queries/second
- MySQL on large instance: ~100K simple queries/second
- Single table with 1B rows: queries slow down significantly
Vertical scaling hits a ceiling: even the biggest available machine has limits.
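As a back-of-the-envelope illustration of these limits, the sketch below estimates how many shards a target load would need. The `headroom` factor (running each node below its peak) and the function name are assumptions; the per-node figures are the rough numbers quoted above, and the estimate holds only if load spreads evenly and queries stay single-shard.

```python
import math


def shards_needed(target_qps: float, per_node_qps: float, headroom: float = 0.7) -> int:
    """Estimate the shard count for a target load.

    headroom is the assumed fraction of peak capacity each node should run at.
    """
    usable_qps = per_node_qps * headroom
    return math.ceil(target_qps / usable_qps)


# Ideal case from the overview: 10K qps per node, 100K target, no headroom.
print(shards_needed(100_000, 10_000, headroom=1.0))

# 1M qps on nodes that peak around 50K simple queries/second, run at 70%.
print(shards_needed(1_000_000, 50_000))
```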