System Design Fundamentals
11 items
11 items
From ACID to sharding - understanding data storage trade-offs
Databases are the backbone of most systems. Relational databases (PostgreSQL, MySQL) provide ACID guarantees and powerful querying but can be harder to scale horizontally. NoSQL databases (MongoDB, Cassandra, DynamoDB) trade some guarantees for scalability and flexibility. Understanding replication (single-leader, multi-leader, leaderless), partitioning (hash, range), and indexing (B-tree, LSM-tree) helps you choose and optimize the right database for your workload.
Atomicity, Consistency, Isolation, Durability require coordination and disk syncs. Each guarantee adds latency and reduces throughput. Know which guarantees you actually need—many workloads don't require serializable isolation.
Every index must be updated on every write. A table with 5 indexes does 5x the write work. Index what you query, remove what you don't. The right indexes can make queries 1000x faster; wrong indexes waste resources.
Replication copies all data to multiple nodes (read scaling, failover). Sharding splits data across nodes (write scaling, storage scaling). Most systems need replication; fewer need sharding.
Both can scale. SQL offers joins, transactions, and schema enforcement. NoSQL offers flexible schemas and specific access patterns (documents, key-value, wide-column). Choose based on your data relationships and query patterns.
Normalized data avoids duplication but requires joins. Denormalized data duplicates but enables single-table queries. At scale, denormalization often wins because joins across shards are expensive or impossible.
Fetching a list then querying each item individually creates O(N) database calls. This is the most common performance bug. Solutions: eager loading (JOINs), batch queries, or DataLoader patterns.
ACID guarantees provide reliability for database transactions:
Atomicity: Transaction is all-or-nothing. If any part fails, the entire transaction is rolled back.
Consistency: Transaction brings database from one valid state to another. All constraints are satisfied.
Isolation: Concurrent transactions don't interfere with each other. Each appears to execute alone.
Durability: Once committed, data survives crashes. Usually means written to disk.
Isolation levels (from weakest to strongest):
| Level | Dirty Reads | Non-Repeatable Reads | Phantom Reads | Performance | |-------|-------------|----------------------|---------------|-------------| | Read Uncommitted | Possible | Possible | Possible | Fastest | | Read Committed | Prevented | Possible | Possible | Fast | | Repeatable Read | Prevented | Prevented | Possible | Medium | | Serializable | Prevented | Prevented | Prevented | Slowest |
Most databases default to Read Committed. Serializable is rarely needed and significantly impacts performance.
sql-- Set isolation level (PostgreSQL) SET TRANSACTION ISOLATION LEVEL SERIALIZABLE; -- Check current level SHOW transaction_isolation;
| Aspect | Advantage | Disadvantage |
|---|---|---|
| SQL vs NoSQL | SQL: ACID, JOINs, mature tooling; NoSQL: Flexible schema, horizontal scaling, specific access patterns | SQL: Harder to scale writes, schema migrations; NoSQL: Limited transactions, eventual consistency |
| Normalization vs Denormalization | Normalized: No data duplication, easier updates; Denormalized: Faster reads, no joins | Normalized: Slower reads (joins); Denormalized: Data duplication, update anomalies |
| Single-Leader vs Multi-Leader Replication | Single-leader: Simple, consistent; Multi-leader: Write availability in each region | Single-leader: Failover latency, single write bottleneck; Multi-leader: Conflict resolution complexity |
| Sharding Early vs Late | Early: Prepared for growth, avoids emergency migration; Late: Simpler architecture longer | Early: Unnecessary complexity if you never need it; Late: Painful migration under load |