SystemExpertsSystemExperts
Pricing

System Design Fundamentals

11 items

Scalability Fundamentals

25mbeginner

Latency, Throughput & Performance

30mbeginner

Back-of-Envelope Calculations

25mbeginner

Availability & Reliability Fundamentals

35mintermediate

CAP Theorem & Consistency Models

40mintermediate

Load Balancing Deep Dive

35mintermediate

Asynchronous Processing & Message Queues

30mintermediate

Networking & Protocols

30mintermediate

Caching Strategies

35mintermediate

System Design Fundamentals

20mintermediate

Database Fundamentals

40madvanced
Fundamentalsdatabasessqlnosqlshardingreplicationindexingfundamentalssystem-designadvanced

Database Fundamentals

From ACID to sharding - understanding data storage trade-offs

Foundation knowledge|40 min read

Summary

Databases are the backbone of most systems. Relational databases (PostgreSQL, MySQL) provide ACID guarantees and powerful querying but can be harder to scale horizontally. NoSQL databases (MongoDB, Cassandra, DynamoDB) trade some guarantees for scalability and flexibility. Understanding replication (single-leader, multi-leader, leaderless), partitioning (hash, range), and indexing (B-tree, LSM-tree) helps you choose and optimize the right database for your workload.

Key Takeaways

ACID Guarantees Have Real Costs

Atomicity, Consistency, Isolation, Durability require coordination and disk syncs. Each guarantee adds latency and reduces throughput. Know which guarantees you actually need—many workloads don't require serializable isolation.

Indexes Speed Reads but Slow Writes

Every index must be updated on every write. A table with 5 indexes does 5x the write work. Index what you query, remove what you don't. The right indexes can make queries 1000x faster; wrong indexes waste resources.

Replication is for Availability, Sharding is for Scale

Replication copies all data to multiple nodes (read scaling, failover). Sharding splits data across nodes (write scaling, storage scaling). Most systems need replication; fewer need sharding.

SQL vs NoSQL is About Data Model, Not Scale

Both can scale. SQL offers joins, transactions, and schema enforcement. NoSQL offers flexible schemas and specific access patterns (documents, key-value, wide-column). Choose based on your data relationships and query patterns.

Denormalization Trades Storage for Read Performance

Normalized data avoids duplication but requires joins. Denormalized data duplicates but enables single-table queries. At scale, denormalization often wins because joins across shards are expensive or impossible.

The N+1 Query Problem is Everywhere

Fetching a list then querying each item individually creates O(N) database calls. This is the most common performance bug. Solutions: eager loading (JOINs), batch queries, or DataLoader patterns.

Deep Dive

ACID guarantees provide reliability for database transactions:

Atomicity: Transaction is all-or-nothing. If any part fails, the entire transaction is rolled back.

Consistency: Transaction brings database from one valid state to another. All constraints are satisfied.

Isolation: Concurrent transactions don't interfere with each other. Each appears to execute alone.

Durability: Once committed, data survives crashes. Usually means written to disk.

ACID Transaction Example

Isolation levels (from weakest to strongest):

| Level | Dirty Reads | Non-Repeatable Reads | Phantom Reads | Performance | |-------|-------------|----------------------|---------------|-------------| | Read Uncommitted | Possible | Possible | Possible | Fastest | | Read Committed | Prevented | Possible | Possible | Fast | | Repeatable Read | Prevented | Prevented | Possible | Medium | | Serializable | Prevented | Prevented | Prevented | Slowest |

Most databases default to Read Committed. Serializable is rarely needed and significantly impacts performance.

sql
-- Set isolation level (PostgreSQL) SET TRANSACTION ISOLATION LEVEL SERIALIZABLE; -- Check current level SHOW transaction_isolation;

Trade-offs

AspectAdvantageDisadvantage
SQL vs NoSQLSQL: ACID, JOINs, mature tooling; NoSQL: Flexible schema, horizontal scaling, specific access patternsSQL: Harder to scale writes, schema migrations; NoSQL: Limited transactions, eventual consistency
Normalization vs DenormalizationNormalized: No data duplication, easier updates; Denormalized: Faster reads, no joinsNormalized: Slower reads (joins); Denormalized: Data duplication, update anomalies
Single-Leader vs Multi-Leader ReplicationSingle-leader: Simple, consistent; Multi-leader: Write availability in each regionSingle-leader: Failover latency, single write bottleneck; Multi-leader: Conflict resolution complexity
Sharding Early vs LateEarly: Prepared for growth, avoids emergency migration; Late: Simpler architecture longerEarly: Unnecessary complexity if you never need it; Late: Painful migration under load