System Design Fundamentals

11 items

Fundamentalsdatabasessqlnosqlshardingreplicationindexingfundamentalssystem-designadvanced

Database Fundamentals

From ACID to sharding - understanding data storage trade-offs

Foundation knowledge|40 min read

Summary

Databases are the backbone of most systems. Relational databases (PostgreSQL, MySQL) provide ACID guarantees and powerful querying but can be harder to scale horizontally. NoSQL databases (MongoDB, Cassandra, DynamoDB) trade some guarantees for scalability and flexibility. Understanding replication (single-leader, multi-leader, leaderless), partitioning (hash, range), and indexing (B-tree, LSM-tree) helps you choose and optimize the right database for your workload.

Key Takeaways

ACID Guarantees Have Real Costs

Atomicity, Consistency, Isolation, Durability require coordination and disk syncs. Each guarantee adds latency and reduces throughput. Know which guarantees you actually need—many workloads don't require serializable isolation.

Indexes Speed Reads but Slow Writes

Every index must be updated on every write. A table with 5 indexes does 5x the write work. Index what you query, remove what you don't. The right indexes can make queries 1000x faster; wrong indexes waste resources.

Replication is for Availability, Sharding is for Scale

Replication copies all data to multiple nodes (read scaling, failover). Sharding splits data across nodes (write scaling, storage scaling). Most systems need replication; fewer need sharding.

SQL vs NoSQL is About Data Model, Not Scale

Both can scale. SQL offers joins, transactions, and schema enforcement. NoSQL offers flexible schemas and specific access patterns (documents, key-value, wide-column). Choose based on your data relationships and query patterns.

Denormalization Trades Storage for Read Performance

Normalized data avoids duplication but requires joins. Denormalized data duplicates but enables single-table queries. At scale, denormalization often wins because joins across shards are expensive or impossible.

The N+1 Query Problem is Everywhere

Fetching a list then querying each item individually creates O(N) database calls. This is the most common performance bug. Solutions: eager loading (JOINs), batch queries, or DataLoader patterns.

Deep Dive

ACID guarantees provide reliability for database transactions:

Atomicity: Transaction is all-or-nothing. If any part fails, the entire transaction is rolled back.

Consistency: Transaction brings database from one valid state to another. All constraints are satisfied.

Isolation: Concurrent transactions don't interfere with each other. Each appears to execute alone.

Durability: Once committed, data survives crashes. Usually means written to disk.

ACID Transaction Example

Isolation levels (from weakest to strongest):

| Level | Dirty Reads | Non-Repeatable Reads | Phantom Reads | Performance | |-------|-------------|----------------------|---------------|-------------| | Read Uncommitted | Possible | Possible | Possible | Fastest | | Read Committed | Prevented | Possible | Possible | Fast | | Repeatable Read | Prevented | Prevented | Possible | Medium | | Serializable | Prevented | Prevented | Prevented | Slowest |

Most databases default to Read Committed. Serializable is rarely needed and significantly impacts performance.

sql
-- Set isolation level (PostgreSQL)
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

-- Check current level
SHOW transaction_isolation;

Trade-offs

Aspect	Advantage	Disadvantage
SQL vs NoSQL	SQL: ACID, JOINs, mature tooling; NoSQL: Flexible schema, horizontal scaling, specific access patterns	SQL: Harder to scale writes, schema migrations; NoSQL: Limited transactions, eventual consistency
Normalization vs Denormalization	Normalized: No data duplication, easier updates; Denormalized: Faster reads, no joins	Normalized: Slower reads (joins); Denormalized: Data duplication, update anomalies
Single-Leader vs Multi-Leader Replication	Single-leader: Simple, consistent; Multi-leader: Write availability in each region	Single-leader: Failover latency, single write bottleneck; Multi-leader: Conflict resolution complexity
Sharding Early vs Late	Early: Prepared for growth, avoids emergency migration; Late: Simpler architecture longer	Early: Unnecessary complexity if you never need it; Late: Painful migration under load

System Design Fundamentals

Scalability Fundamentals

Latency, Throughput & Performance

Back-of-Envelope Calculations

Availability & Reliability Fundamentals

CAP Theorem & Consistency Models

Load Balancing Deep Dive

Asynchronous Processing & Message Queues

Networking & Protocols

Caching Strategies

System Design Fundamentals