PostgreSQL: The World's Most Advanced Open Source Database

The relational database that combines SQL standards compliance with extensibility, powering Instagram, Spotify, and the US Federal Aviation Administration

C|15,000 stars|Updated January 2024|50 min read

Summary

PostgreSQL is an object-relational database that has evolved over more than 35 years from an academic project into one of the most feature-rich open source databases. It combines ACID compliance with advanced features such as JSON support, full-text search, and extensibility through custom types and functions. The key architectural insight: MVCC (Multi-Version Concurrency Control) means readers do not block writers and writers do not block readers, enabling high concurrency without sacrificing consistency.

Key Takeaways

MVCC: Readers Never Block Writers

PostgreSQL uses Multi-Version Concurrency Control to maintain multiple versions of rows. Each transaction sees a snapshot of the database, so readers never wait for writers and writers never wait for readers. This is the foundation of PostgreSQL's high concurrency. Writers can still wait on each other: two transactions updating the same row are serialized by a row-level lock.
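
The snapshot rule can be illustrated with a small in-memory model. This is a toy sketch, not PostgreSQL's implementation: the class and method names are hypothetical, and it ignores aborts, writer-writer conflicts, and VACUUM. The xmin and xmax fields loosely mirror the hidden columns PostgreSQL keeps on each row version.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RowVersion:
    value: str
    xmin: int                   # id of the transaction that created this version
    xmax: Optional[int] = None  # id of the transaction that superseded it, if any


@dataclass
class ToyMVCCStore:
    """Hypothetical toy store: one version chain per key, no aborts, no locking."""
    next_txid: int = 1
    committed: set = field(default_factory=set)
    versions: dict = field(default_factory=dict)  # key -> list of RowVersion

    def begin(self) -> int:
        txid = self.next_txid
        self.next_txid += 1
        return txid

    def snapshot(self) -> frozenset:
        # A snapshot is the set of transactions committed at this moment.
        return frozenset(self.committed)

    def write(self, txid: int, key: str, value: str) -> None:
        chain = self.versions.setdefault(key, [])
        if chain and chain[-1].xmax is None:
            chain[-1].xmax = txid  # mark the old version as superseded
        chain.append(RowVersion(value, xmin=txid))  # old version stays in place

    def commit(self, txid: int) -> None:
        self.committed.add(txid)

    def read(self, txid: int, snapshot: frozenset, key: str) -> Optional[str]:
        # Return the newest version whose creator is visible to this snapshot
        # and whose superseding transaction (if any) is not.
        for v in reversed(self.versions.get(key, [])):
            created = v.xmin == txid or v.xmin in snapshot
            superseded = v.xmax is not None and (v.xmax == txid or v.xmax in snapshot)
            if created and not superseded:
                return v.value
        return None


def mvcc_demo():
    store = ToyMVCCStore()
    setup = store.begin()
    store.write(setup, "k", "v1")
    store.commit(setup)

    reader = store.begin()
    reader_snapshot = store.snapshot()

    writer = store.begin()
    store.write(writer, "k", "v2")  # does not wait for the reader
    before_commit = store.read(reader, reader_snapshot, "k")
    store.commit(writer)
    after_commit = store.read(reader, reader_snapshot, "k")

    late_reader = store.begin()
    fresh = store.read(late_reader, store.snapshot(), "k")
    return before_commit, after_commit, fresh
```

In mvcc_demo, the reader keeps seeing "v1" both before and after the writer commits, because its snapshot was taken earlier; only a transaction that takes a new snapshot sees "v2". The superseded version is never overwritten in place, which is also why dead row versions accumulate until something cleans them up.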

Write-Ahead Logging (WAL)

Every change is first recorded in a sequential log, and that log record is flushed to disk before the modified data page is written. This provides durability (crash recovery), enables point-in-time recovery, and powers streaming replication. WAL is why PostgreSQL can guarantee that committed transactions survive a crash.
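
The log-first ordering and crash recovery can be sketched with a toy key-value store. Everything here is an assumption made for illustration: the class name, the in-memory lists standing in for durable files, and the simplified checkpoint. PostgreSQL's real WAL works on pages and binary records, not Python tuples.

```python
class ToyWalStore:
    """Hypothetical toy model of write-ahead logging."""

    def __init__(self):
        self.wal = []            # stands in for the durable, append-only log
        self.disk_pages = {}     # stands in for durable data pages
        self.buffer = {}         # volatile in-memory pages, lost on crash
        self.checkpoint_lsn = 0  # log position already reflected in disk_pages

    def put(self, key, value):
        self.wal.append((key, value))  # 1. append to the log first
        self.buffer[key] = value       # 2. only then change the in-memory page
        return len(self.wal)           # position of this record in the log

    def checkpoint(self):
        # Write dirty pages to disk and remember how far the log is covered.
        self.disk_pages.update(self.buffer)
        self.checkpoint_lsn = len(self.wal)

    def crash(self):
        # Memory is lost; the log and the data pages survive.
        self.buffer = {}

    def recover(self):
        # Replay every log record written after the last checkpoint.
        for key, value in self.wal[self.checkpoint_lsn:]:
            self.buffer[key] = value

    def get(self, key):
        return self.buffer.get(key, self.disk_pages.get(key))


def wal_demo():
    store = ToyWalStore()
    store.put("a", 1)
    store.checkpoint()
    store.put("b", 2)
    store.put("a", 3)
    store.crash()
    after_crash = (store.get("a"), store.get("b"))
    store.recover()
    after_recovery = (store.get("a"), store.get("b"))
    return after_crash, after_recovery
```

After the simulated crash the data pages only reflect the last checkpoint, so the two later writes appear lost. Replaying the log from the checkpoint position restores them. Shipping the same log records to another machine and replaying them there is the basic idea behind streaming replication.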

Extensibility as a Core Principle

PostgreSQL lets you add custom data types, operators, index methods, and procedural languages. Extensions like PostGIS (geospatial), TimescaleDB (time-series), and pgvector (AI embeddings) transform PostgreSQL into specialized databases.

PostgreSQL began in 1986 at UC Berkeley as POSTGRES (Post-Ingres), led by Professor Michael Stonebraker. The goal was to build a database that could handle complex data types and relationships that the previous generation (Ingres) could not.

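Key milestones:

1986: POSTGRES project starts at Berkeley
1994: SQL interpreter added, released as Postgres95
1996: Renamed to PostgreSQL
2005: Native Windows support (version 8.0)
2010: Streaming replication (version 9.0)
2016: Parallel query execution (version 9.6)
2022: SQL-standard MERGE command (version 15); transaction IDs remain 32-bit, so VACUUM is still needed to prevent wraparound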

Today, PostgreSQL is a common default choice for new applications that need a relational database. It powers:

Instagram: Stores user data, feeds, relationships
Spotify: Playlist and user metadata
Apple: Parts of iCloud infrastructure
The FAA: Flight data systems
Heroku, Supabase, Neon: As the core database platform

Why choose PostgreSQL?

Standards compliance: One of the most complete SQL implementations
Data integrity: Strict ACID, constraints, foreign keys
Extensibility: Add custom types, functions, index methods
JSON support: JSONB with indexing - best of both worlds
Advanced features: CTEs, window functions, full-text search
No licensing costs: True open source (PostgreSQL License)
Ecosystem: Extensions for everything (PostGIS, TimescaleDB, pgvector)

Cost-Based Query Optimizer

The query planner evaluates multiple execution strategies and picks the one with lowest estimated cost. It considers table statistics, index availability, join methods, and parallelism. Understanding EXPLAIN output is essential for query tuning.
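
A toy cost comparison shows the idea. The constants below are PostgreSQL's documented default planner settings (seq_page_cost, random_page_cost, cpu_tuple_cost, cpu_index_tuple_cost, cpu_operator_cost), but the index scan formula is a deliberately crude assumption that charges one random page read per matching row. The real planner's model is considerably more detailed, and the function names here are hypothetical.

```python
# PostgreSQL's default planner cost settings.
SEQ_PAGE_COST = 1.0
RANDOM_PAGE_COST = 4.0
CPU_TUPLE_COST = 0.01
CPU_INDEX_TUPLE_COST = 0.005
CPU_OPERATOR_COST = 0.0025


def seq_scan_cost(pages: int, rows: int) -> float:
    # Read every page sequentially and evaluate one filter on every row.
    return pages * SEQ_PAGE_COST + rows * (CPU_TUPLE_COST + CPU_OPERATOR_COST)


def index_scan_cost(rows: int, selectivity: float) -> float:
    # Simplifying assumption: each matching row costs one random page read
    # plus per-tuple CPU work for the index entry and the heap row.
    matching = rows * selectivity
    return matching * (RANDOM_PAGE_COST + CPU_INDEX_TUPLE_COST + CPU_TUPLE_COST)


def choose_plan(pages: int, rows: int, selectivity: float):
    # Estimate each candidate plan and keep the cheapest one.
    costs = {
        "seq_scan": seq_scan_cost(pages, rows),
        "index_scan": index_scan_cost(rows, selectivity),
    }
    return min(costs, key=costs.get), costs
```

For a table of 10,000 pages and 1,000,000 rows, choose_plan(10_000, 1_000_000, 0.001) picks the index scan, while a selectivity of 0.5 makes the sequential scan cheaper. The selectivity input is exactly what table statistics supply, which is why stale statistics can push the planner toward the wrong plan.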

Trade-offs

Aspect	Advantage	Disadvantage
Process-per-connection	Strong isolation - one bad query cannot crash others, simple programming model	Memory overhead per connection, requires connection pooling for many connections
MVCC with heap storage	Readers never block writers, excellent read concurrency, snapshot isolation	Dead tuples accumulate, requires VACUUM, table bloat possible
Write-Ahead Logging	Crash recovery, point-in-time recovery, streaming replication all from one mechanism	Write amplification (data written twice), WAL can grow large between checkpoints
Cost-based optimizer	Automatically finds good query plans, adapts to data distribution	Requires up-to-date statistics (ANALYZE), can make wrong choices with bad stats
Synchronous replication	Zero data loss guarantee, strong durability	Commit latency includes network round-trip, replica failure can block writes
Rich type system and extensibility	Native support for JSON, arrays, geometry, custom types, powerful extensions	Learning curve, can lead to over-engineering, some extensions have maintenance burden
Single-primary architecture	Simple consistency model, no write conflicts, easier to reason about	Write scalability limited to one node, failover requires promotion
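
The connection pooling mentioned in the first row can be sketched as a bounded set of reusable connections. This is a minimal illustration with hypothetical names, where the connect argument is any zero-argument factory; production deployments typically use a dedicated pooler or a driver's built-in pool instead.

```python
import queue
from contextlib import contextmanager


class ToyConnectionPool:
    """Hypothetical fixed-size pool: many callers share a few connections."""

    def __init__(self, connect, size):
        # Open all connections up front; the server only ever sees `size` of them.
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(connect())

    @contextmanager
    def connection(self, timeout=None):
        # Blocks while every connection is in use; raises queue.Empty on timeout.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            self._idle.put(conn)  # hand the connection back for reuse
```

Because each PostgreSQL connection is backed by its own server process, capping the number of open connections this way bounds the per-connection memory overhead, at the cost of callers waiting when the pool is exhausted.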
