SystemExpertsSystemExperts
Pricing

Open Source

10 items

Redis: In-Memory Data Structure Store

45mintermediate

Apache Kafka: Distributed Event Streaming Platform

40mintermediate

Kubernetes: Container Orchestration Platform

50mintermediate

Nginx: High-Performance Web Server and Reverse Proxy

40mintermediate

PostgreSQL: The World's Most Advanced Open Source Database

50madvanced

Apache Cassandra: Distributed Wide-Column Store

40madvanced

etcd: Distributed Reliable Key-Value Store

35madvanced

Apache ZooKeeper: Distributed Coordination Service

40madvanced

Envoy Proxy: Modern L7 Proxy and Communication Bus

40madvanced

Apache Hadoop: Distributed Storage and Processing

50madvanced
postgresqldatabasesqlrelationalmvccacidreplicationadvanced

PostgreSQL: The World's Most Advanced Open Source Database

The relational database that combines SQL standards compliance with extensibility, powering Instagram, Spotify, and the US Federal Aviation Administration

C|15,000 stars|Updated January 2024|50 min read
View on GitHub

Summary

PostgreSQL is an object-relational database that has evolved over 35+ years from an academic project into the most feature-rich open source database. It combines rock-solid ACID compliance with advanced features like JSON support, full-text search, and extensibility through custom types and functions. The key architectural insight: MVCC (Multi-Version Concurrency Control) allows readers and writers to never block each other, enabling high concurrency without sacrificing consistency.

Key Takeaways

MVCC: Readers Never Block Writers

PostgreSQL uses Multi-Version Concurrency Control to maintain multiple versions of rows. Each transaction sees a snapshot of the database, so readers never wait for writers and writers never wait for readers. This is the foundation of PostgreSQL's high concurrency.

Write-Ahead Logging (WAL)

Every change is first written to a sequential log before modifying data pages. This provides durability (crash recovery), enables point-in-time recovery, and powers streaming replication. WAL is why PostgreSQL can guarantee your data survives crashes.

Extensibility as a Core Principle

PostgreSQL lets you add custom data types, operators, index methods, and procedural languages. Extensions like PostGIS (geospatial), TimescaleDB (time-series), and pgvector (AI embeddings) transform PostgreSQL into specialized databases.

PostgreSQL began in 1986 at UC Berkeley as POSTGRES (Post-Ingres), led by Professor Michael Stonebraker. The goal was to build a database that could handle complex data types and relationships that the previous generation (Ingres) could not.

Key milestones: - 1986: POSTGRES project starts at Berkeley - 1996: Renamed to PostgreSQL, SQL support added - 2005: Native Windows support - 2010: Streaming replication - 2016: Parallel query execution - 2022: 64-bit transaction IDs (no more wraparound panic)

Today, PostgreSQL is the default choice for new applications that need a relational database. It powers:

  • Instagram: Stores user data, feeds, relationships
  • Spotify: Playlist and user metadata
  • Apple: Parts of iCloud infrastructure
  • The FAA: Flight data systems
  • Heroku, Supabase, Neon: As the core database platform

Why choose PostgreSQL?

  1. Standards compliance: Most complete SQL implementation
  2. Data integrity: Strict ACID, constraints, foreign keys
  3. Extensibility: Add custom types, functions, index methods
  4. JSON support: JSONB with indexing - best of both worlds
  5. Advanced features: CTEs, window functions, full-text search
  6. No licensing costs: True open source (PostgreSQL License)
  7. Ecosystem: Extensions for everything (PostGIS, TimescaleDB, pgvector)

Summary

PostgreSQL is an object-relational database that has evolved over 35+ years from an academic project into the most feature-rich open source database. It combines rock-solid ACID compliance with advanced features like JSON support, full-text search, and extensibility through custom types and functions. The key architectural insight: MVCC (Multi-Version Concurrency Control) allows readers and writers to never block each other, enabling high concurrency without sacrificing consistency.

Key Takeaways

MVCC: Readers Never Block Writers

PostgreSQL uses Multi-Version Concurrency Control to maintain multiple versions of rows. Each transaction sees a snapshot of the database, so readers never wait for writers and writers never wait for readers. This is the foundation of PostgreSQL's high concurrency.

Write-Ahead Logging (WAL)

Every change is first written to a sequential log before modifying data pages. This provides durability (crash recovery), enables point-in-time recovery, and powers streaming replication. WAL is why PostgreSQL can guarantee your data survives crashes.

Extensibility as a Core Principle

PostgreSQL lets you add custom data types, operators, index methods, and procedural languages. Extensions like PostGIS (geospatial), TimescaleDB (time-series), and pgvector (AI embeddings) transform PostgreSQL into specialized databases.

Cost-Based Query Optimizer

The query planner evaluates multiple execution strategies and picks the one with lowest estimated cost. It considers table statistics, index availability, join methods, and parallelism. Understanding EXPLAIN output is essential for query tuning.

Premium Content

Sign in to access this content or upgrade for full access.

Process-Per-Connection Model

Each client connection gets a dedicated backend process. This provides isolation (one bad query can't crash others) but means connection pooling is essential for applications with many short-lived connections.

Rich Type System with JSON Support

Beyond standard SQL types, PostgreSQL offers arrays, ranges, geometric types, and JSONB (binary JSON with indexing). You can query JSON documents with SQL, bridging relational and document models in one database.

Deep Dive

PostgreSQL began in 1986 at UC Berkeley as POSTGRES (Post-Ingres), led by Professor Michael Stonebraker. The goal was to build a database that could handle complex data types and relationships that the previous generation (Ingres) could not.

Key milestones: - 1986: POSTGRES project starts at Berkeley - 1996: Renamed to PostgreSQL, SQL support added - 2005: Native Windows support - 2010: Streaming replication - 2016: Parallel query execution - 2022: 64-bit transaction IDs (no more wraparound panic)

Today, PostgreSQL is the default choice for new applications that need a relational database. It powers:

  • Instagram: Stores user data, feeds, relationships
  • Spotify: Playlist and user metadata
  • Apple: Parts of iCloud infrastructure
  • The FAA: Flight data systems
  • Heroku, Supabase, Neon: As the core database platform

Why choose PostgreSQL?

  1. Standards compliance: Most complete SQL implementation
  2. Data integrity: Strict ACID, constraints, foreign keys
  3. Extensibility: Add custom types, functions, index methods
  4. JSON support: JSONB with indexing - best of both worlds
  5. Advanced features: CTEs, window functions, full-text search
  6. No licensing costs: True open source (PostgreSQL License)
  7. Ecosystem: Extensions for everything (PostGIS, TimescaleDB, pgvector)

Trade-offs

AspectAdvantageDisadvantage
Process-per-connectionStrong isolation - one bad query cannot crash others, simple programming modelMemory overhead per connection, requires connection pooling for many connections
MVCC with heap storageReaders never block writers, excellent read concurrency, snapshot isolationDead tuples accumulate, requires VACUUM, table bloat possible
Write-Ahead LoggingCrash recovery, point-in-time recovery, streaming replication all from one mechanismWrite amplification (data written twice), WAL can grow large between checkpoints
Cost-based optimizerAutomatically finds good query plans, adapts to data distributionRequires up-to-date statistics (ANALYZE), can make wrong choices with bad stats
Synchronous replicationZero data loss guarantee, strong durabilityCommit latency includes network round-trip, replica failure can block writes
Rich type system and extensibilityNative support for JSON, arrays, geometry, custom types, powerful extensionsLearning curve, can lead to over-engineering, some extensions have maintenance burden
Single-primary architectureSimple consistency model, no write conflicts, easier to reason aboutWrite scalability limited to one node, failover requires promotion