Tags: cassandra, nosql, distributed-database, wide-column, time-series, high-availability, advanced

Apache Cassandra: Distributed Wide-Column Store

The database that handles millions of writes per second for Netflix, Apple, and Discord with no single point of failure

Java | 8,500 stars | Updated January 2024 | 40 min read

Summary

Cassandra is a distributed NoSQL database designed for massive write throughput and linear scalability. It uses a masterless ring architecture in which every node is equal, eliminating single points of failure. Data is partitioned across nodes using consistent hashing and replicated for fault tolerance. Cassandra trades consistency for availability: tunable consistency levels let you choose between strong consistency and high availability on a per-query basis.

Key Takeaways

Masterless Architecture

Every node in Cassandra is identical. There is no master, no leader election, no single point of failure. Any node can accept reads and writes for any data. This simplifies operations and provides true high availability.
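The node placement behind this can be sketched with a toy consistent-hash ring. This is an illustration only: it uses MD5 and a single token per node, whereas Cassandra uses a Murmur3 partitioner and virtual nodes. The point it demonstrates is that any node holding the ring metadata can compute which replicas own a key, with no master involved.

```python
import bisect
import hashlib

class HashRing:
    """Toy consistent-hash ring: every node is equal, and any node can
    locate the replicas for a key. Illustrative sketch only, not
    Cassandra's actual Murmur3/vnode partitioner."""

    def __init__(self, nodes, replication_factor=3):
        self.rf = replication_factor
        # Place each node at a deterministic position (token) on the ring.
        self.ring = sorted((self._token(n), n) for n in nodes)

    @staticmethod
    def _token(key):
        return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

    def replicas_for(self, partition_key):
        """Walk clockwise from the key's token, collecting RF distinct nodes."""
        token = self._token(partition_key)
        start = bisect.bisect(self.ring, (token, ""))
        owners = []
        for i in range(len(self.ring)):
            _, node = self.ring[(start + i) % len(self.ring)]
            if node not in owners:
                owners.append(node)
            if len(owners) == self.rf:
                break
        return owners

ring = HashRing(["node-a", "node-b", "node-c", "node-d"])
print(ring.replicas_for("user:42"))  # three owner nodes, no master consulted
```

Because placement is a pure function of the key and the ring, adding a node only remaps the tokens adjacent to it rather than reshuffling all data.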

Tunable Consistency

Cassandra lets you choose consistency per query. Write with QUORUM for durability, read with ONE for speed, or use ALL for strong consistency. This flexibility lets you optimize for your specific use case.
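The rule underneath these levels is simple arithmetic: a read is strongly consistent when the read and write replica sets must overlap, i.e. when reads + writes > replication factor. A minimal sketch of that check:

```python
def quorum(rf):
    """QUORUM = a majority of the replication factor."""
    return rf // 2 + 1

def is_strongly_consistent(read_replicas, write_replicas, rf):
    """Strong consistency requires the read set and write set to
    overlap in at least one replica: R + W > RF."""
    return read_replicas + write_replicas > rf

RF = 3
# QUORUM writes + QUORUM reads (2 + 2 > 3): sets always overlap.
assert is_strongly_consistent(quorum(RF), quorum(RF), RF)
# QUORUM write + ONE read (2 + 1 = 3): the read may miss both written replicas' peer.
assert not is_strongly_consistent(1, quorum(RF), RF)
```

This is why QUORUM/QUORUM is the common "safe" pairing, while ONE/ONE maximizes availability at the cost of possibly stale reads.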

Log-Structured Storage

Writes go to an append-only commit log and memtable (in-memory). Memtables flush to immutable SSTables on disk. This write path is extremely fast - sequential I/O only, no random seeks.
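A stripped-down storage engine makes the write path concrete. This is a sketch under heavy simplification (in-memory lists stand in for disk files, and there is no bloom filter or index), but the flow matches the description above: log, memtable, flush to an immutable sorted run.

```python
class ToyStorageEngine:
    """Minimal log-structured write path: append to a commit log,
    update an in-memory memtable, flush full memtables to immutable
    sorted SSTables. Sketch only; real SSTables live on disk with
    indexes and bloom filters."""

    def __init__(self, memtable_limit=3):
        self.commit_log = []   # append-only; sequential I/O in real systems
        self.memtable = {}     # in-memory, sorted only at flush time
        self.sstables = []     # immutable sorted runs, oldest first
        self.limit = memtable_limit

    def write(self, key, value):
        self.commit_log.append((key, value))  # durability first
        self.memtable[key] = value            # then the in-memory table
        if len(self.memtable) >= self.limit:
            self._flush()

    def _flush(self):
        # Memtable becomes an immutable sorted SSTable; the log is truncated.
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}
        self.commit_log.clear()

    def read(self, key):
        # Newest data wins: memtable first, then SSTables newest-to-oldest.
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):
            for k, v in table:
                if k == key:
                    return v
        return None

engine = ToyStorageEngine(memtable_limit=3)
for i in range(4):
    engine.write(f"k{i}", i)   # third write triggers a flush
print(engine.read("k1"), engine.read("k3"))  # 1 3
```

Note that a write never reads or seeks; the cost of that choice shows up on the read path, which may have to consult several SSTables.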

Partition-Based Data Model

Data is organized by partition key, which determines node placement. Within a partition, rows are sorted by clustering columns. This model excels at time-series data and wide rows with millions of columns.
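A toy wide-row store shows why this model suits time series: rows stay sorted by clustering key inside each partition, so a time-range query is a binary search plus a contiguous slice. The sensor naming here is hypothetical; the structure is the point.

```python
import bisect
from collections import defaultdict

class WideRowStore:
    """Sketch of the partition/clustering model: the partition key
    groups rows (and, in Cassandra, picks the node); clustering keys
    keep rows sorted within the partition."""

    def __init__(self):
        self.partitions = defaultdict(list)  # partition_key -> sorted rows

    def insert(self, partition_key, clustering_key, value):
        bisect.insort(self.partitions[partition_key], (clustering_key, value))

    def range(self, partition_key, lo, hi):
        """Inclusive range over clustering keys: binary search, then slice."""
        rows = self.partitions[partition_key]
        keys = [k for k, _ in rows]
        i = bisect.bisect_left(keys, lo)
        j = bisect.bisect_right(keys, hi)
        return rows[i:j]

store = WideRowStore()
for ts, temp in [(1, 20.5), (3, 21.0), (2, 20.7), (5, 19.9)]:
    store.insert("sensor-7:2024-01-15", ts, temp)
print(store.range("sensor-7:2024-01-15", 2, 4))  # [(2, 20.7), (3, 21.0)]
```

The flip side, as noted below in the trade-offs, is that queries which don't start from a partition key have nothing to binary-search and degenerate into full scans.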

Gossip Protocol for Cluster State

Nodes share cluster state using gossip - each node periodically exchanges state with random peers. Within seconds, all nodes learn about membership changes, failures, and schema updates.
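The convergence behavior can be simulated in a few lines. This is a push-only sketch with one entry per node (real gossip exchanges versioned digests bidirectionally), but it shows the key property: state spreads to all nodes in a handful of rounds without any central broadcaster.

```python
import random

def gossip_round(states, rng):
    """One gossip round (push-only sketch): every node sends its known
    state to one random peer, which keeps the higher-versioned entry."""
    nodes = list(states)
    for node in nodes:
        peer = rng.choice([n for n in nodes if n != node])
        for key, (version, value) in states[node].items():
            if key not in states[peer] or states[peer][key][0] < version:
                states[peer][key] = (version, value)

# Five-node cluster; n0 learns a membership fact locally.
rng = random.Random(7)
states = {f"n{i}": {} for i in range(5)}
states["n0"]["n0:status"] = (1, "UP")

rounds = 0
while not all("n0:status" in s for s in states.values()):
    gossip_round(states, rng)
    rounds += 1
print(f"converged after {rounds} rounds")
```

Because every informed node becomes a spreader, the number of rounds grows roughly logarithmically with cluster size rather than linearly.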

Compaction for Read Performance

Background compaction merges SSTables, removing tombstones and consolidating data. This trades write amplification for read performance - fewer SSTables means fewer disk seeks per query.
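The merge itself is straightforward to sketch: later SSTables shadow earlier ones, and deletions (tombstones) can be dropped once everything they shadow is merged away. Note this simplification is deliberate; real Cassandra must retain tombstones for gc_grace_seconds so that deletes propagate to replicas that were down.

```python
TOMBSTONE = object()  # marker for a deleted cell

def compact(sstables):
    """Merge SSTables oldest-to-newest: the newest version of each key
    wins, and tombstones are dropped from the output. Sketch only; real
    compaction streams sorted runs and honors gc_grace_seconds."""
    merged = {}
    for table in sstables:          # oldest first
        for key, value in table:
            merged[key] = value     # newer tables overwrite older entries
    return sorted((k, v) for k, v in merged.items() if v is not TOMBSTONE)

old = [("a", 1), ("b", 2), ("c", 3)]
new = [("b", TOMBSTONE), ("c", 30)]
print(compact([old, new]))  # [('a', 1), ('c', 30)]
```

After compaction a read for any of these keys touches one SSTable instead of two, which is exactly the read-amplification win the paragraph above describes.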

Deep Dive

Cassandra was created at Facebook in 2007 to solve their inbox search problem - they needed to handle massive write loads across multiple data centers with no downtime. The project combined ideas from two influential papers:

  • Amazon Dynamo: Masterless architecture, consistent hashing, tunable consistency
  • Google Bigtable: Column-family data model, SSTable storage, memtables

The result is a database optimized for:

  1. High write throughput: Millions of writes per second
  2. Linear scalability: Add nodes to increase capacity
  3. Always-on availability: No single point of failure
  4. Multi-datacenter replication: Built-in geographic distribution

When to use Cassandra:

  • Time-series data (metrics, IoT, logs)
  • User activity tracking and personalization
  • Messaging and chat systems
  • Product catalogs with heavy read/write
  • Any workload needing 99.99% uptime

When NOT to use Cassandra:

  • Heavy aggregations and analytics (use ClickHouse)
  • Complex joins and transactions (use PostgreSQL)
  • Small datasets under 100GB (overkill)
  • Ad-hoc query patterns (Cassandra needs known access patterns)

Trade-offs

Masterless Architecture
  • Advantage: No single point of failure; any node can serve any request; simplified operations
  • Disadvantage: Coordination overhead for consistency; all nodes must agree on schema changes

Eventual Consistency Default
  • Advantage: High availability; low-latency writes; works across high-latency datacenter links
  • Disadvantage: Application must handle stale reads; conflict resolution via last-write-wins

Log-Structured Storage
  • Advantage: Extremely fast writes (sequential I/O only); no read-before-write
  • Disadvantage: Read amplification (check multiple SSTables); space amplification during compaction

Partition-Based Model
  • Advantage: Predictable performance; data locality; efficient range queries within a partition
  • Disadvantage: Must know queries upfront; no ad-hoc joins; denormalization required

Tunable Consistency
  • Advantage: Flexibility to choose consistency per query; optimize for specific use cases
  • Disadvantage: Complexity in choosing the right level; easy to misconfigure and get inconsistent reads

Wide-Column Model
  • Advantage: Excellent for time-series, sparse data, and dynamic columns without schema changes
  • Disadvantage: Not suitable for complex relationships; aggregations require full scans

Multi-Datacenter Replication
  • Advantage: Built-in geo-distribution; disaster recovery; serve users from the nearest DC
  • Disadvantage: Cross-DC consistency is expensive; operational complexity increases

JVM-Based
  • Advantage: Rich ecosystem; good tooling; mature garbage collectors
  • Disadvantage: GC pauses can cause latency spikes; requires JVM tuning expertise