Open Source
The consensus-based configuration store that powers Kubernetes, providing strong consistency guarantees
etcd is a distributed key-value store that provides a reliable way to store data across a cluster of machines. It uses the Raft consensus algorithm to ensure strong consistency: every read returns the most recent write. etcd is the backbone of Kubernetes, storing all cluster state including pod definitions, secrets, and config maps. Its watch mechanism enables reactive systems where components respond immediately to configuration changes.
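A minimal sketch of talking to etcd with the official Go client (go.etcd.io/etcd/client/v3); the endpoint and key name are assumptions for a local single-node cluster, not part of any real deployment.

```go
package main

import (
	"context"
	"fmt"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
	// Assumed local endpoint; adjust for a real cluster.
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"localhost:2379"},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		panic(err)
	}
	defer cli.Close()

	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()

	// Write a key, then read it back; the read is linearizable by default.
	if _, err := cli.Put(ctx, "/config/feature-flag", "enabled"); err != nil {
		panic(err)
	}
	resp, err := cli.Get(ctx, "/config/feature-flag")
	if err != nil {
		panic(err)
	}
	for _, kv := range resp.Kvs {
		fmt.Printf("%s = %s\n", kv.Key, kv.Value)
	}
}
```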
Every write goes through the Raft consensus protocol, ensuring all nodes agree on the order of operations. Linearizable reads can be served by any node using the ReadIndex protocol, which confirms the node has caught up to the leader's commit index before responding.
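A sketch of how the two read modes surface in the Go client: the default Get is linearizable (backed by ReadIndex), while WithSerializable trades that guarantee for a purely local, lower-latency read. Client setup is as in the example above; the function and key names are illustrative.

```go
package main

import (
	"context"
	"fmt"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func readBothWays(ctx context.Context, cli *clientv3.Client, key string) error {
	// Linearizable read (default): the serving node first confirms it has
	// caught up to the leader's commit index, so the result reflects the
	// most recent committed write.
	lin, err := cli.Get(ctx, key)
	if err != nil {
		return err
	}

	// Serializable read: answered from the node's local state without the
	// ReadIndex round trip; it may be slightly stale.
	ser, err := cli.Get(ctx, key, clientv3.WithSerializable())
	if err != nil {
		return err
	}

	fmt.Printf("linearizable rev=%d, serializable rev=%d\n",
		lin.Header.Revision, ser.Header.Revision)
	return nil
}
```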
etcd maintains multiple versions of each key using Multi-Version Concurrency Control. Every modification increments a global revision number. You can read historical values, compare-and-swap on specific revisions, and compact old data.
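A sketch of MVCC in action with the Go client: each Put bumps the global revision, older values remain readable with WithRev, and Compact discards history below a given revision. The key and values are illustrative.

```go
package main

import (
	"context"
	"fmt"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func demonstrateRevisions(ctx context.Context, cli *clientv3.Client) error {
	first, err := cli.Put(ctx, "/config/mode", "standby")
	if err != nil {
		return err
	}
	if _, err := cli.Put(ctx, "/config/mode", "active"); err != nil {
		return err
	}

	// Read the historical value as of the first write's revision.
	old, err := cli.Get(ctx, "/config/mode", clientv3.WithRev(first.Header.Revision))
	if err != nil {
		return err
	}
	fmt.Printf("at rev %d: %s\n", first.Header.Revision, old.Kvs[0].Value)

	// Compact away history older than the current revision; historical
	// reads below this point will now fail.
	cur, err := cli.Get(ctx, "/config/mode")
	if err != nil {
		return err
	}
	_, err = cli.Compact(ctx, cur.Header.Revision)
	return err
}
```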
Clients can watch keys or prefixes and receive streaming updates when values change. This enables reactive architectures where components respond to configuration changes without polling.
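A sketch of a prefix watch with the Go client: the caller receives a stream of events for any key under the prefix without polling. The /config/ prefix is an assumption for illustration.

```go
package main

import (
	"context"
	"fmt"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func watchConfig(ctx context.Context, cli *clientv3.Client) {
	// Watch returns a channel; each message carries one or more events
	// (PUT or DELETE) for keys under the watched prefix.
	for wresp := range cli.Watch(ctx, "/config/", clientv3.WithPrefix()) {
		for _, ev := range wresp.Events {
			fmt.Printf("%s %s = %s\n", ev.Type, ev.Kv.Key, ev.Kv.Value)
		}
	}
}
```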
etcd (pronounced "et-see-dee"; the name combines the Unix /etc directory with "distributed") was created at CoreOS in 2013 to solve a fundamental problem: how do you store configuration that multiple machines need to agree on?
Traditional approaches fail in distributed systems:

- Config files: No synchronization, manual updates, drift between machines
- Databases: Often not designed for configuration workloads, may sacrifice consistency
- ZooKeeper: Complex, Java-based, different API model
etcd provides:

- Strong consistency: Linearizable reads and writes via Raft
- High availability: Continues operating with (n/2)+1 nodes alive
- Simple API: HTTP/gRPC with JSON, easy to debug
- Watch mechanism: Real-time notifications of changes
- Small footprint: Single binary, no dependencies
What Kubernetes stores in etcd:

- Pod definitions and status
- Service endpoints
- ConfigMaps and Secrets
- RBAC policies
- Custom Resource Definitions
- Lease objects for leader election
Key design principle: etcd is the single source of truth. The API server is the only component that talks to etcd directly. All other components communicate through the API server.
etcd supports mini-transactions with if-then-else semantics: you can atomically check conditions and perform different operations based on the result, which enables compare-and-swap, distributed locks, and atomic updates.
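A sketch of a compare-and-swap built on the transaction API in the Go client: the Put happens only if the current value matches, and the Else branch reads the value that won instead. The key and values are illustrative.

```go
package main

import (
	"context"
	"fmt"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func promoteIfStandby(ctx context.Context, cli *clientv3.Client) error {
	resp, err := cli.Txn(ctx).
		If(clientv3.Compare(clientv3.Value("/config/mode"), "=", "standby")).
		Then(clientv3.OpPut("/config/mode", "active")).
		Else(clientv3.OpGet("/config/mode")).
		Commit()
	if err != nil {
		return err
	}
	if resp.Succeeded {
		fmt.Println("promoted to active")
	} else {
		// Condition failed: inspect the value returned by the Else branch.
		current := resp.Responses[0].GetResponseRange().Kvs[0].Value
		fmt.Printf("condition failed, current mode: %s\n", current)
	}
	return nil
}
```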
Keys are organized in a flat namespace but conventionally use / as separators (like /registry/pods/default/nginx). Range queries on prefixes enable directory-like operations.
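A sketch of a directory-like listing via a prefix range query in the Go client; the /registry/pods/default/ prefix mirrors the Kubernetes key layout described above and is used here purely for illustration.

```go
package main

import (
	"context"
	"fmt"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func listPodKeys(ctx context.Context, cli *clientv3.Client) error {
	resp, err := cli.Get(ctx, "/registry/pods/default/",
		clientv3.WithPrefix(),   // everything under the prefix
		clientv3.WithKeysOnly(), // skip values, like a directory listing
		clientv3.WithSort(clientv3.SortByKey, clientv3.SortAscend))
	if err != nil {
		return err
	}
	for _, kv := range resp.Kvs {
		fmt.Println(string(kv.Key))
	}
	return nil
}
```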
| Aspect | Advantage | Disadvantage |
|---|---|---|
| Strong consistency (linearizable) | Every read returns the most recent write. Safe for coordination and configuration. | Higher latency than eventually consistent systems. All writes go through leader. |
| Raft consensus | Well-understood protocol. Proven correct. Automatic leader election and recovery. | Requires majority for writes. 3-node cluster cannot tolerate 2 failures simultaneously. |
| MVCC with revisions | Historical queries, reliable watches, compare-and-swap transactions. | Storage grows with history. Requires compaction. Deleted keys still consume space until compacted. |
| Watch mechanism | Real-time notifications enable reactive architectures. No polling needed. | Many watches consume memory. Compaction can invalidate watch positions. |
| Simple key-value model | Easy to understand and use. Hierarchical keys with prefix queries. | No secondary indexes. No complex queries. Must design key schema carefully. |
| Single cluster design | Simple operations. Strong consistency within cluster. | No built-in multi-region replication. Cross-datacenter latency affects all writes. |