Open Source
The consensus-based configuration store that powers Kubernetes, providing strong consistency guarantees
etcd is a distributed key-value store that provides a reliable way to store data across a cluster of machines. It uses the Raft consensus algorithm to ensure strong consistency: every read returns the most recent write. etcd is the backbone of Kubernetes, storing all cluster state including pod definitions, secrets, and config maps. Its watch mechanism enables reactive systems where components respond immediately to configuration changes.
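A minimal sketch of talking to etcd with the official Go client (go.etcd.io/etcd/client/v3); the endpoint and key name are assumptions for a local single-node cluster, not part of any real deployment.

```go
package main

import (
	"context"
	"fmt"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
	// Assumed local endpoint; adjust for a real cluster.
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"localhost:2379"},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		panic(err)
	}
	defer cli.Close()

	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()

	// Write a key, then read it back; the read is linearizable by default.
	if _, err := cli.Put(ctx, "/config/feature-flag", "enabled"); err != nil {
		panic(err)
	}
	resp, err := cli.Get(ctx, "/config/feature-flag")
	if err != nil {
		panic(err)
	}
	for _, kv := range resp.Kvs {
		fmt.Printf("%s = %s\n", kv.Key, kv.Value)
	}
}
```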
Every write goes through the Raft consensus protocol, ensuring all nodes agree on the order of operations. Linearizable reads can be served by any node using the ReadIndex protocol, which confirms the node has caught up to the leader's commit index before responding.
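A sketch of how the two read modes surface in the Go client: the default Get is linearizable (backed by ReadIndex), while WithSerializable trades that guarantee for a purely local, lower-latency read. Client setup is as in the example above; the function and key names are illustrative.

```go
package main

import (
	"context"
	"fmt"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func readBothWays(ctx context.Context, cli *clientv3.Client, key string) error {
	// Linearizable read (default): the serving node first confirms it has
	// caught up to the leader's commit index, so the result reflects the
	// most recent committed write.
	lin, err := cli.Get(ctx, key)
	if err != nil {
		return err
	}

	// Serializable read: answered from the node's local state without the
	// ReadIndex round trip; it may be slightly stale.
	ser, err := cli.Get(ctx, key, clientv3.WithSerializable())
	if err != nil {
		return err
	}

	fmt.Printf("linearizable rev=%d, serializable rev=%d\n",
		lin.Header.Revision, ser.Header.Revision)
	return nil
}
```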
etcd maintains multiple versions of each key using Multi-Version Concurrency Control. Every modification increments a global revision number. You can read historical values, compare-and-swap on specific revisions, and compact old data.
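A sketch of MVCC in action with the Go client: each Put bumps the global revision, older values remain readable with WithRev, and Compact discards history below a given revision. The key and values are illustrative.

```go
package main

import (
	"context"
	"fmt"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func demonstrateRevisions(ctx context.Context, cli *clientv3.Client) error {
	first, err := cli.Put(ctx, "/config/mode", "standby")
	if err != nil {
		return err
	}
	if _, err := cli.Put(ctx, "/config/mode", "active"); err != nil {
		return err
	}

	// Read the historical value as of the first write's revision.
	old, err := cli.Get(ctx, "/config/mode", clientv3.WithRev(first.Header.Revision))
	if err != nil {
		return err
	}
	fmt.Printf("at rev %d: %s\n", first.Header.Revision, old.Kvs[0].Value)

	// Compact away history older than the current revision; historical
	// reads below this point will now fail.
	cur, err := cli.Get(ctx, "/config/mode")
	if err != nil {
		return err
	}
	_, err = cli.Compact(ctx, cur.Header.Revision)
	return err
}
```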
Clients can watch keys or prefixes and receive streaming updates when values change. This enables reactive architectures where components respond to configuration changes without polling.
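A sketch of a prefix watch with the Go client: the caller receives a stream of events for any key under the prefix without polling. The /config/ prefix is an assumption for illustration.

```go
package main

import (
	"context"
	"fmt"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func watchConfig(ctx context.Context, cli *clientv3.Client) {
	// Watch returns a channel; each message carries one or more events
	// (PUT or DELETE) for keys under the watched prefix.
	for wresp := range cli.Watch(ctx, "/config/", clientv3.WithPrefix()) {
		for _, ev := range wresp.Events {
			fmt.Printf("%s %s = %s\n", ev.Type, ev.Kv.Key, ev.Kv.Value)
		}
	}
}
```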
etcd (pronounced "et-see-dee"; the name combines the Unix /etc directory with "distributed") was created at CoreOS in 2013 to solve a fundamental problem: how do you store configuration that multiple machines need to agree on?
Traditional approaches fail in distributed systems:

- Config files: No synchronization, manual updates, drift between machines
- Databases: Often not designed for configuration workloads, may sacrifice consistency
- ZooKeeper: Complex, Java-based, different API model
etcd provides:

- Strong consistency: Linearizable reads and writes via Raft
- High availability: Continues operating with (n/2)+1 nodes alive
- Simple API: HTTP/gRPC with JSON, easy to debug
- Watch mechanism: Real-time notifications of changes
- Small footprint: Single binary, no dependencies
What Kubernetes stores in etcd:

- Pod definitions and status
- Service endpoints
- ConfigMaps and Secrets
- RBAC policies
- Custom Resource Definitions
- Lease objects for leader election
Key design principle: etcd is the single source of truth. The API server is the only component that talks to etcd directly. All other components communicate through the API server.
etcd supports mini-transactions with if-then-else semantics: you can atomically check conditions and perform different operations based on the result, which enables compare-and-swap, distributed locks, and atomic updates.
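A sketch of a compare-and-swap built on the transaction API in the Go client: the Put happens only if the current value matches, and the Else branch reads the value that won instead. The key and values are illustrative.

```go
package main

import (
	"context"
	"fmt"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func promoteIfStandby(ctx context.Context, cli *clientv3.Client) error {
	resp, err := cli.Txn(ctx).
		If(clientv3.Compare(clientv3.Value("/config/mode"), "=", "standby")).
		Then(clientv3.OpPut("/config/mode", "active")).
		Else(clientv3.OpGet("/config/mode")).
		Commit()
	if err != nil {
		return err
	}
	if resp.Succeeded {
		fmt.Println("promoted to active")
	} else {
		// Condition failed: inspect the value returned by the Else branch.
		current := resp.Responses[0].GetResponseRange().Kvs[0].Value
		fmt.Printf("condition failed, current mode: %s\n", current)
	}
	return nil
}
```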
Keys are organized in a flat namespace but conventionally use / as separators (like /registry/pods/default/nginx). Range queries on prefixes enable directory-like operations.
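A sketch of a directory-like listing via a prefix range query in the Go client; the /registry/pods/default/ prefix mirrors the Kubernetes key layout described above and is used here purely for illustration.

```go
package main

import (
	"context"
	"fmt"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func listPodKeys(ctx context.Context, cli *clientv3.Client) error {
	resp, err := cli.Get(ctx, "/registry/pods/default/",
		clientv3.WithPrefix(),   // everything under the prefix
		clientv3.WithKeysOnly(), // skip values, like a directory listing
		clientv3.WithSort(clientv3.SortByKey, clientv3.SortAscend))
	if err != nil {
		return err
	}
	for _, kv := range resp.Kvs {
		fmt.Println(string(kv.Key))
	}
	return nil
}
```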
| Aspect | Advantage | Disadvantage |
|---|---|---|
| Strong consistency (linearizable) | Every read returns the most recent write. Safe for coordination and configuration. | Higher latency than eventually consistent systems. All writes go through leader. |
| Raft consensus | Well-understood protocol. Proven correct. Automatic leader election and recovery. | Requires majority for writes. 3-node cluster cannot tolerate 2 failures simultaneously. |
| MVCC with revisions | Historical queries, reliable watches, compare-and-swap transactions. | Storage grows with history. Requires compaction. Deleted keys still consume space until compacted. |
| Watch mechanism | Real-time notifications enable reactive architectures. No polling needed. | Many watches consume memory. Compaction can invalidate watch positions. |
| Simple key-value model | Easy to understand and use. Hierarchical keys with prefix queries. | No secondary indexes. No complex queries. Must design key schema carefully. |
| Single cluster design | Simple operations. Strong consistency within cluster. | No built-in multi-region replication. Cross-datacenter latency affects all writes. |