etcd: Distributed Reliable Key-Value Store

The consensus-based configuration store that powers Kubernetes, providing strong consistency guarantees

Tags: etcd, consensus, raft, key-value, distributed-systems, kubernetes, configuration, advanced

Go | 47,000 stars | Updated January 2024 | 35 min read

Summary

etcd is a distributed key-value store that provides a reliable way to store data across a cluster of machines. It uses the Raft consensus algorithm to ensure strong consistency - every read returns the most recent write. etcd is the backbone of Kubernetes, storing all cluster state including pod definitions, secrets, and config maps. Its watch mechanism enables reactive systems where components respond immediately to configuration changes.

Key Takeaways

Strong Consistency via Raft

Every write goes through the Raft consensus protocol, ensuring all nodes agree on the order of operations. Reads can be served from any node with linearizable guarantees using the ReadIndex protocol.
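
To make the two read modes concrete, here is a minimal sketch using the official Go client (go.etcd.io/etcd/client/v3). The endpoint address and the /config/feature key are assumptions for illustration.

```go
package main

import (
	"context"
	"fmt"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
	// Endpoint is an assumption: a local single-node etcd on its default port.
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"localhost:2379"},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		panic(err)
	}
	defer cli.Close()

	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()

	// Default reads are linearizable: the serving node confirms with the
	// leader (ReadIndex) before answering, so no stale data is returned.
	resp, err := cli.Get(ctx, "/config/feature")
	if err != nil {
		panic(err)
	}
	fmt.Println("linearizable read at revision", resp.Header.Revision)

	// A serializable read skips the ReadIndex round trip: lower latency,
	// but a lagging node may answer with slightly stale data.
	resp, err = cli.Get(ctx, "/config/feature", clientv3.WithSerializable())
	if err != nil {
		panic(err)
	}
	fmt.Println("serializable read at revision", resp.Header.Revision)
}
```

A serializable read is a deliberate opt-out: you accept possibly stale data from the local node in exchange for skipping the leader round trip.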

MVCC with Revisions

etcd maintains multiple versions of each key using Multi-Version Concurrency Control. Every modification increments a global revision number. You can read historical values, compare-and-swap on specific revisions, and compact old data.
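
A short sketch of revision-based reads and compaction, assuming a client built as in the previous snippet; the key name is hypothetical:

```go
package etcdexamples

import (
	"context"
	"fmt"

	clientv3 "go.etcd.io/etcd/client/v3"
)

// ReadHistory demonstrates MVCC: every Put bumps the global revision,
// and WithRev reads a key as of an older revision.
func ReadHistory(ctx context.Context, cli *clientv3.Client) error {
	// Two writes to the same (hypothetical) key create two revisions.
	first, err := cli.Put(ctx, "/config/feature", "v1")
	if err != nil {
		return err
	}
	if _, err := cli.Put(ctx, "/config/feature", "v2"); err != nil {
		return err
	}

	// Read the value as it was at the revision of the first write.
	old, err := cli.Get(ctx, "/config/feature",
		clientv3.WithRev(first.Header.Revision))
	if err != nil {
		return err
	}
	fmt.Printf("value at rev %d: %s\n", first.Header.Revision, old.Kvs[0].Value)

	// Compaction discards revisions older than the given one; historical
	// reads below that point fail afterwards, reclaiming space.
	_, err = cli.Compact(ctx, first.Header.Revision)
	return err
}
```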

Watch for Real-Time Updates

Clients can watch keys or prefixes and receive streaming updates when values change. This enables reactive architectures where components respond to configuration changes without polling.
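
A minimal watch loop with the same Go client; the /config/ prefix is hypothetical, and the function runs until the context is cancelled:

```go
package etcdexamples

import (
	"context"
	"fmt"

	clientv3 "go.etcd.io/etcd/client/v3"
)

// WatchConfig blocks until ctx is cancelled, printing every change under
// a hypothetical /config/ prefix as it happens, with no polling involved.
func WatchConfig(ctx context.Context, cli *clientv3.Client) {
	// Watch a whole prefix; one message may batch several events.
	for wresp := range cli.Watch(ctx, "/config/", clientv3.WithPrefix()) {
		for _, ev := range wresp.Events {
			// ev.Type is PUT or DELETE; Kv carries key, value, and revision.
			fmt.Printf("%s %s = %s (rev %d)\n",
				ev.Type, ev.Kv.Key, ev.Kv.Value, ev.Kv.ModRevision)
		}
	}
}
```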

Lease-Based TTLs

Keys can be attached to leases with time-to-live. When a lease expires (or is not renewed), all attached keys are deleted. This powers service discovery, distributed locks, and leader election.
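
A sketch of lease-backed service registration (key, value, and TTL are assumptions for illustration):

```go
package etcdexamples

import (
	"context"

	clientv3 "go.etcd.io/etcd/client/v3"
)

// RegisterService sketches lease-backed registration: the key exists only
// while the lease is renewed, so a crashed process drops out of the
// registry once the TTL lapses.
func RegisterService(ctx context.Context, cli *clientv3.Client) error {
	// Grant a lease with a 10-second TTL (the TTL is an assumption).
	lease, err := cli.Grant(ctx, 10)
	if err != nil {
		return err
	}

	// Attach a hypothetical registration key to the lease.
	_, err = cli.Put(ctx, "/services/api/instance-1", "10.0.0.5:8080",
		clientv3.WithLease(lease.ID))
	if err != nil {
		return err
	}

	// Renew the lease in the background; if this process dies, renewals
	// stop and etcd deletes every key attached to the lease.
	acks, err := cli.KeepAlive(ctx, lease.ID)
	if err != nil {
		return err
	}
	go func() {
		for range acks {
			// Drain renewal acknowledgements.
		}
	}()
	return nil
}
```

The same pattern underlies leader election: candidates race to create a key under a lease, and whoever succeeds holds leadership for as long as it keeps renewing.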

Transactional Operations

etcd supports mini-transactions with if-then-else semantics. You can atomically check conditions and perform different operations based on the result - enabling compare-and-swap, distributed locks, and atomic updates.
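
A compare-and-swap sketch using the clientv3 transaction builder; the key and both values are supplied by the caller:

```go
package etcdexamples

import (
	"context"
	"fmt"

	clientv3 "go.etcd.io/etcd/client/v3"
)

// CompareAndSwap updates key to newVal only if its current value is
// oldVal, atomically, using etcd's if-then-else transaction.
func CompareAndSwap(ctx context.Context, cli *clientv3.Client, key, oldVal, newVal string) error {
	resp, err := cli.Txn(ctx).
		If(clientv3.Compare(clientv3.Value(key), "=", oldVal)). // the condition
		Then(clientv3.OpPut(key, newVal)).                      // runs if it holds
		Else(clientv3.OpGet(key)).                              // otherwise read current state
		Commit()
	if err != nil {
		return err
	}
	if !resp.Succeeded {
		// Another writer changed the key first; the Else branch fetched
		// the current state into resp.Responses[0].
		fmt.Println("compare failed: concurrent update")
	}
	return nil
}
```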

Hierarchical Key Space

Keys are organized in a flat namespace but conventionally use / as separators (like /registry/pods/default/nginx). Range queries on prefixes enable directory-like operations.
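
A directory-style listing is just a range query over a prefix. The prefix below mirrors the Kubernetes convention mentioned above but is only an example:

```go
package etcdexamples

import (
	"context"
	"fmt"

	clientv3 "go.etcd.io/etcd/client/v3"
)

// ListPods performs a directory-style listing: one range query over all
// keys sharing a prefix.
func ListPods(ctx context.Context, cli *clientv3.Client) error {
	resp, err := cli.Get(ctx, "/registry/pods/default/",
		clientv3.WithPrefix(),
		clientv3.WithSort(clientv3.SortByKey, clientv3.SortAscend))
	if err != nil {
		return err
	}
	for _, kv := range resp.Kvs {
		fmt.Printf("%s (mod rev %d)\n", kv.Key, kv.ModRevision)
	}
	return nil
}
```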

Deep Dive

etcd (pronounced "et-see-dee": "/etc" plus "d" for "distributed") was created at CoreOS in 2013 to solve a fundamental problem: how do you store configuration that multiple machines need to agree on?

Traditional approaches fail in distributed systems:

- Config files: No synchronization, manual updates, drift between machines
- Databases: Often not designed for configuration workloads, may sacrifice consistency
- ZooKeeper: Complex, Java-based, different API model

etcd provides:

- Strong consistency: Linearizable reads and writes via Raft
- High availability: Continues operating with (n/2)+1 nodes alive
- Simple API: HTTP/gRPC with JSON, easy to debug (see the sketch below)
- Watch mechanism: Real-time notifications of changes
- Small footprint: Single binary, no dependencies
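
As a sketch of the "easy to debug" point: besides gRPC, etcd (3.4 and later) serves a JSON gateway over plain HTTP, where keys and values travel base64-encoded. The endpoint below assumes a local etcd with the default gateway enabled.

```go
package main

import (
	"bytes"
	"encoding/base64"
	"fmt"
	"io"
	"net/http"
)

func main() {
	// The gateway expects base64-encoded keys and values.
	key := base64.StdEncoding.EncodeToString([]byte("/config/feature"))
	val := base64.StdEncoding.EncodeToString([]byte("on"))
	body := fmt.Sprintf(`{"key": %q, "value": %q}`, key, val)

	// Path and port assume a local etcd >= 3.4 with the default
	// gRPC-JSON gateway enabled.
	resp, err := http.Post("http://localhost:2379/v3/kv/put",
		"application/json", bytes.NewBufferString(body))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	out, _ := io.ReadAll(resp.Body)
	fmt.Println(string(out)) // the reply includes the new cluster revision
}
```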

etcd in the Kubernetes Architecture

What Kubernetes stores in etcd:

- Pod definitions and status
- Service endpoints
- ConfigMaps and Secrets
- RBAC policies
- Custom Resource Definitions
- Lease objects for leader election

Key design principle: etcd is the single source of truth. The API server is the only component that talks to etcd directly. All other components communicate through the API server.

Trade-offs

Aspect | Advantage | Disadvantage
Strong consistency (linearizable) | Every read returns the most recent write. Safe for coordination and configuration. | Higher latency than eventually consistent systems. All writes go through the leader.
Raft consensus | Well-understood protocol, proven correct. Automatic leader election and recovery. | Requires a majority for writes. A 3-node cluster cannot tolerate 2 simultaneous failures.
MVCC with revisions | Historical queries, reliable watches, compare-and-swap transactions. | Storage grows with history. Requires compaction. Deleted keys still consume space until compacted.
Watch mechanism | Real-time notifications enable reactive architectures. No polling needed. | Many watches consume memory. Compaction can invalidate watch positions.
Simple key-value model | Easy to understand and use. Hierarchical keys with prefix queries. | No secondary indexes. No complex queries. Must design the key schema carefully.
Single cluster design | Simple operations. Strong consistency within the cluster. | No built-in multi-region replication. Cross-datacenter latency affects all writes.
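
The quorum arithmetic behind the consensus rows above, as a quick worked example:

```go
package main

import "fmt"

// Raft needs floor(n/2)+1 votes, so an n-node cluster tolerates
// n - quorum simultaneous failures.
func main() {
	for _, n := range []int{1, 3, 5, 7} {
		quorum := n/2 + 1
		fmt.Printf("%d nodes: quorum %d, tolerates %d failure(s)\n",
			n, quorum, n-quorum)
	}
}
```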