
Redis: In-Memory Data Structure Store

The Swiss Army knife of caching, messaging, and real-time data that powers Twitter, GitHub, and Stack Overflow

C | 65,000 stars | Updated January 2024 | 45 min read

Summary

Redis is an in-memory data structure server that supports strings, hashes, lists, sets, sorted sets, streams, and more. Its single-threaded architecture eliminates locking overhead, achieving millions of operations per second. Beyond caching, Redis serves as a message broker, real-time leaderboard, session store, and rate limiter. The key insight: by keeping data in memory and using efficient data structures, Redis achieves microsecond latency that disk-based databases cannot match.

Key Takeaways

Single-Threaded by Design

Redis processes commands in a single thread, eliminating lock contention and context switching. This makes reasoning about atomicity trivial: each command executes completely before the next begins. The bottleneck is usually network I/O, not CPU.

Data Structures, Not Just Key-Value

Unlike simple caches, Redis provides rich data structures (lists, sets, sorted sets, streams) with O(1) and O(log n) operations. You can LPUSH/RPOP for queues, ZADD/ZRANGE for leaderboards, and XADD/XREAD for event streams, all atomically.

Memory Efficiency Through Encodings

Redis automatically switches between memory-efficient encodings based on data size. Small hashes use ziplists (contiguous memory); large ones use hash tables. This optimization happens transparently and can cut memory use by up to 10x for small objects.
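The switchover points between encodings are configurable. A redis.conf fragment showing the classic defaults (in Redis 7 the `ziplist` directives were renamed to `listpack`, but the old names still work as aliases):

```
# A hash stays in the compact ziplist encoding only while it has
# at most this many fields...
hash-max-ziplist-entries 128
# ...and every field/value is at most this many bytes.
hash-max-ziplist-value 64
```

Once either limit is exceeded, Redis converts the hash to a real hash table; the conversion is one-way for that key.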

Persistence Without Sacrificing Speed

Redis offers two persistence options: RDB (point-in-time snapshots via fork()) and AOF (append-only log). RDB is fast to load but may lose recent data; AOF is durable but larger. Most production setups use both.
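The two persistence modes are enabled with a few redis.conf directives; this fragment shows the commonly used defaults:

```
# RDB: snapshot if at least 1 key changed in 900s,
# 10 keys in 300s, or 10000 keys in 60s
save 900 1
save 300 10
save 60 10000

# AOF: log every write command, fsync the log once per second
appendonly yes
appendfsync everysec
```

With `appendfsync everysec`, a crash loses at most about one second of writes, which is the trade-off the takeaway above refers to.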

Built-in Replication and Clustering

Redis supports master-replica replication for read scaling and failover. Redis Cluster provides automatic sharding across multiple nodes using hash slots (16384 slots distributed across masters).
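The slot assignment itself is simple: CRC16 of the key, modulo 16384, with an optional `{hash tag}` so that related keys land on the same slot. A minimal pure-Python sketch of that mapping (a model for illustration, not the redis-py client):

```python
def crc16(data: bytes) -> int:
    """CRC16-CCITT (XMODEM), the checksum Redis Cluster uses for keys."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) if crc & 0x8000 else (crc << 1)
            crc &= 0xFFFF
    return crc

def hash_slot(key: str) -> int:
    """Map a key to one of the 16384 cluster hash slots.

    If the key contains a non-empty {hash tag}, only the tag is hashed,
    so keys like user:{42}:name and user:{42}:email share a slot.
    """
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end > start + 1:  # tag must be non-empty
            key = key[start + 1:end]
    return crc16(key.encode()) % 16384
```

Hash tags matter because multi-key commands (MGET, MULTI/EXEC) only work in cluster mode when every key maps to the same slot.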

Lua Scripting for Atomicity

Complex operations spanning multiple keys can be made atomic using Lua scripts. The script executes in the same single thread, so no other command can interleave. This replaces the need for distributed locks in many cases.
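A classic example of logic that needs this atomicity is a fixed-window rate limiter: increment a counter, set its expiry, and compare against a limit, with no other command interleaving. Sketched here as a pure-Python model backed by a dict (with an injectable clock for testing), so the semantics are visible; in Redis the same steps would live in one Lua script:

```python
import time

class FixedWindowLimiter:
    """Models the INCR + EXPIRE steps a Redis Lua script would run atomically.

    Backed by a plain dict here; in Redis, putting both steps in a single
    Lua script guarantees no other command interleaves between them.
    """

    def __init__(self, limit: int, window_s: float, clock=time.monotonic):
        self.limit = limit
        self.window_s = window_s
        self.clock = clock
        self.store = {}  # key -> (count, window_expiry)

    def allow(self, key: str) -> bool:
        now = self.clock()
        count, expiry = self.store.get(key, (0, now + self.window_s))
        if now >= expiry:  # window expired: start a fresh one
            count, expiry = 0, now + self.window_s
        count += 1
        self.store[key] = (count, expiry)
        return count <= self.limit
```

The dict stands in for Redis keys; the point is the shape of the logic, not the storage.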

Deep Dive

Redis (Remote Dictionary Server) was created by Salvatore Sanfilippo in 2009 to solve a specific problem: his real-time web analytics startup needed to handle high-velocity writes that MySQL couldn't keep up with.

The core insight was simple: memory is fast, disk is slow. By keeping all data in RAM and using efficient data structures, Redis achieves latency measured in microseconds, about 1000x faster than disk-based databases.

But Redis isn't just a cache. It's a data structure server. Where Memcached only stores strings, Redis provides:

  • Strings: Binary-safe, up to 512MB
  • Lists: Linked lists with O(1) push/pop at both ends
  • Sets: Unique unordered collections with O(1) membership test
  • Sorted Sets: Sets ordered by score with O(log n) insertion
  • Hashes: Field-value maps, like objects
  • Streams: Append-only logs with consumer groups
  • HyperLogLog: Probabilistic cardinality estimation
  • Bitmaps: Bit-level operations on strings
  • Geospatial indexes: Location-based queries
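Sorted sets are the workhorse for rankings. A small pure-Python model of the ZADD/ZINCRBY/ZREVRANGE semantics (descending score, ties broken by member name, as Redis does), purely to illustrate the command behavior:

```python
class Leaderboard:
    """Pure-Python model of the sorted-set commands used for leaderboards."""

    def __init__(self):
        self.scores = {}  # member -> score

    def zadd(self, member: str, score: float) -> None:
        self.scores[member] = score

    def zincrby(self, member: str, delta: float) -> float:
        self.scores[member] = self.scores.get(member, 0.0) + delta
        return self.scores[member]

    def zrevrange(self, start: int, stop: int):
        """(member, score) pairs, highest score first; stop is inclusive."""
        ranked = sorted(self.scores.items(),
                        key=lambda kv: (kv[1], kv[0]), reverse=True)
        return ranked[start:stop + 1]
```

In Redis these operations run in O(log n) against a skiplist; the model above sorts on every read and is only meant to show the API shape.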

Redis vs Traditional Database Access Pattern

Common use cases:

  1. Caching: Store expensive query results, API responses, session data
  2. Rate limiting: Track request counts with expiring keys
  3. Leaderboards: Sorted sets for real-time rankings
  4. Pub/Sub: Real-time messaging between services
  5. Queues: Lists as FIFO queues with LPUSH/RPOP
  6. Distributed locks: SET with NX and PX options
  7. Real-time analytics: HyperLogLog for unique visitors
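Use case 6 above deserves a sketch: on a single Redis instance, a lock is just `SET key token NX PX ttl`, and release must delete the key only if the token still matches (normally done in a tiny Lua script). Modeled here on a dict with an injectable clock; note this is the single-node pattern, not the multi-node Redlock algorithm:

```python
import time
import uuid

class LockStore:
    """Models SET key value NX PX ttl plus the compare-and-delete release."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.data = {}  # key -> (owner_token, expiry)

    def acquire(self, key: str, ttl_s: float):
        now = self.clock()
        held = self.data.get(key)
        if held and held[1] > now:   # NX: fail while an unexpired lock exists
            return None
        token = uuid.uuid4().hex     # unique token identifies the owner
        self.data[key] = (token, now + ttl_s)
        return token

    def release(self, key: str, token: str) -> bool:
        held = self.data.get(key)
        if held and held[0] == token:  # only the owner may release
            del self.data[key]
            return True
        return False
```

The token check is the important part: without it, a client whose lock expired could delete a lock now held by someone else.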

Trade-offs

| Aspect | Advantage | Disadvantage |
| --- | --- | --- |
| Single-threaded execution | No locks, no race conditions, predictable latency, simpler code | Cannot use multiple CPU cores for a single operation; one slow command blocks everything |
| In-memory storage | Microsecond latency, millions of operations per second | Data set limited by RAM; more expensive than disk storage; requires a persistence strategy |
| Asynchronous replication | Low-latency writes, simple master-replica setup | Potential data loss if the master fails before replicating; replicas can serve stale reads |
| Hash slot clustering | Linear horizontal scaling, automatic failover, no central coordinator | Multi-key operations limited to one slot; resharding requires data migration |
| Lua scripting | Atomic complex operations, fewer round trips, custom commands | Scripts block all other operations; debugging is difficult; must be deterministic |
| Pub/Sub simplicity | Easy real-time messaging, pattern subscriptions | No persistence, so offline subscribers miss messages; no replay capability |
| RDB persistence | Compact snapshots, fast restart, minimal runtime overhead | Data loss between snapshots; fork() can cause latency spikes with large datasets |
| AOF persistence | Minimal data loss (about 1 second with everysec), human-readable log | Larger files than RDB; slower restart; write amplification |
