
Caching Strategies

From browser to database - a complete guide to caching at every layer

Foundation knowledge | 35 min read

Summary

Caching stores copies of frequently accessed data in faster storage to reduce latency and load on origin systems. Caches exist at every layer: browser, CDN, reverse proxy, application, and database. The key challenge is cache invalidation—ensuring cached data stays fresh. Different caching patterns (cache-aside, read-through, write-through, write-behind) offer different consistency-performance trade-offs. Understanding cache eviction policies (LRU, LFU, TTL) and problems like thundering herd and cache stampede is essential for building reliable cached systems.

Key Takeaways

The Two Hard Problems: Naming and Cache Invalidation

Knowing when cached data is stale is genuinely difficult. Invalidate too aggressively and you defeat the purpose of caching; invalidate too lazily and you serve stale data. There's no universal solution: each use case needs its own strategy.

Cache-Aside is the Most Common Pattern

The application checks the cache first, falls back to the database on a miss, then populates the cache with the result. It's explicit, flexible, and lets you cache exactly what you need. Most web applications use this pattern with Redis or Memcached.
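
A rough cache-aside sketch in Python, assuming redis-py, a local Redis instance, and a hypothetical fetch_user_from_db helper; the key format and 5-minute TTL are illustrative:

```python
import json

import redis

r = redis.Redis(host="localhost", port=6379)  # assumes a local Redis instance
CACHE_TTL_SECONDS = 300  # illustrative TTL; tune per use case


def fetch_user_from_db(user_id: int) -> dict:
    # Stand-in for a real database query (hypothetical helper).
    return {"id": user_id, "name": "example"}


def get_user(user_id: int) -> dict:
    """Cache-aside read: check the cache, fall back to the database, then populate the cache."""
    key = f"user:{user_id}"
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)  # cache hit
    user = fetch_user_from_db(user_id)  # cache miss: fetch from the origin
    r.setex(key, CACHE_TTL_SECONDS, json.dumps(user))  # populate the cache with a TTL
    return user
```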

Write-Through Prevents Stale Reads at the Cost of Write Latency

Writing to the cache and the database together ensures the cache is always fresh. But every write now carries the extra cache latency. Use it when read consistency matters more than write speed.
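
A write-through sketch under the same assumptions (redis-py plus a hypothetical save_user_to_db helper):

```python
import json

import redis

r = redis.Redis(host="localhost", port=6379)


def save_user_to_db(user_id: int, user: dict) -> None:
    # Stand-in for a real database write (hypothetical helper).
    pass


def update_user(user_id: int, user: dict) -> None:
    """Write-through: update the database and the cache in the same operation."""
    save_user_to_db(user_id, user)              # 1. write to the source of truth
    r.set(f"user:{user_id}", json.dumps(user))  # 2. refresh the cache immediately
    # Every write now pays the extra cache round trip, but reads never see stale data.
```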

The Thundering Herd Can Take Down Your Database

When a popular cache key expires, hundreds of requests simultaneously hit the database. Solutions: staggered TTLs, probabilistic early expiration, request coalescing, or cache locks. This is a real production incident pattern.
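
One way to tame the herd is a short-lived cache lock (a form of request coalescing). The sketch below is only an illustration using redis-py: SET NX acts as the lock, and a hypothetical expensive_db_query stands in for the origin fetch.

```python
import json
import time

import redis

r = redis.Redis(host="localhost", port=6379)


def expensive_db_query(key: str) -> dict:
    # Stand-in for the slow origin query that the herd would otherwise hammer.
    return {"key": key}


def get_with_lock(key: str, ttl: int = 300, lock_ttl: int = 10) -> dict:
    for _ in range(20):  # bounded wait instead of looping forever
        cached = r.get(key)
        if cached is not None:
            return json.loads(cached)
        # SET NX EX acts as a short-lived lock: only one caller rebuilds the expired entry.
        if r.set(f"lock:{key}", "1", nx=True, ex=lock_ttl):
            try:
                value = expensive_db_query(key)
                r.setex(key, ttl, json.dumps(value))
                return value
            finally:
                r.delete(f"lock:{key}")
        time.sleep(0.05)  # lost the race: back off briefly, then re-check the cache
    return expensive_db_query(key)  # last resort if the cache never fills
```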

CDNs Are Geographically Distributed Caches

CDNs cache static content at edge locations worldwide. A user in Tokyo gets content from a Tokyo edge server, not your US origin. This reduces latency by 100-200ms for global users. CDN invalidation is its own challenge.
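
As an illustration (not specific to any CDN), static responses are typically made edge-cacheable with Cache-Control headers, and invalidation is often sidestepped by versioning asset URLs; the values below are assumptions to show the shape:

```python
# Illustrative Cache-Control header for a static asset served through a CDN.
static_asset_headers = {
    # Browsers may keep the asset for 5 minutes; CDN edges (s-maxage) for a day.
    "Cache-Control": "public, max-age=300, s-maxage=86400",
}

# Invalidation is often avoided entirely by putting a content hash in the filename,
# so a deploy publishes a new URL instead of purging the old one at every edge:
asset_url = "/static/app.3f9c2a1b.js"  # hypothetical hashed filename
```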

Not Everything Should Be Cached

Caching adds complexity. If data changes frequently, is rarely accessed, or must always be fresh, caching may hurt more than help. Cache the 20% of data that serves 80% of requests.

Deep Dive

Caching stores copies of data in faster storage to reduce latency and origin load.

The speed hierarchy (typical latency, fastest to slowest):

  • L1 CPU cache: 1 ns
  • L2 CPU cache: 4 ns
  • RAM: 100 ns
  • SSD: 0.1 ms
  • Redis (local): 0.5 ms
  • Redis (remote): 1-5 ms
  • Database query: 1-100 ms
  • Network API call: 50-500 ms

Caching exploits this hierarchy—store frequently accessed data in faster tiers.

When to cache:

  • Data is expensive to compute or fetch
  • Data is accessed frequently
  • Data doesn't change too often
  • Stale data is acceptable (briefly)

Caching Reduces Load and Latency

Cache metrics:

  • Hit rate (hit ratio): hits / (hits + misses), the fraction of requests served from the cache
  • Miss rate: 1 - hit rate, the fraction of requests requiring an origin fetch
  • Average latency saved per request: (origin_latency - cache_latency) × hit_rate

Example calculation:

Origin latency: 100ms
Cache latency: 2ms
Hit rate: 90%

Avg latency = (0.9 × 2ms) + (0.1 × 100ms) = 11.8ms
88% latency reduction!
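
The same arithmetic as a tiny helper, handy for sanity-checking hit-rate targets:

```python
def effective_latency(hit_rate: float, cache_ms: float, origin_ms: float) -> float:
    """Average latency seen by clients, weighted by the cache hit rate."""
    return hit_rate * cache_ms + (1 - hit_rate) * origin_ms


avg = effective_latency(0.9, 2, 100)   # 11.8 ms, matching the example above
reduction = 1 - avg / 100              # ~0.88, i.e. an 88% latency reduction
print(f"{avg:.1f} ms average, {reduction:.0%} reduction")
```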

Trade-offs

| Aspect | Advantage | Disadvantage |
|---|---|---|
| TTL-based vs event-based invalidation | TTL is simple and requires no extra infrastructure; event-based ensures immediate freshness | TTL serves stale data until expiry; event-based requires pub/sub infrastructure and adds complexity |
| Write-through vs write-behind | Write-through keeps the cache consistent; write-behind minimizes write latency | Write-through adds latency to every write; write-behind risks data loss if the cache fails |
| Local vs distributed cache | A local cache has zero network latency; a distributed cache is shared across instances | A local cache requires cross-instance invalidation; a distributed cache adds a network round trip |
| Aggressive vs conservative caching | Aggressive caching maximizes performance; conservative caching ensures freshness | Aggressive caching risks stale data; conservative caching reduces the benefit of the cache |
