
System Design Pattern
Storage · time-series · metrics · compression · bucketing · downsampling · intermediate

Time-Series Storage Pattern

Optimized storage for time-stamped data

Used in: InfluxDB, TimescaleDB, Prometheus · 20 min read


Summary

Time-series storage is optimized for data points with timestamps, like metrics, sensor readings, and financial data. The key insight is that time-series data has unique properties: it arrives in time order, is immutable once written, queries are usually for recent data and time ranges, and old data can be downsampled or deleted. This enables specialized storage techniques like columnar compression (10-100x), time-based partitioning for fast range queries, downsampling old data, and retention policies. Systems like InfluxDB, Prometheus, and TimescaleDB achieve 10-100x better compression and 100x faster range queries than general-purpose databases.

Key Takeaways

Time as First-Class Dimension

Unlike general databases where time is just another column, time-series databases treat time as a primary index. Data is organized, partitioned, and compressed by time, enabling fast range queries and automatic retention.

Write-Once, Query-Many

Time-series data is immutable: once a metric is recorded at a timestamp, it never changes. This enables aggressive compression and append-only storage without MVCC complexity.

Columnar Compression

Metrics change slowly over time (CPU 45%, 46%, 47%). Storing consecutive values in columnar format enables delta encoding, run-length encoding, and compression ratios of 10-100x.
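
The compression win from temporal correlation can be sketched with plain delta encoding (illustrative Python, not any particular database's codec; real systems layer delta-of-delta and bit-packing on top of this idea):

```python
def delta_encode(values):
    """Keep the first value, then only successive differences.
    Slowly drifting metrics yield runs of tiny, highly compressible deltas."""
    return values[:1] + [cur - prev for prev, cur in zip(values, values[1:])]

def delta_decode(deltas):
    out = deltas[:1]
    for d in deltas[1:]:
        out.append(out[-1] + d)
    return out

cpu = [45, 46, 47, 47, 47, 48]        # slowly changing CPU% samples
assert delta_encode(cpu) == [45, 1, 1, 0, 0, 1]
assert delta_decode(delta_encode(cpu)) == cpu
```

The resulting small deltas are what run-length and variable-width encodings then shrink by 10-100x.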

Downsampling Strategy

Recent data needs full resolution (1-second samples), but old data can be aggregated (1-hour averages). This reduces storage by 3600x for historical data while preserving trends.
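
A minimal downsampling pass over (epoch_seconds, value) pairs might look like this (a sketch with a hypothetical helper, not any specific engine's API):

```python
from collections import defaultdict

def downsample(samples, bucket_seconds=3600):
    """Collapse raw samples into per-bucket averages:
    3600 one-second samples become a single hourly point."""
    buckets = defaultdict(list)
    for ts, value in samples:
        buckets[ts - ts % bucket_seconds].append(value)
    return [(start, sum(vals) / len(vals)) for start, vals in sorted(buckets.items())]

raw = [(t, 50.0) for t in range(7200)]   # two hours of 1-second samples
hourly = downsample(raw)
assert len(hourly) == 2                   # 7200 raw points reduced to 2
```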

Time-Based Partitioning

Data is partitioned into chunks by time (daily/hourly). Queries for recent data touch only recent partitions. Old partitions can be archived or deleted based on retention policy.
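
A toy chunked store illustrates both partition pruning and retention-by-chunk-drop (all names here are hypothetical, not a real engine's API):

```python
DAY = 86_400  # seconds per daily chunk

class ChunkedStore:
    """Toy time-partitioned store: one append-only chunk per day."""
    def __init__(self):
        self.chunks = {}  # chunk start (epoch day boundary) -> [(ts, value), ...]

    def write(self, ts, value):
        self.chunks.setdefault(ts - ts % DAY, []).append((ts, value))

    def query(self, start, end):
        """Range scan over [start, end): only chunks overlapping the range are read."""
        out = []
        for day in sorted(self.chunks):
            if day + DAY <= start or day >= end:
                continue  # pruned without touching the chunk's data
            out.extend(p for p in self.chunks[day] if start <= p[0] < end)
        return out

    def expire(self, retention_seconds, now):
        """Retention: drop whole chunks past the cutoff, no row-by-row deletes."""
        cutoff = now - retention_seconds
        for day in [d for d in self.chunks if d + DAY <= cutoff]:
            del self.chunks[day]

store = ChunkedStore()
for t in (10, DAY + 10, 2 * DAY + 10):
    store.write(t, 1.0)
assert store.query(DAY, 2 * DAY) == [(DAY + 10, 1.0)]   # touches one chunk
store.expire(retention_seconds=DAY, now=3 * DAY)         # keeps only the newest day
assert sorted(store.chunks) == [2 * DAY]
```

Dropping an expired chunk is a single delete of a contiguous unit, which is why time-partitioned systems can enforce retention cheaply.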

Query Patterns

90%+ of queries target recent data (the last hour or day) and time ranges, not point lookups. This is the opposite of transactional workloads and calls for completely different indexes and storage layout.

Pattern Details

Consider storing application metrics in PostgreSQL:

Scenario: 1000 servers × 100 metrics per server × 1 sample/second = 100K writes/second

Problems with row-oriented databases:

```sql
-- Naive row-oriented schema (illustrative reconstruction; the original
-- snippet was not preserved in this copy)
CREATE TABLE metrics (
    time        TIMESTAMPTZ      NOT NULL,
    server_id   TEXT             NOT NULL,
    metric_name TEXT             NOT NULL,
    value       DOUBLE PRECISION NOT NULL
);
CREATE INDEX metrics_time_idx ON metrics (time);

-- 100K of these per second:
INSERT INTO metrics VALUES (now(), 'server-0042', 'cpu.usage', 45.2);
```

Why this fails:

  1. Write Amplification: Each insert creates a new B-tree node, causes index updates, and triggers WAL writes. 100K inserts/sec overwhelms the database.
  2. Poor Compression: Row-oriented storage intermixes different metrics, so temporal correlation cannot be exploited (CPU 45%, 46%, 47% stored as three separate 8-byte doubles).
  3. Inefficient Queries: A range query for 1 hour needs only 3600 rows per series, but the database scans the whole table or follows an index to scattered disk locations.
  4. No Retention: Data accumulates forever. After 1 year there are ~3.15 trillion rows, and the database slows to a crawl.
  5. Wasted Storage: Every row repeats the server_id and metric_name strings (~50 bytes). For an 8-byte value, the metadata is about 6x larger than the data!
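
The back-of-envelope numbers in the list above check out (assuming the stated 50 bytes of repeated per-row metadata):

```python
writes_per_sec = 1000 * 100 * 1              # servers x metrics x samples/sec
assert writes_per_sec == 100_000

rows_per_year = writes_per_sec * 86_400 * 365
assert rows_per_year == 3_153_600_000_000    # ~3.15 trillion rows

value_bytes, metadata_bytes = 8, 50          # 8-byte double vs. repeated strings
assert metadata_bytes / value_bytes > 6      # metadata ~6x larger than the data
```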

The Core Insight

Time-series data has unique properties that general-purpose databases don't exploit: it arrives in time order, is immutable, has a very high write rate, is queried almost exclusively with range scans, and loses value with age. Specialized storage that exploits these properties can achieve 10-100x better compression and 100x faster queries.

Trade-offs

Aspect | Advantage | Disadvantage
