
System Design Pattern
Storage · time-series · metrics · compression · bucketing · downsampling · intermediate

Time-Series Storage Pattern

Optimized storage for time-stamped data

Used in: InfluxDB, TimescaleDB, Prometheus · 20 min read


Summary

Time-series storage is optimized for data points with timestamps, like metrics, sensor readings, and financial data. The key insight is that time-series data has unique properties: it arrives in time order, is immutable once written, queries are usually for recent data and time ranges, and old data can be downsampled or deleted. This enables specialized storage techniques like columnar compression (10-100x), time-based partitioning for fast range queries, downsampling old data, and retention policies. Systems like InfluxDB, Prometheus, and TimescaleDB achieve 10-100x better compression and 100x faster range queries than general-purpose databases.

Key Takeaways

Time as First-Class Dimension

Unlike general databases where time is just another column, time-series databases treat time as a primary index. Data is organized, partitioned, and compressed by time, enabling fast range queries and automatic retention.

Write-Once, Query-Many

Time-series data is immutable: once a metric is recorded at a timestamp, it never changes. This enables aggressive compression and append-only storage without MVCC complexity.

Columnar Compression

Metrics change slowly over time (CPU 45%, 46%, 47%). Storing consecutive values in columnar format enables delta encoding, run-length encoding, and compression ratios of 10-100x.
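
The compression win from temporal correlation can be sketched with plain delta encoding (illustrative Python, not any particular database's codec; real systems layer delta-of-delta and bit-packing on top of this idea):

```python
def delta_encode(values):
    """Keep the first value, then only successive differences.
    Slowly drifting metrics yield runs of tiny, highly compressible deltas."""
    return values[:1] + [cur - prev for prev, cur in zip(values, values[1:])]

def delta_decode(deltas):
    out = deltas[:1]
    for d in deltas[1:]:
        out.append(out[-1] + d)
    return out

cpu = [45, 46, 47, 47, 47, 48]        # slowly changing CPU% samples
assert delta_encode(cpu) == [45, 1, 1, 0, 0, 1]
assert delta_decode(delta_encode(cpu)) == cpu
```

The resulting small deltas are what run-length and variable-width encodings then shrink by 10-100x.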

Downsampling Strategy

Recent data needs full resolution (1-second samples), but old data can be aggregated (1-hour averages). This reduces storage by 3600x for historical data while preserving trends.
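
A minimal downsampling pass over (epoch_seconds, value) pairs might look like this (a sketch with a hypothetical helper, not any specific engine's API):

```python
from collections import defaultdict

def downsample(samples, bucket_seconds=3600):
    """Collapse raw samples into per-bucket averages:
    3600 one-second samples become a single hourly point."""
    buckets = defaultdict(list)
    for ts, value in samples:
        buckets[ts - ts % bucket_seconds].append(value)
    return [(start, sum(vals) / len(vals)) for start, vals in sorted(buckets.items())]

raw = [(t, 50.0) for t in range(7200)]   # two hours of 1-second samples
hourly = downsample(raw)
assert len(hourly) == 2                   # 7200 raw points reduced to 2
```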

Time-Based Partitioning

Data is partitioned into chunks by time (daily/hourly). Queries for recent data touch only recent partitions. Old partitions can be archived or deleted based on retention policy.
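
A toy chunked store illustrates both partition pruning and retention-by-chunk-drop (all names here are hypothetical, not a real engine's API):

```python
DAY = 86_400  # seconds per daily chunk

class ChunkedStore:
    """Toy time-partitioned store: one append-only chunk per day."""
    def __init__(self):
        self.chunks = {}  # chunk start (epoch day boundary) -> [(ts, value), ...]

    def write(self, ts, value):
        self.chunks.setdefault(ts - ts % DAY, []).append((ts, value))

    def query(self, start, end):
        """Range scan over [start, end): only chunks overlapping the range are read."""
        out = []
        for day in sorted(self.chunks):
            if day + DAY <= start or day >= end:
                continue  # pruned without touching the chunk's data
            out.extend(p for p in self.chunks[day] if start <= p[0] < end)
        return out

    def expire(self, retention_seconds, now):
        """Retention: drop whole chunks past the cutoff, no row-by-row deletes."""
        cutoff = now - retention_seconds
        for day in [d for d in self.chunks if d + DAY <= cutoff]:
            del self.chunks[day]

store = ChunkedStore()
for t in (10, DAY + 10, 2 * DAY + 10):
    store.write(t, 1.0)
assert store.query(DAY, 2 * DAY) == [(DAY + 10, 1.0)]   # touches one chunk
store.expire(retention_seconds=DAY, now=3 * DAY)         # keeps only the newest day
assert sorted(store.chunks) == [2 * DAY]
```

Dropping an expired chunk is a single delete of a contiguous unit, which is why time-partitioned systems can enforce retention cheaply.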

Query Patterns

90%+ of queries target recent data (the last hour or day) and time ranges, not point lookups. This is the opposite of transactional workloads and calls for completely different indexes and storage layout.

Pattern Details

Consider storing application metrics in PostgreSQL:

Scenario: 1000 servers × 100 metrics per server × 1 sample/second = 100K writes/second

Problems with row-oriented databases:

```sql
-- Naive row-oriented schema (illustrative reconstruction; the original
-- snippet was not preserved in this copy)
CREATE TABLE metrics (
    time        TIMESTAMPTZ      NOT NULL,
    server_id   TEXT             NOT NULL,
    metric_name TEXT             NOT NULL,
    value       DOUBLE PRECISION NOT NULL
);
CREATE INDEX metrics_time_idx ON metrics (time);

-- 100K of these per second:
INSERT INTO metrics VALUES (now(), 'server-0042', 'cpu.usage', 45.2);
```

Why this fails:

  1. Write Amplification: Each insert creates a new B-tree node, causes index updates, and triggers WAL writes. 100K inserts/sec overwhelms the database.
  2. Poor Compression: Row-oriented storage intermixes different metrics, so temporal correlation cannot be exploited (CPU 45%, 46%, 47% stored as three separate 8-byte doubles).
  3. Inefficient Queries: A range query for 1 hour needs only 3600 rows per series, but the database scans the whole table or follows an index to scattered disk locations.
  4. No Retention: Data accumulates forever. After 1 year there are ~3.15 trillion rows, and the database slows to a crawl.
  5. Wasted Storage: Every row repeats the server_id and metric_name strings (~50 bytes). For an 8-byte value, the metadata is about 6x larger than the data!
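
The back-of-envelope numbers in the list above check out (assuming the stated 50 bytes of repeated per-row metadata):

```python
writes_per_sec = 1000 * 100 * 1              # servers x metrics x samples/sec
assert writes_per_sec == 100_000

rows_per_year = writes_per_sec * 86_400 * 365
assert rows_per_year == 3_153_600_000_000    # ~3.15 trillion rows

value_bytes, metadata_bytes = 8, 50          # 8-byte double vs. repeated strings
assert metadata_bytes / value_bytes > 6      # metadata ~6x larger than the data
```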

The Core Insight

Time-series data has unique properties that general-purpose databases don't exploit: it arrives in time order, is immutable, has a very high write rate, is queried almost exclusively with range scans, and loses value with age. Specialized storage that exploits these properties can achieve 10-100x better compression and 100x faster queries.

Trade-offs

Aspect | Advantage | Disadvantage
