System Design Pattern

Data Distribution | replication | redundancy | availability | leader-follower | multi-leader | intermediate

Replication Pattern

Data redundancy for availability and read scaling

Used in: Database Replicas, Redis Sentinel, Kafka | 25 min read

Summary

Database replication copies data from a primary (leader) to one or more replicas (followers) to provide high availability and read scaling. When the primary fails, a replica can be promoted to take over (failover). Replicas can also serve read queries, distributing load across multiple machines. Replication modes range from synchronous (strong consistency, higher latency) to asynchronous (eventual consistency, lower latency). This pattern is fundamental to production databases: most major databases support replication, and managed services like AWS RDS and Cloud SQL use it for high availability.

Key Takeaways

High Availability Through Redundancy

If the primary fails, a replica can be promoted to primary. This enables automatic failover with minimal downtime. Without replication, a primary failure means a complete outage until the server is recovered.
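
As a rough illustration of promotion, the sketch below declares the primary failed after a run of missed heartbeats and then picks the replica that has applied the most of the log. The names `FailoverMonitor` and `pick_promotion_candidate`, the threshold of 3, and the "most caught-up replica wins" rule are assumptions made for this example, not the behavior of any particular database.

```python
class FailoverMonitor:
    """Counts consecutive missed heartbeats from the primary (hypothetical helper)."""

    def __init__(self, max_missed_heartbeats=3):
        self.max_missed = max_missed_heartbeats
        self.missed = 0

    def record_heartbeat(self, received):
        """Return True once the primary should be considered failed."""
        # Any successful heartbeat resets the counter.
        self.missed = 0 if received else self.missed + 1
        return self.missed >= self.max_missed


def pick_promotion_candidate(replica_positions):
    """Pick the replica with the highest applied log position.

    replica_positions maps a replica name to the last log position it applied.
    Promoting the most caught-up replica is an assumed policy that minimizes
    lost writes under asynchronous replication.
    """
    if not replica_positions:
        raise ValueError("no replicas available for promotion")
    return max(replica_positions, key=replica_positions.get)
```

Real failover tooling typically adds leader election and fencing on top of this, so that a primary that is slow rather than dead cannot keep accepting writes.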

Read Scaling with Replicas

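Replicas can serve read queries, multiplying read capacity. If the primary handles 10K reads/sec and each of 4 replicas handles the same, the 5 nodes together serve about 50K reads/sec. Write capacity remains limited to the primary, because every write still goes through it.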
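
A minimal sketch of read/write splitting, assuming every node has the same read throughput and that the primary also serves reads. `ReplicatedRouter` and `total_read_capacity` are hypothetical names for this example; nodes are represented by plain strings.

```python
from itertools import cycle


class ReplicatedRouter:
    """Send writes to the primary and spread reads round-robin over all nodes."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # Assumption: the primary serves reads too, giving 1 + len(replicas) readers.
        self._read_pool = cycle([primary, *replicas])

    def node_for_write(self):
        # Writes always go to the single primary.
        return self.primary

    def node_for_read(self):
        # Reads rotate through the primary and every replica.
        return next(self._read_pool)


def total_read_capacity(per_node_reads_per_sec, replica_count, primary_serves_reads=True):
    """Aggregate read throughput, assuming identical nodes."""
    nodes = replica_count + (1 if primary_serves_reads else 0)
    return per_node_reads_per_sec * nodes
```

With `total_read_capacity(10_000, 4)` the result is 50,000 reads/sec. If the primary is reserved for writes only, the same cluster yields 40,000.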

Synchronous vs Asynchronous

Synchronous: The primary waits for replica acknowledgment before confirming a write. Strong consistency, but higher latency.
Asynchronous: The primary confirms immediately and the replica catches up later. Lower latency, but writes that have not yet reached a replica can be lost on failover.
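
The difference can be shown with a toy in-memory model. This is a sketch under simplifying assumptions (one replica, no network, no failures during shipping); `Primary`, `Replica`, `ship_pending`, and `writes_lost_on_failover` are hypothetical names for this example.

```python
class Replica:
    def __init__(self):
        self.log = []

    def apply(self, entry):
        self.log.append(entry)


class Primary:
    def __init__(self, replica, synchronous):
        self.log = []
        self.replica = replica
        self.synchronous = synchronous
        self.pending = []  # entries acknowledged but not yet shipped (async only)

    def write(self, entry):
        self.log.append(entry)
        if self.synchronous:
            # Synchronous: the replica applies the entry before the client sees an ack.
            self.replica.apply(entry)
        else:
            # Asynchronous: ack immediately and ship the entry later.
            self.pending.append(entry)
        return "ack"

    def ship_pending(self):
        """Background replication step: send queued entries to the replica."""
        while self.pending:
            self.replica.apply(self.pending.pop(0))


def writes_lost_on_failover(primary):
    """Acknowledged writes the replica is missing if the primary died right now."""
    return [e for e in primary.log if e not in primary.replica.log]
```

In synchronous mode `writes_lost_on_failover` is always empty after a write returns. In asynchronous mode, anything written since the last `ship_pending` call would be lost if the replica were promoted.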


Replication Lag

Async replicas may lag behind the primary by milliseconds to minutes. Read-your-writes consistency requires either reading from the primary after a write or tracking the replication position and reading only from replicas that have reached it.
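
One way to track replication position is sketched below, assuming each write returns a monotonically increasing log position and each replica reports the last position it has applied. `Node`, `applied_position`, and `choose_read_node` are hypothetical names for this example.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    applied_position: int  # last log position this node has applied


def choose_read_node(primary, replicas, min_position):
    """Return a replica that has applied at least min_position, else the primary.

    min_position is the log position of the caller's most recent write, so any
    node returned here is guaranteed to reflect that write.
    """
    for replica in replicas:
        if replica.applied_position >= min_position:
            return replica
    # No replica has caught up yet: fall back to the primary.
    return primary
```

A session would remember the position returned by its last write and pass it as `min_position` on later reads. Sessions that have not written anything can pass 0 and read from any replica.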

Split-Brain Prevention

During a network partition, both the primary and a replica might believe they are the primary. If both accept writes, the data diverges. Use quorum-based election or fencing to prevent split-brain.
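
Both defenses can be sketched in a few lines. The quorum check assumes a fixed, known cluster size. The fencing example assumes each newly elected primary receives a strictly larger token than its predecessor. `has_quorum` and `FencedStorage` are hypothetical names for this example.

```python
def has_quorum(reachable_nodes, cluster_size):
    """A node may act as primary only if it can reach a strict majority."""
    return reachable_nodes > cluster_size // 2


class FencedStorage:
    """Storage that rejects writes carrying a stale fencing token."""

    def __init__(self):
        self.highest_token = 0
        self.data = {}

    def write(self, token, key, value):
        # A token lower than one already seen belongs to a deposed primary.
        if token < self.highest_token:
            return False
        self.highest_token = token
        self.data[key] = value
        return True
```

In a 3-node cluster split 2/1, only the side with 2 nodes has quorum, so at most one side can elect a primary. If the old primary (token 1) keeps writing after a new primary (token 2) has written, the storage layer rejects the old primary's writes.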

Geographic Distribution

Replicas can be placed in different regions for disaster recovery and for lower read latency for global users. Cross-region replication has higher lag because of network latency.

Pattern Details

Without replication:

Database server fails → complete outage
Recovery requires restore from backup (hours)
Single server limits read throughput
No disaster recovery for datacenter failure

Business Impact:

E-commerce: $100K+ revenue per hour of downtime
SaaS: SLA breaches, customer churn
Financial: Regulatory violations, trading losses

[Figure: Single Database vs Replicated]

Trade-offs

Aspect	Advantage	Disadvantage

Patterns

Horizontal Scaling Pattern

Retry with Backoff Pattern