
Patterns (35 items)

Horizontal Scaling Pattern · 15 min · Beginner
Retry with Backoff Pattern · 15 min · Beginner
Queue-based Load Leveling Pattern · 20 min · Intermediate
Replication Pattern · 25 min · Intermediate
Caching Strategies Pattern · 25 min · Intermediate
Fan-out Pattern · 20 min · Intermediate
Fan-in Pattern · 20 min · Intermediate
Persistent Connections Pattern · 20 min · Intermediate
Load Balancing Pattern · 20 min · Intermediate
Circuit Breaker Pattern · 20 min · Intermediate
Bloom Filters Pattern · 20 min · Intermediate
Time-Series Storage Pattern · 20 min · Intermediate
Bulkhead Pattern · 20 min · Intermediate
Batch Processing Pattern · 20 min · Intermediate
Write-Ahead Log Pattern · 20 min · Intermediate
API Gateway Pattern · 20 min · Intermediate
Backend for Frontend Pattern · 20 min · Intermediate
Sidecar Pattern · 20 min · Intermediate
Idempotency Pattern · 20 min · Intermediate
Rate Limiting Pattern · 20 min · Intermediate
Backpressure Pattern · 20 min · Intermediate
Pub/Sub Pattern · 25 min · Intermediate
Eventual Consistency Pattern · 25 min · Intermediate
Sharding Pattern · 25 min · Advanced
Conflict Resolution Pattern · 25 min · Advanced
Strong Consistency Pattern · 30 min · Advanced
Leader Election Pattern · 25 min · Advanced
Consensus Protocols Pattern · 30 min · Advanced
Stream Processing Pattern · 25 min · Advanced
Change Data Capture Pattern · 25 min · Advanced
Distributed Locking Pattern · 25 min · Advanced
Two-Phase Commit Pattern · 25 min · Advanced
LSM Trees Pattern · 25 min · Advanced
Event Sourcing Pattern · 30 min · Advanced
CQRS Pattern · 28 min · Advanced

System Design Pattern · Data Processing · Advanced
Tags: cdc, change-data-capture, debezium, log-based, event-streaming

Change Data Capture Pattern

Capturing database changes as events

Used in: Debezium, Database Replication, Event Streaming · 25 min read

Summary

Change Data Capture (CDC) tracks and captures changes made to database records, publishing them as events for downstream consumption. Instead of polling databases or dual-writing, CDC reads the database's transaction log to capture every INSERT, UPDATE, and DELETE. This enables real-time data synchronization, event-driven architectures, cache invalidation, and building materialized views. Tools like Debezium, AWS DMS, and database-native features make CDC accessible. CDC is essential for keeping data consistent across systems without tight coupling.
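
To make the event shape concrete, here is a simplified sketch of the change event a log-based tool such as Debezium emits for a row update, plus a hypothetical consumer that routes events by operation type. The field names follow Debezium's conventions, but the exact envelope depends on the connector and its configuration.

python
# Simplified sketch of a Debezium-style change event for an UPDATE on "users".
# Real events carry more metadata (source position, transaction id, schema).
change_event = {
    "op": "u",                          # "c" = insert, "u" = update, "d" = delete
    "ts_ms": 1718000000000,             # when the change was captured
    "before": {"id": 42, "email": "old@example.com", "plan": "free"},
    "after":  {"id": 42, "email": "old@example.com", "plan": "pro"},
    "source": {"table": "users"},
}

def dispatch(event: dict) -> None:
    """Hypothetical consumer-side helper: route each change by operation type."""
    op = event["op"]
    if op == "c":
        handle_insert(event["after"])
    elif op == "u":
        handle_update(event["before"], event["after"])
    elif op == "d":
        handle_delete(event["before"])   # deletes arrive with the old row image

def handle_insert(row): print("inserted", row)
def handle_update(old, new): print("updated", new["id"])
def handle_delete(old): print("deleted", old["id"])

dispatch(change_event)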

Key Takeaways

Log-Based CDC is Authoritative

Reading transaction logs (WAL, binlog) captures all changes exactly as they happened. No missed updates. No race conditions. Single source of truth.
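
PostgreSQL, for example, exposes its WAL through logical decoding. The sketch below assumes psycopg2, the built-in test_decoding output plugin, wal_level=logical, and a made-up slot name and connection string; it creates a replication slot and reads the changes it has accumulated, which is the raw mechanism tools like Debezium build on.

python
import psycopg2

# Illustrative connection string; the reading role needs REPLICATION privileges.
conn = psycopg2.connect("dbname=app user=cdc_reader")
conn.autocommit = True
cur = conn.cursor()

# Create a logical replication slot once, using the built-in test_decoding plugin.
cur.execute(
    "SELECT * FROM pg_create_logical_replication_slot(%s, %s)",
    ("cdc_demo_slot", "test_decoding"),
)

# Later: read (and consume) the changes captured in the WAL since the last read.
cur.execute(
    "SELECT lsn, xid, data FROM pg_logical_slot_get_changes(%s, NULL, NULL)",
    ("cdc_demo_slot",),
)
for lsn, xid, data in cur.fetchall():
    # 'data' is a text rendering such as:
    #   table public.users: UPDATE: id[integer]:42 plan[text]:'pro'
    print(lsn, xid, data)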

Eliminates Dual-Write Problem

Writing to database AND cache/queue creates inconsistency on failures. CDC writes to database only; changes flow automatically to other systems.

Enables Event-Driven Architecture

Database changes become events. Other services react to changes without polling. Loose coupling between services.
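
As a sketch of reacting without polling, assume Debezium publishes changes to the users table on a Kafka topic (the topic name and broker address below are illustrative, and the event shape follows the simplified envelope used earlier), and a downstream service consumes it with kafka-python:

python
import json
from kafka import KafkaConsumer

def send_welcome_email(address):
    print("sending welcome email to", address)   # stand-in for the real side effect

# Illustrative topic (Debezium's server.schema.table convention) and broker address.
consumer = KafkaConsumer(
    "app-db.public.users",
    bootstrap_servers="localhost:9092",
    group_id="notification-service",
    value_deserializer=lambda raw: json.loads(raw) if raw else None,
)

for message in consumer:
    event = message.value
    if event is None:        # tombstone records used for log compaction
        continue
    if event["op"] == "c":   # a new user row appeared in the database
        send_welcome_email(event["after"]["email"])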

Real-time Data Synchronization

Keep data warehouse, search index, cache in sync with source database. Changes propagate in seconds, not hours.
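
On the sink side, each change event can be applied as an upsert or delete against the downstream store. A minimal sketch, assuming the elasticsearch 8.x Python client and an illustrative users index:

python
from elasticsearch import Elasticsearch, NotFoundError

es = Elasticsearch("http://localhost:9200")   # assumed local cluster

def apply_to_search_index(event: dict) -> None:
    """Mirror one CDC event into a 'users' search index (illustrative)."""
    if event["op"] in ("c", "u"):
        row = event["after"]
        # Using the primary key as the document id makes the write idempotent:
        # replaying the same event simply overwrites the same document.
        es.index(index="users", id=row["id"], document=row)
    elif event["op"] == "d":
        try:
            es.delete(index="users", id=event["before"]["id"])
        except NotFoundError:
            pass   # already gone; deletes may be replayed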

Preserves Change History

CDC captures before and after values, timestamps, and operation type. Enables audit trails, temporal queries, and debugging.
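
Because every event carries both row images, an audit trail can be derived entirely from the change stream. A minimal sketch, where audit_log stands in for whatever audit store you use:

python
audit_log = []   # stands in for an audit table or append-only store

def record_audit_entry(event: dict) -> None:
    """Derive an audit entry from one CDC event's before/after images."""
    before = event.get("before") or {}
    after = event.get("after") or {}
    changed = {
        field: {"from": before.get(field), "to": after.get(field)}
        for field in set(before) | set(after)
        if before.get(field) != after.get(field)
    }
    audit_log.append({
        "table": event["source"]["table"],
        "op": event["op"],
        "at_ms": event["ts_ms"],
        "changed_fields": changed,
    })

record_audit_entry({
    "op": "u",
    "ts_ms": 1718000000000,
    "source": {"table": "users"},
    "before": {"id": 42, "plan": "free"},
    "after":  {"id": 42, "plan": "pro"},
})
print(audit_log[0]["changed_fields"])   # {'plan': {'from': 'free', 'to': 'pro'}}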

Handles Deletes Properly

Polling misses deletes (deleted rows disappear). CDC captures DELETE events explicitly. Complete change stream.
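
The contrast is easy to see side by side: a timestamp-based polling query can only return rows that still exist, while a log-based CDC event reports the delete along with the last row image (shapes below follow the simplified envelope used earlier):

python
# Polling approach: ask the database "what changed since my last poll?"
# A row that was DELETEd no longer matches anything, so the delete is silently missed.
POLL_QUERY = """
    SELECT id, email, plan, updated_at
    FROM users
    WHERE updated_at > %(last_poll)s
"""

# Log-based CDC approach: the delete arrives as an explicit event,
# including the final state of the row before it disappeared.
delete_event = {
    "op": "d",
    "ts_ms": 1718000300000,
    "before": {"id": 42, "email": "old@example.com", "plan": "pro"},
    "after": None,
    "source": {"table": "users"},
}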

Pattern Details

Change Data Capture Flow

Why not dual-write?

python
# DANGEROUS: Dual-write pattern
def update_user(user_id, data):
    database.update(user_id, data)         # Step 1
    cache.invalidate(user_id)              # Step 2 - fails?
    kafka.publish("user_changed", data)    # Step 3 - fails?
    # If step 2 or 3 fails, the systems are left inconsistent!

CDC solves this: the application writes to the database only; CDC captures the change from the transaction log and publishes it to every consumer.
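
Under CDC the application code shrinks to the single database write, and the side effects move into a consumer of the change stream. A sketch of that split, with trivial stand-ins for the illustrative database, cache, and Kafka clients used above:

python
# Stand-ins for the illustrative clients from the dual-write example above.
class FakeClient:
    def update(self, user_id, data):   print("db update", user_id, data)
    def invalidate(self, user_id):     print("cache invalidate", user_id)
    def publish(self, topic, payload): print("publish", topic, payload)

database = cache = kafka = FakeClient()

def update_user(user_id, data):
    # SAFE: the application performs exactly one write.
    database.update(user_id, data)

def on_user_change(event):
    # Runs in the CDC pipeline once the change is durably in the transaction log.
    user_id = (event["after"] or event["before"])["id"]
    cache.invalidate(user_id)                        # the cache can never be "ahead" of the DB
    kafka.publish("user_changed", event["after"])    # downstream services see every change
    # If this consumer crashes it resumes from its last log position and retries,
    # so the cache and the topic eventually reflect every committed change.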

Trade-offs

Aspect | Advantage | Disadvantage
