Tags: distributed-systems, messaging, kafka, streaming, pub-sub, linkedin, data-pipeline, intermediate

Kafka: A Distributed Messaging System for Log Processing

The pub-sub system that became the backbone of modern data pipelines, processing trillions of messages daily

Jay Kreps, Neha Narkhede, Jun Rao | LinkedIn | 2011 | 30 min read
View Original Paper

Summary

Kafka is a distributed publish-subscribe messaging system designed for high-throughput log processing. Unlike traditional message brokers that track per-message state, Kafka uses an append-only log abstraction where consumers track their own position. This simple design, combined with sequential disk I/O, OS page cache utilization, and zero-copy transfers, enables Kafka to achieve throughput orders of magnitude higher than traditional systems while providing durability guarantees through replication.

Key Takeaways

Log as the Core Abstraction

Kafka models each topic partition as an append-only, ordered, immutable sequence of messages. This log abstraction simplifies the broker (just append and serve), enables efficient sequential I/O, and allows consumers to rewind and replay messages—impossible in traditional queues.
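To make the abstraction concrete, here is a minimal in-memory sketch in Java (illustrative only; Kafka's real log is a set of on-disk segment files, and the class and method names here are invented for the example). The broker's job reduces to appending records and serving reads from whatever offset the caller supplies:

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of the partition-as-log idea: append-only, ordered, immutable.
public class PartitionLog {
    private final List<byte[]> records = new ArrayList<>();

    // Append returns the offset assigned to the record.
    public synchronized long append(byte[] record) {
        records.add(record);
        return records.size() - 1;
    }

    // Consumers read forward from any offset they choose, so "replay" is just
    // re-reading from an earlier offset.
    public synchronized List<byte[]> read(long fromOffset, int maxRecords) {
        int start = (int) Math.min(fromOffset, records.size());
        int end = Math.min(start + maxRecords, records.size());
        return new ArrayList<>(records.subList(start, end));
    }
}
```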

Consumer-Managed Offsets

Unlike traditional brokers that track delivery state per message, Kafka consumers track their own position (offset) in the log. This eliminates per-message bookkeeping on the broker, enabling massive throughput and allowing consumers to re-read messages at will.
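Here is what that looks like with the modern Kafka Java client (which postdates the 2011 paper but keeps the same model); the broker address, topic name, and group id are placeholders. Auto-commit is disabled so the consumer advances its own stored position only after it has processed a batch:

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

// Sketch: the consumer, not the broker, decides when its position advances.
public class OffsetManagedConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // assumed local broker
        props.put("group.id", "log-processors");            // hypothetical group
        props.put("key.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("enable.auto.commit", "false");           // we commit offsets ourselves

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("page-views"));       // hypothetical topic
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    process(record);                         // application logic
                }
                // Only after processing do we advance our position; the broker
                // keeps no per-message delivery state.
                consumer.commitSync();
            }
        }
    }

    static void process(ConsumerRecord<String, String> record) {
        System.out.printf("offset=%d value=%s%n", record.offset(), record.value());
    }
}
```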

Sequential I/O Over Random Access

Kafka deliberately uses sequential disk writes and reads. Linear disk I/O is fast (on the order of 600 MB/s on the commodity drives of the era) and can in some cases outperform random memory access. Combined with the OS page cache, Kafka achieves near in-memory speeds with disk durability.
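A minimal sketch of the write-path idea, assuming a hypothetical segment file name: every record is appended to the end of the current segment, so the drive (and the page cache in front of it) sees a purely sequential stream rather than scattered seeks.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Sketch: append-only writes to one segment file produce a sequential access pattern.
public class SequentialSegmentWriter {
    public static void main(String[] args) throws IOException {
        Path segment = Path.of("partition-0.log");           // hypothetical segment file
        try (FileChannel channel = FileChannel.open(segment,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE, StandardOpenOption.APPEND)) {
            for (int i = 0; i < 10_000; i++) {
                ByteBuffer record = ByteBuffer.wrap(("record-" + i + "\n").getBytes());
                channel.write(record);                        // always append, never seek
            }
            channel.force(false);                             // flush once, not per record
        }
    }
}
```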

Zero-Copy Data Transfer

Kafka uses sendfile() to transfer data directly from page cache to network socket, bypassing user-space entirely. This eliminates two memory copies and two context switches per message batch, dramatically improving throughput.
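In the JVM this surfaces as FileChannel.transferTo(), which delegates to sendfile() on Linux. The sketch below streams a log segment to a socket without the bytes ever entering user space; the file name and destination address are placeholders:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.FileChannel;
import java.nio.channels.SocketChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Sketch: zero-copy transfer from page cache to socket via transferTo()/sendfile().
public class ZeroCopySend {
    public static void main(String[] args) throws IOException {
        Path segment = Path.of("partition-0.log");                    // hypothetical segment file
        try (FileChannel file = FileChannel.open(segment, StandardOpenOption.READ);
             SocketChannel socket = SocketChannel.open(
                     new InetSocketAddress("consumer-host", 9999))) { // hypothetical destination
            long position = 0;
            long remaining = file.size();
            while (remaining > 0) {
                // The kernel copies directly from the page cache to the socket buffer;
                // no user-space copy, no extra context switches.
                long sent = file.transferTo(position, remaining, socket);
                position += sent;
                remaining -= sent;
            }
        }
    }
}
```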

Partitioning for Parallelism

Topics are split into partitions distributed across brokers. Each partition is an independent log with its own leader. This enables linear horizontal scaling—add partitions and brokers to increase throughput.
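A simplified sketch of key-based partition selection (Kafka's actual default partitioner uses murmur2 hashing and handles keyless records differently, so this is illustrative only):

```java
// Sketch: same key always maps to the same partition, preserving per-key ordering
// while spreading load across partitions and brokers.
public class SimplePartitioner {
    public static int partitionFor(String key, int numPartitions) {
        // Mask off the sign bit so the result is a valid partition index.
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        int partitions = 6;
        for (String userId : new String[] {"alice", "bob", "carol"}) {
            System.out.printf("%s -> partition %d%n", userId, partitionFor(userId, partitions));
        }
    }
}
```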

Consumer Groups for Scalability

Consumers in a group divide partitions among themselves—each partition is consumed by exactly one consumer in the group. This provides both pub-sub (multiple groups) and queue (single group) semantics in one system.
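A toy sketch of dividing a topic's partitions among the members of one group (real Kafka assignors also handle rebalancing, stickiness, and multiple topics); the key property is that each partition has exactly one owner within the group:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: round-robin assignment of partitions to the consumers in one group.
public class GroupAssignment {
    public static Map<String, List<Integer>> assign(List<String> consumers, int numPartitions) {
        Map<String, List<Integer>> assignment = new HashMap<>();
        for (String c : consumers) {
            assignment.put(c, new ArrayList<>());
        }
        for (int p = 0; p < numPartitions; p++) {
            // Each partition goes to exactly one consumer in the group.
            String owner = consumers.get(p % consumers.size());
            assignment.get(owner).add(p);
        }
        return assignment;
    }

    public static void main(String[] args) {
        // Within one group, partitions are split for queue-like load sharing;
        // a second group reading the same topic gets its own full copy (pub-sub).
        System.out.println(assign(List.of("consumer-a", "consumer-b", "consumer-c"), 8));
    }
}
```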

Deep Dive

By 2010, LinkedIn faced a data infrastructure crisis. The company generated massive amounts of log data:

  • Activity events: Page views, searches, ad impressions (billions/day)
  • Operational metrics: Server CPU, memory, network stats
  • Application logs: Errors, warnings, debugging information
  • Data pipeline feeds: Database changes, derived data streams

This data needed to flow to multiple destinations:

  • Hadoop for offline analytics and machine learning
  • Real-time systems for monitoring and alerting
  • Search indexes for log exploration
  • Data warehouses for business intelligence

The existing solutions failed:

LinkedIn's Data Integration Problem

Traditional message queues (ActiveMQ, RabbitMQ) couldn't handle the throughput. They were designed for transactional messaging with per-message acknowledgment tracking—great for order processing, terrible for log streams.

Custom log aggregators (Scribe, Flume) provided throughput but no durability, replay, or real-time consumption. Data went one-way into Hadoop.

LinkedIn needed a system that combined:

  • The high throughput of log aggregators
  • The durability and replay of databases
  • The pub-sub flexibility of message queues
  • The horizontal scalability of distributed systems

Trade-offs

  • Log-Based vs Queue-Based
    Advantage: Append-only log enables replay, multiple consumers, and efficient sequential I/O.
    Disadvantage: No per-message acknowledgment; consumers must manage their own offsets.

  • Consumer-Side Offset Tracking
    Advantage: Eliminates broker-side bookkeeping; enables consumer replay and rewind.
    Disadvantage: Consumers are responsible for offset management; exactly-once requires careful implementation.

  • Partitioning for Parallelism
    Advantage: Linear horizontal scaling; independent I/O per partition.
    Disadvantage: No global ordering across partitions; partition count is hard to change.

  • High Throughput Design
    Advantage: Orders of magnitude faster than traditional brokers; handles millions of messages per second.
    Disadvantage: Not optimized for low-latency single-message delivery; batching adds latency.

  • Retention-Based Deletion
    Advantage: Messages persist for a configured time; multiple consumers read the same data; natural audit log.
    Disadvantage: Storage costs can grow large; retention must be configured carefully.
