Design Walkthrough
Problem Statement
The Question: Design a system that shows real-time emoji reactions floating across a live video stream, like Facebook Live or Instagram Live.
The system must handle:
- Reaction ingestion: Users tap to send reactions (like, love, wow, haha, sad, angry)
- Real-time display: Reactions float up the screen as animations
- Aggregated counts: Show total counts per reaction type
- Massive scale: Viral streams with millions of concurrent viewers
What to say first
Before I design, I want to clarify the scale and latency requirements. The interesting part of this problem is handling viral moments where reaction rate spikes 100x normal.
What interviewers are testing:
- Can you handle extreme write throughput?
- Do you understand fan-out challenges at scale?
- Can you make pragmatic tradeoffs (sampling, batching)?
- Do you think about client-side constraints?
Clarifying Questions
These questions reveal senior thinking and shape the architecture.
Question 1: Scale
What is the peak concurrent viewers for a single stream? What is peak reactions per second?
Why this matters: Determines if we need sampling/batching
Typical answer: 10M concurrent viewers, 1M reactions/sec peak
Architecture impact: Cannot send every reaction - must aggregate
Question 2: Accuracy vs Latency
Do we need to show every single reaction, or is a representative sample acceptable?
Why this matters: Perfect accuracy is impossible at scale
Typical answer: Sample is fine for animations, counts should be accurate
Architecture impact: Can sample for display, batch for counts
Question 3: Latency Requirements
What is acceptable end-to-end latency from reaction to display on other viewers' screens?
Why this matters: Tighter latency = more infrastructure cost
Typical answer: Under 2 seconds feels real-time
Architecture impact: Can batch reactions in 100-500ms windows
Question 4: Durability
Do reactions need to persist after the stream ends? Can users see reactions when watching replay?
Why this matters: Determines storage requirements
Typical answer: Yes, replay should show reactions at correct timestamps
Architecture impact: Need persistent storage, not just in-memory
Stating assumptions
I will assume: 10M peak viewers, 1M reactions/sec, sampling OK for animations, 2s latency budget, reactions persisted for replay. We optimize for the viral case.
The Hard Part
Say this out loud
The hard part here is the fan-out math. If 1M users send reactions per second and we have 10M viewers, sending each reaction to each viewer is 10 trillion messages per second. That is impossible.
The fan-out problem visualized:
Reactions In: 1,000,000/sec
Viewers: 10,000,000
Naive fan-out: 1M x 10M = 10 trillion messages/sec
This is not just expensive - it is physically impossible.

Why this is genuinely hard:
1. Write amplification: Each reaction write triggers millions of reads
2. Hot partition: One viral stream = all traffic to one partition
3. Client limits: Mobile devices cannot render 1000 animations/sec
4. Network limits: Cannot push 1M updates/sec to each client
Common mistake
Candidates often start with per-reaction fan-out without doing the math. Always run the back-of-envelope numbers first to check whether an approach is feasible.
The solution insight:
We do NOT fan-out every reaction. Instead:
1. Aggregate reactions into time windows (100ms-500ms)
2. Sample reactions for animation display (show 10-50 per window)
3. Batch count updates (send delta every 500ms)
4. Hierarchical fan-out (servers to regional edge to clients)
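To make the reduction concrete, here is a rough back-of-envelope calculation in plain Python. The reaction rate and viewer count come from the stated assumptions; the 500ms window and the 1,000 edge servers are illustrative values for the sketch.

# Back-of-envelope: why aggregation + hierarchy makes the fan-out feasible.
REACTIONS_PER_SEC = 1_000_000
VIEWERS = 10_000_000
WINDOW_SEC = 0.5            # 500ms aggregation window
EDGE_SERVERS = 1_000        # each edge holds ~10K client connections

naive_msgs_per_sec = REACTIONS_PER_SEC * VIEWERS            # 10 trillion
updates_per_viewer_per_sec = 1 / WINDOW_SEC                 # one batched update per window
edge_msgs_per_sec = VIEWERS * updates_per_viewer_per_sec    # 20M total, spread over edges
per_edge_msgs_per_sec = edge_msgs_per_sec / EDGE_SERVERS    # ~20K sends/sec per edge server

print(f"naive fan-out:   {naive_msgs_per_sec:,.0f} msgs/sec")
print(f"batched fan-out: {edge_msgs_per_sec:,.0f} msgs/sec total")
print(f"per edge server: {per_edge_msgs_per_sec:,.0f} sends/sec")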
Scale and Access Patterns
Let me estimate the scale and understand access patterns.
| Dimension | Value | Impact |
|---|---|---|
| Peak concurrent streams | 100,000 | Can partition by stream ID |
| Peak viewers per stream | 10,000,000 | Need hierarchical fan-out |
What to say
The key insight is that clients can only render about 30 animations per second. Even if we receive 1M reactions/sec, we only need to show a sample. This changes the architecture completely.
Write path:
- 1M reactions/sec at peak
- 50 bytes each = 50 MB/sec

Access Patterns:
- Writes: Bursty, correlated with stream events (goal scored, joke landed)
- Reads: Every viewer is a reader, but they receive batched samples
- Hot partitions: Viral streams concentrate all traffic
- Temporal locality: Only current window matters for display
High-Level Architecture
Let me start with the high-level design and then dive into each component.
What to say
The architecture has three main paths: ingestion for high-throughput writes, aggregation for batching and sampling, and distribution for hierarchical fan-out to clients.
Live Reactions Architecture
Component Responsibilities:
1. Ingestion Layer (a minimal handler is sketched after this list)
   - Receives reaction HTTP/WebSocket requests
   - Validates and rate-limits per user
   - Publishes to message queue
   - Stateless, horizontally scalable

2. Message Queue (Kafka/Kinesis)
   - Buffers reactions during spikes
   - Partitioned by stream_id
   - Provides durability and replay

3. Aggregation Layer
   - Consumes reactions from queue
   - Windows reactions into time buckets (100-500ms)
   - Samples reactions for animation display
   - Updates counts in Redis
   - Publishes batched updates to Pub/Sub

4. Distribution Layer (Pub/Sub + Edge)
   - Pub/Sub fans out to edge servers
   - Edge servers maintain WebSocket connections to viewers
   - Hierarchical: Central -> Regional Edge -> Clients
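A minimal sketch of the ingestion handler under these assumptions. ReactionIngester, the producer interface (send(topic, key, value)), and the per-minute limit are illustrative stand-ins, not a specific framework or Kafka client API; rate-limit state is kept in-process here for brevity, whereas a real deployment would keep it in a shared store so the layer stays stateless.

import json
import time
from collections import defaultdict, deque

VALID_REACTIONS = {"like", "love", "wow", "haha", "sad", "angry"}
MAX_REACTIONS_PER_MINUTE = 60   # assumed per-user limit

class ReactionIngester:
    def __init__(self, producer):
        # `producer` is any object with send(topic, key, value) - a stand-in
        # for a Kafka/Kinesis producer, not a specific client library.
        self.producer = producer
        self.recent = defaultdict(deque)   # user_id -> timestamps of recent reactions

    def _rate_limited(self, user_id: str, now: float) -> bool:
        window = self.recent[user_id]
        while window and now - window[0] > 60:
            window.popleft()                        # drop timestamps older than a minute
        if len(window) >= MAX_REACTIONS_PER_MINUTE:
            return True
        window.append(now)
        return False

    def ingest(self, stream_id: str, user_id: str, reaction_type: str) -> bool:
        now = time.time()
        if reaction_type not in VALID_REACTIONS:
            return False                            # reject unknown reaction types
        if self._rate_limited(user_id, now):
            return False                            # drop: user over the per-minute limit
        event = {
            "stream_id": stream_id,
            "user_id": user_id,
            "reaction_type": reaction_type,
            "timestamp": int(now * 1000),
        }
        # Key by stream_id so all reactions for a stream land on the same partition.
        self.producer.send("reactions", key=stream_id, value=json.dumps(event))
        return True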
Data Model and Storage
We need three types of storage for different access patterns.
What to say
We use Redis for real-time counts, Kafka for buffering and replay of the raw reaction stream, and a relational database for the persistent storage that replay playback requires.
# Per-stream reaction counts (hash)
Key: reactions:{stream_id}
Value: {like: <count>, love: <count>, wow: <count>, haha: <count>, sad: <count>, angry: <count>}

-- Aggregate counts per stream (updated periodically)
CREATE TABLE stream_reaction_counts (
    stream_id BIGINT PRIMARY KEY,
    -- ... remaining columns (per-reaction-type counts, updated_at)
);

Kafka Topic Structure:
{
"stream_id": "stream_123",
"user_id": "user_456",
"reaction_type": "like",
"timestamp": 1703001600123,
"client_info": {
"device": "ios",
"version": "2.1.0"
}
}

Key Design Decisions:
1. Kafka partitioned by stream_id: All reactions for a stream go to the same partition, enabling stateful aggregation
2. Redis for hot data: Sub-millisecond reads for current counts and recent windows (see the flush sketch after this list)
3. PostgreSQL for replay: Only queried when a user watches the replay, not real-time critical
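A sketch of how the aggregator could flush one window's deltas into the reactions:{stream_id} hash, assuming the redis-py client; the function name and the example values in the usage comment are illustrative.

import redis

def flush_counts_to_redis(r: redis.Redis, stream_id: str, window_counts: dict) -> None:
    """Apply one window's per-type deltas to the reactions:{stream_id} hash.

    `window_counts` is e.g. {"like": 1823, "love": 940} for the last 500ms window.
    """
    key = f"reactions:{stream_id}"
    pipe = r.pipeline()
    for reaction_type, delta in window_counts.items():
        pipe.hincrby(key, reaction_type, delta)   # atomic increment per field
    pipe.execute()                                # one round trip for the whole batch

# Usage (assumed local Redis for illustration):
# r = redis.Redis(host="localhost", port=6379)
# flush_counts_to_redis(r, "stream_123", {"like": 1823, "love": 940})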
Algorithm Deep Dive: Aggregation
The aggregator is the heart of the system. It transforms high-volume individual reactions into batched, sampled updates.
Aggregation Pipeline
import asyncio
from collections import defaultdict
import random

Reservoir Sampling Explained:
We cannot store all 100K reactions in a window. Reservoir sampling gives us a uniform random sample of size K from a stream of unknown length.
Algorithm:
1. First K items go directly into the sample
2. For item i (where i > K), include it with probability K/i
3. If included, replace a random existing item

Result: Every reaction has equal probability of being in the final sample
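A compact sketch of this algorithm (classic Algorithm R); the class name is illustrative.

import random

class ReservoirSampler:
    """Uniform sample of size k from a stream of unknown length (Algorithm R)."""

    def __init__(self, k: int):
        self.k = k
        self.sample = []
        self.seen = 0

    def add(self, item) -> None:
        self.seen += 1
        if len(self.sample) < self.k:
            self.sample.append(item)            # first k items always kept
        else:
            j = random.randrange(self.seen)     # 0 <= j < seen
            if j < self.k:                      # keep with probability k/seen
                self.sample[j] = item           # replace a random existing item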
Why reservoir sampling?
We do not know how many reactions will arrive in a window. Reservoir sampling gives us a fair sample without storing everything. Each reaction has equal probability of being displayed.
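Tying the aggregation pipeline together, here is a minimal sketch of the per-stream loop: window the incoming reactions, keep counts in a defaultdict, sample for animation with the ReservoirSampler above, and publish one batched update per window. The queue and publish interfaces are stand-ins for the Kafka consumer and Pub/Sub producer, not specific client APIs, and the window and sample sizes are the assumed values from the requirements.

import asyncio
from collections import defaultdict

WINDOW_MS = 500
SAMPLE_SIZE = 20

async def aggregate_stream(stream_id: str, reactions: asyncio.Queue, publish) -> None:
    """Consume raw reactions for one stream; emit one batched update per window."""
    while True:
        counts = defaultdict(int)
        sampler = ReservoirSampler(SAMPLE_SIZE)
        loop = asyncio.get_running_loop()
        deadline = loop.time() + WINDOW_MS / 1000

        # Drain the queue until the window closes.
        while True:
            timeout = deadline - loop.time()
            if timeout <= 0:
                break
            try:
                reaction = await asyncio.wait_for(reactions.get(), timeout)
            except asyncio.TimeoutError:
                break
            counts[reaction["reaction_type"]] += 1
            sampler.add(reaction)

        if counts:
            await publish({
                "stream_id": stream_id,
                "window_ms": WINDOW_MS,
                "count_deltas": dict(counts),        # batched deltas for Redis and clients
                "animation_sample": sampler.sample,  # 10-50 reactions to animate
            })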
Distribution: Fan-out Strategy
Distributing updates to millions of viewers requires hierarchical fan-out.
What to say
We use a hierarchical fan-out: aggregator publishes once to Pub/Sub, edge servers in each region subscribe, and edge servers push to their connected clients. This reduces fan-out from 1-to-10M to 1-to-1000-to-10K.
Hierarchical Fan-out
Fan-out math with hierarchy:
Without hierarchy:
1 aggregator -> 10M clients = 10M connections
With hierarchy (3 levels):
1 aggregator -> 10 regional Pub/Sub
10 regional -> 1000 edge servers
1000 edge servers -> 10K clients each = 10M clients
Total connections from aggregator: 10 (manageable)

class EdgeServer:
    def __init__(self, region: str):
        self.region = region

Important optimization
Edge servers should batch WebSocket writes. Instead of 20 sends for 20 animations, serialize once and send the batch. This reduces CPU dramatically.
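A sketch of what that batching could look like; on the EdgeServer above this would be a method, but it is shown as a standalone function here. The connection objects are assumed to expose an async send(text) method (a stand-in for a WebSocket library object), and broadcast_batch is an illustrative name.

import asyncio
import json

async def broadcast_batch(connections: dict, update: dict) -> None:
    """Push one batched update to every viewer connected to this edge server.

    `connections` maps client_id -> an object with an async send(text) method.
    """
    # Serialize once and reuse the same payload for every client,
    # instead of re-encoding per animation or per connection.
    payload = json.dumps(update)
    sends = [conn.send(payload) for conn in connections.values()]
    # Fire all sends concurrently; drop clients whose send failed.
    results = await asyncio.gather(*sends, return_exceptions=True)
    for client_id, result in zip(list(connections), results):
        if isinstance(result, Exception):
            connections.pop(client_id, None)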
Consistency and Invariants
System Invariants
1. Reaction counts shown to viewers must be monotonically non-decreasing - never show a count going down.
2. Displayed reactions must be real (from actual users), not fabricated.
3. Each user reaction is counted at most once.
Consistency Model:
This system uses eventual consistency with specific guarantees:
| Aspect | Consistency Level | Why |
|---|---|---|
| Total counts | Eventual (1-2 sec lag) | Users tolerate slight delay in count updates |
| Count direction | Monotonic | Counts must never decrease - would confuse users |
| Animation display | Best effort | Missing some reactions is fine, showing fake is not |
| Cross-device | Eventual | Same user on phone and laptop may see slightly different counts |
Ensuring monotonic counts:
class ReactionCountDisplay:
    def __init__(self):
        self.displayed_counts = {}  # {reaction_type: count}

Why counts might arrive out of order
Network delays, edge server processing time, and batching can cause updates to arrive out of order. Client-side enforcement of monotonicity is simpler than distributed ordering.
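The core of that client-side enforcement is a merge that never lets a displayed count move backwards. A minimal standalone sketch - merge_counts is a hypothetical helper that ReactionCountDisplay above would apply to each incoming update:

def merge_counts(displayed: dict, incoming: dict) -> dict:
    """Merge a possibly stale server update into the displayed counts."""
    for reaction_type, count in incoming.items():
        # Out-of-order updates can carry older totals; only ever move forward.
        displayed[reaction_type] = max(displayed.get(reaction_type, 0), count)
    return displayed

# e.g. merge_counts({"like": 120}, {"like": 95, "love": 40}) -> {"like": 120, "love": 40}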
Preventing duplicate counting:
1. Idempotency key: Each reaction has a unique ID (user_id + stream_id + timestamp)
2. Deduplication window: Keep a set of recent reaction IDs in the aggregator
3. Rate limiting: Users can only send N reactions per minute per stream
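A sketch of items 1 and 2 combined - an in-process deduplicator keyed on (user_id, stream_id, timestamp). The class name and the 60-second window are assumptions, and a production aggregator would more likely keep this set in Redis with a TTL than in local memory.

import time

class ReactionDeduplicator:
    """Drop duplicate reactions seen within a short window (sketch)."""

    def __init__(self, window_sec: int = 60):
        self.window_sec = window_sec
        self.seen = {}   # idempotency key -> time first seen

    def is_duplicate(self, user_id: str, stream_id: str, timestamp_ms: int) -> bool:
        key = f"{user_id}:{stream_id}:{timestamp_ms}"   # idempotency key from item 1
        now = time.time()
        # Evict expired keys so the set stays bounded (a TTL store would do this for us).
        self.seen = {k: t for k, t in self.seen.items() if now - t < self.window_sec}
        if key in self.seen:
            return True
        self.seen[key] = now
        return False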
Failure Modes and Resilience
Proactively discuss failures
Let me walk through what happens when components fail. This is critical for a real-time system.
| Failure | Impact | Mitigation | Recovery |
|---|---|---|---|
| Aggregator crash | Reactions not processed | Multiple aggregators per stream partition | Kafka replays unprocessed messages |
| Redis down | Cannot read current counts | Fallback to cached counts on edge | Redis recovers, aggregator resends |
Handling viral spikes:
class AdaptiveAggregator:
    def __init__(self):
        self.base_sample_size = 20

Graceful degradation priority
When overloaded: 1) Keep counts accurate (aggregate all), 2) Reduce animation sample size, 3) Increase batch window. Counts matter more than animations.
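A sketch of that policy as the AdaptiveAggregator above might apply it per window; the thresholds and the degrade function name are illustrative assumptions.

def degrade(settings: dict, reactions_per_sec: int) -> dict:
    """Adjust aggregation settings under load.

    Follows the priority above: counts stay exact, the animation sample
    shrinks first, then the batch window grows.
    """
    settings = dict(settings)
    if reactions_per_sec > 500_000:
        settings["sample_size"] = 10        # fewer animations per window
    if reactions_per_sec > 1_000_000:
        settings["sample_size"] = 5
        settings["window_ms"] = 1000        # larger batch window, fewer updates
    # Counts are always aggregated in full - accuracy is never degraded.
    return settings

# e.g. degrade({"sample_size": 20, "window_ms": 500}, reactions_per_sec=1_200_000)
# -> {"sample_size": 5, "window_ms": 1000}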
Evolution and Scaling
What to say
This design handles 1M reactions/sec and 10M viewers per stream. Let me discuss how it evolves for 10x scale and additional features.
Evolution Path:
Stage 1: Single region (current design)
- Works for up to 10M viewers
- Single Kafka cluster, multiple aggregators
- Edge servers in one region

Stage 2: Multi-region
- Regional Kafka clusters with cross-region replication
- Regional aggregators (closer to users)
- Global Pub/Sub for distribution

Stage 3: Extreme scale (100M+ viewers)
- Client-side aggregation before sending
- Probabilistic counting (HyperLogLog for unique reactors)
- Multiple aggregation tiers
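For the probabilistic counting in Stage 3, Redis's built-in HyperLogLog commands are one way to track unique reactors cheaply - a sketch assuming the redis-py client, with illustrative key names.

import redis

def record_unique_reactor(r: redis.Redis, stream_id: str, user_id: str) -> None:
    # HyperLogLog: ~12KB per stream regardless of cardinality, ~1% error.
    r.pfadd(f"reactors:{stream_id}", user_id)

def unique_reactor_count(r: redis.Redis, stream_id: str) -> int:
    return r.pfcount(f"reactors:{stream_id}")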
Multi-Region Architecture
Additional features to discuss:
1. Reaction heatmap: Show which parts of the video got the most reactions
   - Store reactions with video timestamp
   - Aggregate into time buckets
   - Display as graph overlay

2. Top reactors leaderboard: Show users who reacted most
   - Use Count-Min Sketch for approximate counting
   - Keep top-K in memory

3. Sentiment analysis: Real-time mood of the audience
   - Weighted sum of reaction types
   - Sliding window average
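As an example of the third feature, a sketch of the weighted-sum-plus-sliding-window idea; the weights and the class name are assumptions, not product values.

from collections import deque

# Assumed sentiment weights per reaction type - illustrative only.
SENTIMENT_WEIGHTS = {"love": 2, "like": 1, "haha": 1, "wow": 0, "sad": -1, "angry": -2}

class SentimentTracker:
    """Rolling audience mood over the last N aggregation windows."""

    def __init__(self, windows: int = 20):
        self.recent = deque(maxlen=windows)   # one score per aggregation window

    def add_window(self, counts: dict) -> float:
        total = sum(counts.values()) or 1
        score = sum(SENTIMENT_WEIGHTS.get(r, 0) * c for r, c in counts.items()) / total
        self.recent.append(score)
        return sum(self.recent) / len(self.recent)   # sliding-window average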
Alternative approach
If latency requirements were tighter (sub-100ms), I would use WebSocket directly from client to aggregator with client-side rate limiting and local batching. Trade simplicity for latency.
What I would change for different requirements:
Exact counts required: Use distributed transactions with Kafka exactly-once semantics. Higher latency but accurate.
Replay not needed: Skip persistent storage entirely. Pure in-memory pipeline for lowest latency.
Celebrity streams only: Pre-provision dedicated infrastructure for known large streams. Dynamic scaling for organic viral moments.