Design Walkthrough
Problem Statement
The Question: Design a system that shows real-time emoji reactions floating across a live video stream, like Facebook Live or Instagram Live.
The system must handle:
- Reaction ingestion: Users tap to send reactions (like, love, wow, haha, sad, angry)
- Real-time display: Reactions float up the screen as animations
- Aggregated counts: Show total counts per reaction type
- Massive scale: Viral streams with millions of concurrent viewers
What to say first
Before I design, I want to clarify the scale and latency requirements. The interesting part of this problem is handling viral moments where reaction rate spikes 100x normal.
What interviewers are testing:
- Can you handle extreme write throughput?
- Do you understand fan-out challenges at scale?
- Can you make pragmatic tradeoffs (sampling, batching)?
- Do you think about client-side constraints?
Clarifying Questions
These questions reveal senior thinking and shape the architecture.
Question 1: Scale
What is the peak concurrent viewers for a single stream? What is peak reactions per second?
Why this matters: Determines if we need sampling/batching
Typical answer: 10M concurrent viewers, 1M reactions/sec peak
Architecture impact: Cannot send every reaction - must aggregate
Question 2: Accuracy vs Latency
Do we need to show every single reaction, or is a representative sample acceptable?
Why this matters: Perfect accuracy is impossible at scale
Typical answer: Sample is fine for animations, counts should be accurate
Architecture impact: Can sample for display, batch for counts
Question 3: Latency Requirements
What is acceptable end-to-end latency from reaction to display on other viewers' screens?
Why this matters: Tighter latency = more infrastructure cost
Typical answer: Under 2 seconds feels real-time
Architecture impact: Can batch reactions in 100-500ms windows
Question 4: Durability
Do reactions need to persist after the stream ends? Can users see reactions when watching replay?
Why this matters: Determines storage requirements
Typical answer: Yes, replay should show reactions at correct timestamps
Architecture impact: Need persistent storage, not just in-memory
Stating assumptions
I will assume: 10M peak viewers, 1M reactions/sec, sampling OK for animations, 2s latency budget, reactions persisted for replay. We optimize for the viral case.
The Hard Part
Say this out loud
The hard part here is the fan-out math. If 1M users send reactions per second and we have 10M viewers, sending each reaction to each viewer is 10 trillion messages per second. That is impossible.
The fan-out problem visualized:
Reactions In: 1,000,000/sec
Viewers: 10,000,000
Naive fan-out: 1M x 10M = 10 trillion messages/sec
This is not just expensive - it is physically impossible.

Why this is genuinely hard:
1. Write amplification: Each reaction write triggers millions of reads
2. Hot partition: One viral stream = all traffic to one partition
3. Client limits: Mobile devices cannot render 1000 animations/sec
4. Network limits: Cannot push 1M updates/sec to each client
Common mistake
Candidates often start with per-reaction fan-out without doing the math. Always run the back-of-envelope numbers first to check whether an approach is feasible.
The solution insight:
We do NOT fan-out every reaction. Instead:
1. Aggregate reactions into time windows (100ms-500ms)
2. Sample reactions for animation display (show 10-50 per window)
3. Batch count updates (send delta every 500ms)
4. Hierarchical fan-out (servers to regional edge to clients)
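To make the reduction concrete, here is a rough back-of-envelope calculation in plain Python. The reaction rate and viewer count come from the stated assumptions; the 500ms window and the 1,000 edge servers are illustrative values for the sketch.

# Back-of-envelope: why aggregation + hierarchy makes the fan-out feasible.
REACTIONS_PER_SEC = 1_000_000
VIEWERS = 10_000_000
WINDOW_SEC = 0.5            # 500ms aggregation window
EDGE_SERVERS = 1_000        # each edge holds ~10K client connections

naive_msgs_per_sec = REACTIONS_PER_SEC * VIEWERS            # 10 trillion
updates_per_viewer_per_sec = 1 / WINDOW_SEC                 # one batched update per window
edge_msgs_per_sec = VIEWERS * updates_per_viewer_per_sec    # 20M total, spread over edges
per_edge_msgs_per_sec = edge_msgs_per_sec / EDGE_SERVERS    # ~20K sends/sec per edge server

print(f"naive fan-out:   {naive_msgs_per_sec:,.0f} msgs/sec")
print(f"batched fan-out: {edge_msgs_per_sec:,.0f} msgs/sec total")
print(f"per edge server: {per_edge_msgs_per_sec:,.0f} sends/sec")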
Scale and Access Patterns
Let me estimate the scale and understand access patterns.
| Dimension | Value | Impact |
|---|---|---|
| Peak concurrent streams | 100,000 | Can partition by stream ID |
| Peak viewers per stream | 10,000,000 | Need hierarchical fan-out |
What to say
The key insight is that clients can only render about 30 animations per second. Even if we receive 1M reactions/sec, we only need to show a sample. This changes the architecture completely.
Write path:
- 1M reactions/sec at peak
- 50 bytes each = 50 MB/sec

Access Patterns:
- Writes: Bursty, correlated with stream events (goal scored, joke landed)
- Reads: Every viewer is a reader, but they receive batched samples
- Hot partitions: Viral streams concentrate all traffic
- Temporal locality: Only current window matters for display
High-Level Architecture
Let me start with the high-level design and then dive into each component.
What to say
The architecture has three main paths: ingestion for high-throughput writes, aggregation for batching and sampling, and distribution for hierarchical fan-out to clients.
Live Reactions Architecture
Component Responsibilities:
1. Ingestion Layer (a minimal handler is sketched after this list)
   - Receives reaction HTTP/WebSocket requests
   - Validates and rate-limits per user
   - Publishes to message queue
   - Stateless, horizontally scalable

2. Message Queue (Kafka/Kinesis)
   - Buffers reactions during spikes
   - Partitioned by stream_id
   - Provides durability and replay

3. Aggregation Layer
   - Consumes reactions from queue
   - Windows reactions into time buckets (100-500ms)
   - Samples reactions for animation display
   - Updates counts in Redis
   - Publishes batched updates to Pub/Sub

4. Distribution Layer (Pub/Sub + Edge)
   - Pub/Sub fans out to edge servers
   - Edge servers maintain WebSocket connections to viewers
   - Hierarchical: Central -> Regional Edge -> Clients
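A minimal sketch of the ingestion handler under these assumptions. ReactionIngester, the producer interface (send(topic, key, value)), and the per-minute limit are illustrative stand-ins, not a specific framework or Kafka client API; rate-limit state is kept in-process here for brevity, whereas a real deployment would keep it in a shared store so the layer stays stateless.

import json
import time
from collections import defaultdict, deque

VALID_REACTIONS = {"like", "love", "wow", "haha", "sad", "angry"}
MAX_REACTIONS_PER_MINUTE = 60   # assumed per-user limit

class ReactionIngester:
    def __init__(self, producer):
        # `producer` is any object with send(topic, key, value) - a stand-in
        # for a Kafka/Kinesis producer, not a specific client library.
        self.producer = producer
        self.recent = defaultdict(deque)   # user_id -> timestamps of recent reactions

    def _rate_limited(self, user_id: str, now: float) -> bool:
        window = self.recent[user_id]
        while window and now - window[0] > 60:
            window.popleft()                        # drop timestamps older than a minute
        if len(window) >= MAX_REACTIONS_PER_MINUTE:
            return True
        window.append(now)
        return False

    def ingest(self, stream_id: str, user_id: str, reaction_type: str) -> bool:
        now = time.time()
        if reaction_type not in VALID_REACTIONS:
            return False                            # reject unknown reaction types
        if self._rate_limited(user_id, now):
            return False                            # drop: user over the per-minute limit
        event = {
            "stream_id": stream_id,
            "user_id": user_id,
            "reaction_type": reaction_type,
            "timestamp": int(now * 1000),
        }
        # Key by stream_id so all reactions for a stream land on the same partition.
        self.producer.send("reactions", key=stream_id, value=json.dumps(event))
        return True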
Data Model and Storage
We need three types of storage for different access patterns.
What to say
We use Redis for real-time counts, Kafka for buffering and replay of the raw reaction stream, and a relational database for the persistent storage that replay playback requires.
# Per-stream reaction counts (hash)
Key: reactions:{stream_id}
Value: {like: <count>, love: <count>, wow: <count>, haha: <count>, sad: <count>, angry: <count>}

-- Aggregate counts per stream (updated periodically)
CREATE TABLE stream_reaction_counts (
    stream_id BIGINT PRIMARY KEY,
    -- ... remaining columns (per-reaction-type counts, updated_at)
);

Kafka Topic Structure:
{
"stream_id": "stream_123",
"user_id": "user_456",
"reaction_type": "like",
"timestamp": 1703001600123,
"client_info": {
"device": "ios",
"version": "2.1.0"
}
}

Key Design Decisions:
1. Kafka partitioned by stream_id: All reactions for a stream go to the same partition, enabling stateful aggregation
2. Redis for hot data: Sub-millisecond reads for current counts and recent windows (see the flush sketch after this list)
3. PostgreSQL for replay: Only queried when a user watches the replay, not real-time critical
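A sketch of how the aggregator could flush one window's deltas into the reactions:{stream_id} hash, assuming the redis-py client; the function name and the example values in the usage comment are illustrative.

import redis

def flush_counts_to_redis(r: redis.Redis, stream_id: str, window_counts: dict) -> None:
    """Apply one window's per-type deltas to the reactions:{stream_id} hash.

    `window_counts` is e.g. {"like": 1823, "love": 940} for the last 500ms window.
    """
    key = f"reactions:{stream_id}"
    pipe = r.pipeline()
    for reaction_type, delta in window_counts.items():
        pipe.hincrby(key, reaction_type, delta)   # atomic increment per field
    pipe.execute()                                # one round trip for the whole batch

# Usage (assumed local Redis for illustration):
# r = redis.Redis(host="localhost", port=6379)
# flush_counts_to_redis(r, "stream_123", {"like": 1823, "love": 940})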
Algorithm Deep Dive: Aggregation
The aggregator is the heart of the system. It transforms high-volume individual reactions into batched, sampled updates.
Aggregation Pipeline
import asyncio
from collections import defaultdict
import random

Reservoir Sampling Explained:
We cannot store all 100K reactions in a window. Reservoir sampling gives us a uniform random sample of size K from a stream of unknown length.
Algorithm:
1. First K items go directly into the sample
2. For item i (where i > K), include it with probability K/i
3. If included, replace a random existing item

Result: Every reaction has equal probability of being in the final sample
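A compact sketch of this algorithm (classic Algorithm R); the class name is illustrative.

import random

class ReservoirSampler:
    """Uniform sample of size k from a stream of unknown length (Algorithm R)."""

    def __init__(self, k: int):
        self.k = k
        self.sample = []
        self.seen = 0

    def add(self, item) -> None:
        self.seen += 1
        if len(self.sample) < self.k:
            self.sample.append(item)            # first k items always kept
        else:
            j = random.randrange(self.seen)     # 0 <= j < seen
            if j < self.k:                      # keep with probability k/seen
                self.sample[j] = item           # replace a random existing item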
Why reservoir sampling?
We do not know how many reactions will arrive in a window. Reservoir sampling gives us a fair sample without storing everything. Each reaction has equal probability of being displayed.
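Tying the aggregation pipeline together, here is a minimal sketch of the per-stream loop: window the incoming reactions, keep counts in a defaultdict, sample for animation with the ReservoirSampler above, and publish one batched update per window. The queue and publish interfaces are stand-ins for the Kafka consumer and Pub/Sub producer, not specific client APIs, and the window and sample sizes are the assumed values from the requirements.

import asyncio
from collections import defaultdict

WINDOW_MS = 500
SAMPLE_SIZE = 20

async def aggregate_stream(stream_id: str, reactions: asyncio.Queue, publish) -> None:
    """Consume raw reactions for one stream; emit one batched update per window."""
    while True:
        counts = defaultdict(int)
        sampler = ReservoirSampler(SAMPLE_SIZE)
        loop = asyncio.get_running_loop()
        deadline = loop.time() + WINDOW_MS / 1000

        # Drain the queue until the window closes.
        while True:
            timeout = deadline - loop.time()
            if timeout <= 0:
                break
            try:
                reaction = await asyncio.wait_for(reactions.get(), timeout)
            except asyncio.TimeoutError:
                break
            counts[reaction["reaction_type"]] += 1
            sampler.add(reaction)

        if counts:
            await publish({
                "stream_id": stream_id,
                "window_ms": WINDOW_MS,
                "count_deltas": dict(counts),        # batched deltas for Redis and clients
                "animation_sample": sampler.sample,  # 10-50 reactions to animate
            })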
Distribution: Fan-out Strategy
Distributing updates to millions of viewers requires hierarchical fan-out.
What to say
We use a hierarchical fan-out: aggregator publishes once to Pub/Sub, edge servers in each region subscribe, and edge servers push to their connected clients. This reduces fan-out from 1-to-10M to 1-to-1000-to-10K.
Hierarchical Fan-out
Fan-out math with hierarchy:
Without hierarchy:
1 aggregator -> 10M clients = 10M connections
With hierarchy (3 levels):
1 aggregator -> 10 regional Pub/Sub
10 regional -> 1000 edge servers
1000 edge servers -> 10K clients each = 10M clients
Total connections from aggregator: 10 (manageable)

class EdgeServer:
    def __init__(self, region: str):
        self.region = region

Important optimization
Edge servers should batch WebSocket writes. Instead of 20 sends for 20 animations, serialize once and send the batch. This reduces CPU dramatically.
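A sketch of what that batching could look like; on the EdgeServer above this would be a method, but it is shown as a standalone function here. The connection objects are assumed to expose an async send(text) method (a stand-in for a WebSocket library object), and broadcast_batch is an illustrative name.

import asyncio
import json

async def broadcast_batch(connections: dict, update: dict) -> None:
    """Push one batched update to every viewer connected to this edge server.

    `connections` maps client_id -> an object with an async send(text) method.
    """
    # Serialize once and reuse the same payload for every client,
    # instead of re-encoding per animation or per connection.
    payload = json.dumps(update)
    sends = [conn.send(payload) for conn in connections.values()]
    # Fire all sends concurrently; drop clients whose send failed.
    results = await asyncio.gather(*sends, return_exceptions=True)
    for client_id, result in zip(list(connections), results):
        if isinstance(result, Exception):
            connections.pop(client_id, None)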
Consistency and Invariants
System Invariants
1. Reaction counts shown to viewers must be monotonically non-decreasing - never show a count going down.
2. Displayed reactions must be real (from actual users), not fabricated.
3. Each user reaction is counted at most once.
Consistency Model:
This system uses eventual consistency with specific guarantees:
| Aspect | Consistency Level | Why |
|---|---|---|
| Total counts | Eventual (1-2 sec lag) | Users tolerate slight delay in count updates |
| Count direction | Monotonic | Counts must never decrease - would confuse users |
| Animation display | Best effort | Missing some reactions is fine, showing fake is not |
| Cross-device | Eventual | Same user on phone and laptop may see slightly different counts |
Ensuring monotonic counts:
class ReactionCountDisplay:
    def __init__(self):
        self.displayed_counts = {}  # {reaction_type: count}

Why counts might arrive out of order
Network delays, edge server processing time, and batching can cause updates to arrive out of order. Client-side enforcement of monotonicity is simpler than distributed ordering.
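The core of that client-side enforcement is a merge that never lets a displayed count move backwards. A minimal standalone sketch - merge_counts is a hypothetical helper that ReactionCountDisplay above would apply to each incoming update:

def merge_counts(displayed: dict, incoming: dict) -> dict:
    """Merge a possibly stale server update into the displayed counts."""
    for reaction_type, count in incoming.items():
        # Out-of-order updates can carry older totals; only ever move forward.
        displayed[reaction_type] = max(displayed.get(reaction_type, 0), count)
    return displayed

# e.g. merge_counts({"like": 120}, {"like": 95, "love": 40}) -> {"like": 120, "love": 40}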
Preventing duplicate counting:
1. Idempotency key: Each reaction has a unique ID (user_id + stream_id + timestamp)
2. Deduplication window: Keep a set of recent reaction IDs in the aggregator
3. Rate limiting: Users can only send N reactions per minute per stream
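A sketch of items 1 and 2 combined - an in-process deduplicator keyed on (user_id, stream_id, timestamp). The class name and the 60-second window are assumptions, and a production aggregator would more likely keep this set in Redis with a TTL than in local memory.

import time

class ReactionDeduplicator:
    """Drop duplicate reactions seen within a short window (sketch)."""

    def __init__(self, window_sec: int = 60):
        self.window_sec = window_sec
        self.seen = {}   # idempotency key -> time first seen

    def is_duplicate(self, user_id: str, stream_id: str, timestamp_ms: int) -> bool:
        key = f"{user_id}:{stream_id}:{timestamp_ms}"   # idempotency key from item 1
        now = time.time()
        # Evict expired keys so the set stays bounded (a TTL store would do this for us).
        self.seen = {k: t for k, t in self.seen.items() if now - t < self.window_sec}
        if key in self.seen:
            return True
        self.seen[key] = now
        return False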
Failure Modes and Resilience
Proactively discuss failures
Let me walk through what happens when components fail. This is critical for a real-time system.
| Failure | Impact | Mitigation | Recovery |
|---|---|---|---|
| Aggregator crash | Reactions not processed | Multiple aggregators per stream partition | Kafka replays unprocessed messages |
| Redis down | Cannot read current counts | Fallback to cached counts on edge | Redis recovers, aggregator resends |
Handling viral spikes:
class AdaptiveAggregator:
    def __init__(self):
        self.base_sample_size = 20

Graceful degradation priority
When overloaded: 1) Keep counts accurate (aggregate all), 2) Reduce animation sample size, 3) Increase batch window. Counts matter more than animations.
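A sketch of that policy as the AdaptiveAggregator above might apply it per window; the thresholds and the degrade function name are illustrative assumptions.

def degrade(settings: dict, reactions_per_sec: int) -> dict:
    """Adjust aggregation settings under load.

    Follows the priority above: counts stay exact, the animation sample
    shrinks first, then the batch window grows.
    """
    settings = dict(settings)
    if reactions_per_sec > 500_000:
        settings["sample_size"] = 10        # fewer animations per window
    if reactions_per_sec > 1_000_000:
        settings["sample_size"] = 5
        settings["window_ms"] = 1000        # larger batch window, fewer updates
    # Counts are always aggregated in full - accuracy is never degraded.
    return settings

# e.g. degrade({"sample_size": 20, "window_ms": 500}, reactions_per_sec=1_200_000)
# -> {"sample_size": 5, "window_ms": 1000}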
Evolution and Scaling
What to say
This design handles 1M reactions/sec and 10M viewers per stream. Let me discuss how it evolves for 10x scale and additional features.
Evolution Path:
Stage 1: Single region (current design)
- Works for up to 10M viewers
- Single Kafka cluster, multiple aggregators
- Edge servers in one region

Stage 2: Multi-region
- Regional Kafka clusters with cross-region replication
- Regional aggregators (closer to users)
- Global Pub/Sub for distribution

Stage 3: Extreme scale (100M+ viewers)
- Client-side aggregation before sending
- Probabilistic counting (HyperLogLog for unique reactors)
- Multiple aggregation tiers
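For the probabilistic counting in Stage 3, Redis's built-in HyperLogLog commands are one way to track unique reactors cheaply - a sketch assuming the redis-py client, with illustrative key names.

import redis

def record_unique_reactor(r: redis.Redis, stream_id: str, user_id: str) -> None:
    # HyperLogLog: ~12KB per stream regardless of cardinality, ~1% error.
    r.pfadd(f"reactors:{stream_id}", user_id)

def unique_reactor_count(r: redis.Redis, stream_id: str) -> int:
    return r.pfcount(f"reactors:{stream_id}")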
Multi-Region Architecture
Additional features to discuss:
1. Reaction heatmap: Show which parts of the video got the most reactions
   - Store reactions with video timestamp
   - Aggregate into time buckets
   - Display as graph overlay

2. Top reactors leaderboard: Show users who reacted most
   - Use Count-Min Sketch for approximate counting
   - Keep top-K in memory

3. Sentiment analysis: Real-time mood of the audience
   - Weighted sum of reaction types
   - Sliding window average
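As an example of the third feature, a sketch of the weighted-sum-plus-sliding-window idea; the weights and the class name are assumptions, not product values.

from collections import deque

# Assumed sentiment weights per reaction type - illustrative only.
SENTIMENT_WEIGHTS = {"love": 2, "like": 1, "haha": 1, "wow": 0, "sad": -1, "angry": -2}

class SentimentTracker:
    """Rolling audience mood over the last N aggregation windows."""

    def __init__(self, windows: int = 20):
        self.recent = deque(maxlen=windows)   # one score per aggregation window

    def add_window(self, counts: dict) -> float:
        total = sum(counts.values()) or 1
        score = sum(SENTIMENT_WEIGHTS.get(r, 0) * c for r, c in counts.items()) / total
        self.recent.append(score)
        return sum(self.recent) / len(self.recent)   # sliding-window average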
Alternative approach
If latency requirements were tighter (sub-100ms), I would use WebSocket directly from client to aggregator with client-side rate limiting and local batching. Trade simplicity for latency.
What I would change for different requirements:
Exact counts required: Use distributed transactions with Kafka exactly-once semantics. Higher latency but accurate.
Replay not needed: Skip persistent storage entirely. Pure in-memory pipeline for lowest latency.
Celebrity streams only: Pre-provision dedicated infrastructure for known large streams. Dynamic scaling for organic viral moments.