Design Walkthrough
Problem Statement
The Question: Design a live comments system for a live streaming platform. When someone comments on a live stream, all viewers should see it in real-time.
Core Requirements:
- Real-time delivery: Comments appear within 1-2 seconds for all viewers
- Ordering: Comments appear in the same order for all viewers
- Scale: Support streams with 100K+ concurrent viewers
- Reliability: No lost comments during normal operation
What to say first
This is a real-time fan-out problem. Let me clarify the scale, consistency requirements, and what happens during viral moments before designing.
What interviewers are testing:
- Do you understand WebSocket vs polling vs SSE tradeoffs?
- Can you handle the hot partition problem (viral streams)?
- Do you know pub/sub patterns?
- Can you reason about ordering and consistency in distributed systems?
Clarifying Questions
These questions shape fundamental architecture decisions.
Question 1: Scale per stream
What is the maximum number of concurrent viewers for a single stream? What about total platform concurrent viewers?
Why this matters: Determines fan-out strategy. 1K viewers vs 1M viewers requires different architectures.
Typical answer: Up to 500K concurrent per viral stream, 10M total platform.
Architecture impact: Need hierarchical fan-out; cannot push directly to all connections.
Question 2: Comment rate
How many comments per second during peak on a popular stream?
Why this matters: Determines if we need throttling/sampling.
Typical answer: 1000-5000 comments/second on viral streams.
Architecture impact: May need to sample/aggregate comments for extremely active streams.
Question 3: Ordering strictness
Must all viewers see comments in exactly the same order? What about slight delays between viewers?
Why this matters: Strong ordering requires coordination, which adds latency.
Typical answer: Same order for all, but a 100ms difference in arrival time is OK.
Architecture impact: Can use a single sequencer per stream without distributed consensus.
Question 4: Persistence requirements
Do we need to store comments permanently? Can viewers joining late see past comments?
Why this matters: Affects storage design and replay capability.
Typical answer: Store for replay; show the last 100 comments to new joiners.
Architecture impact: Need persistent storage + a recent comment cache.
Stating assumptions
I will assume: 500K max viewers per stream, 1000 comments/sec peak, a total order per stream (all viewers see the same order), and persisted comments with replay for late joiners.
The Hard Part
Say this out loud
The hard part here is fan-out to massive audiences. When a viral stream has 500K viewers, every single comment must be delivered to 500K connections within 2 seconds.
Why this is genuinely hard:
1. Hot Partition Problem: A viral stream concentrates all traffic on one logical partition. You cannot shard by stream ID when one stream is the problem.
2. Connection State: 500K persistent WebSocket connections require significant memory. Each connection is ~10KB of state = 5GB just for one stream.
3. Fan-out Amplification: 1000 comments/sec x 500K viewers = 500M messages/sec that need to be delivered.
4. Ordering at Scale: All 500K viewers must see the same order, but they are connected to different servers.
Viral stream scenario:
- Viewers: 500,000
- Comments/sec: 1,000
- Messages to deliver: 500,000 x 1,000 = 500,000,000/sec
If each WebSocket server handles 50K connections:
- Servers needed: 500,000 / 50,000 = 10 servers
- Each server delivers: 50K x 1000 = 50M messages/sec
This is why direct fan-out does not scale!
Common mistake
Candidates often design for average case (1K viewers) and forget that viral streams exist. Always design for the hot partition scenario.
The solution preview: Hierarchical fan-out with edge servers. Instead of one service pushing to 500K connections, use a tree structure where each level only fans out to ~100 children.
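To see why ~100 children per level is enough, here is a quick back-of-envelope sketch in Python (the branching factor of 100 is the assumption stated above):

```python
import math

def fanout_levels(viewers: int, branching: int = 100) -> int:
    """Tree depth needed so each node pushes to at most `branching` children."""
    return math.ceil(math.log(viewers, branching))

print(fanout_levels(500_000))  # 3: root -> 100 -> 10,000 -> up to 1,000,000 leaves
```

Three hops of ~100 each cover up to a million leaves, so 500K viewers fit with room to spare while no single node ever pushes to more than 100 children.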
Scale & Access Patterns
Let me estimate the scale and understand access patterns.
| Dimension | Value | Impact |
|---|---|---|
| Max viewers per stream | 500,000 | Hot partition problem - cannot shard by stream |
| Total concurrent viewers | 10,000,000 | Need distributed WebSocket infrastructure |
What to say
The access pattern is extremely write-heavy from a fan-out perspective. One write (comment) triggers 500K reads (deliveries). This is the opposite of typical read-heavy systems.
Access Pattern Analysis:
- Bursty traffic: Viral moments cause 100x normal load instantly
- Geographic distribution: Viewers are globally distributed
- Connection lifetime: Minutes to hours (duration of stream)
- Read pattern: All viewers read same comments (perfect for caching/broadcast)
- Write pattern: Comments come from anywhere, need single ordering point
WebSocket server capacity:
- RAM: 64GB server, 10KB per connection = 6.4M connections max
- But CPU-bound at ~50-100K active connections with message processing
High-Level Architecture
The key insight is hierarchical fan-out. Instead of one service pushing to all viewers, we create a tree of fan-out.
What to say
I will design a hierarchical fan-out architecture. Comments flow through: Ingestion -> Sequencer -> Pub/Sub -> Edge Servers -> Clients. Each layer only fans out to ~100 children.
Live Comments Architecture
Component Responsibilities:
1. Comment API (Ingestion)
   - Receives comment from client
   - Validates content (spam, length, auth)
   - Forwards to sequencer for ordering
2. Sequencer (Per-Stream)
   - Assigns sequence number to each comment
   - Single point of ordering per stream
   - Writes to storage and publishes to pub/sub
3. Pub/Sub Layer (Redis/Kafka)
   - Distributes comments to all edge servers
   - Each stream is a channel/topic
   - Edge servers subscribe to streams they have viewers for
4. Edge Servers (WebSocket)
   - Maintain persistent connections with viewers
   - Subscribe to pub/sub for relevant streams
   - Push comments to connected clients
5. Storage
   - Persistent storage for replay/history
   - Recent comment cache for new joiners
Real-world reference
This is similar to how Twitch and YouTube Live work. They use hierarchical edge distribution. Discord uses a similar pattern for large servers.
Data Model & Storage
We need two storage systems: persistent storage for history and a fast cache for real-time delivery.
What to say
The source of truth is the comments database, but real-time delivery uses pub/sub. New joiners get recent comments from cache, then subscribe for live updates.
```sql
CREATE TABLE comments (
    id BIGINT PRIMARY KEY,
    stream_id VARCHAR(64) NOT NULL,
    seq BIGINT NOT NULL,           -- per-stream sequence number (remaining columns assumed)
    user_id BIGINT NOT NULL,
    content TEXT NOT NULL,
    created_at TIMESTAMP NOT NULL
);

CREATE INDEX idx_stream_seq ON comments (stream_id, seq);  -- ordered replay per stream
```
Redis for Real-time:
```python
# Pub/Sub channel per stream
CHANNEL = f"comments:{stream_id}"
```
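For late joiners, a minimal sketch of the join path using redis-py's asyncio client, assuming recent comments are also cached in a Redis sorted set scored by sequence number (the `recent:{stream_id}` key is an assumption):

```python
import redis.asyncio as redis

async def join_stream(r: redis.Redis, stream_id: str):
    # 1. Backfill: fetch the most recent 100 comments from a sorted set (score = seq)
    recent = await r.zrange(f"recent:{stream_id}", -100, -1)
    # 2. Go live: subscribe to the stream's pub/sub channel for new comments
    pubsub = r.pubsub()
    await pubsub.subscribe(f"comments:{stream_id}")
    return recent, pubsub
```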
Sequencer Implementation:
```python
class StreamSequencer:
    def __init__(self, stream_id: str):
        self.stream_id = stream_id
        self.next_seq = 0  # monotonically increasing, single counter per stream

    def assign(self, comment: dict) -> dict:
        # Stamp the comment with the next sequence number (method name assumed)
        self.next_seq += 1
        comment["seq"] = self.next_seq
        return comment
```
Important detail
We publish to real-time before confirming database write. This prioritizes latency over durability. For critical messages, you might want the opposite.
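A minimal sketch of both orderings; `publish` and `persist_comment` are stand-ins for the real pub/sub and database clients:

```python
import asyncio

async def publish(channel: str, message: dict): ...   # stand-in for the pub/sub client
async def persist_comment(comment: dict): ...          # stand-in for the DB write

async def deliver_latency_first(stream_id: str, comment: dict):
    await publish(f"comments:{stream_id}", comment)    # real-time path first
    asyncio.create_task(persist_comment(comment))      # durability in the background

async def deliver_durability_first(stream_id: str, comment: dict):
    await persist_comment(comment)                     # confirm the write...
    await publish(f"comments:{stream_id}", comment)    # ...then fan out
```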
Real-time Delivery Deep Dive
Let me dive deep into the real-time delivery layer - this is the most complex part.
| Protocol | Pros | Cons | Best For |
|---|---|---|---|
| WebSocket | Bidirectional, low latency, efficient | Connection management complexity | Interactive real-time (our case) |
| Server-Sent Events | Simple, HTTP-based, auto-reconnect | Unidirectional only | Read-only streams |
| Long Polling | Works everywhere, simple | Higher latency, more overhead | Fallback only |
| HTTP/2 Push | No extra connection | Limited browser support | Not recommended |
Protocol choice
WebSocket is the right choice here. We need bidirectional (viewers can also comment), low latency, and efficient use of connections for long-lived streams.
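For concreteness, a minimal viewer-side sketch using the Python `websockets` library; the edge URL shape and message fields are assumptions:

```python
import asyncio
import json
import websockets

async def watch(stream_id: str):
    url = f"wss://edge.example.com/streams/{stream_id}"  # hypothetical endpoint
    async with websockets.connect(url) as ws:
        async for raw in ws:  # each message is one comment pushed by the edge
            comment = json.loads(raw)
            print(comment["seq"], comment["content"])

asyncio.run(watch("stream-123"))
```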
Edge Server Implementation:
```python
class EdgeServer:
    def __init__(self):
        # stream_id -> set of websocket connections
        self.subscriptions: dict[str, set] = {}

    async def on_comment(self, stream_id: str, comment: dict):
        # Pub/sub callback: fan out one message to every local viewer of the stream
        for ws in self.subscriptions.get(stream_id, set()):
            await ws.send_json(comment)  # send_json assumed on the connection wrapper
```
Message Flow
Connection management
Each edge server tracks which streams have local subscribers. It only subscribes to pub/sub channels for streams with active viewers. This prevents unnecessary message processing.
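A sketch of that bookkeeping, continuing the `EdgeServer` class above and assuming a Redis-style `self.pubsub` client:

```python
class EdgeServer:
    # ...continuing the class above

    async def add_viewer(self, stream_id: str, ws):
        viewers = self.subscriptions.setdefault(stream_id, set())
        if not viewers:
            # First local viewer: start receiving this stream's channel
            await self.pubsub.subscribe(f"comments:{stream_id}")
        viewers.add(ws)

    async def remove_viewer(self, stream_id: str, ws):
        viewers = self.subscriptions.get(stream_id, set())
        viewers.discard(ws)
        if not viewers:
            # Last viewer left: stop processing this stream entirely
            await self.pubsub.unsubscribe(f"comments:{stream_id}")
            self.subscriptions.pop(stream_id, None)
```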
Consistency & Invariants
System Invariants
All viewers must see comments in the same order. Out-of-order comments break conversation flow and confuse users.
How we guarantee ordering:
The key is a single sequencer per stream. All comments for a stream go through one sequencer that assigns monotonically increasing sequence numbers.
Comment A arrives at T=0 -> Sequencer assigns seq=1
Comment B arrives at T=0.1 -> Sequencer assigns seq=2
Comment C arrives at T=0.2 -> Sequencer assigns seq=3
All viewers receive: A(1), B(2), C(3) in that order
Even if network delays cause C to arrive before B at some edge:
- Edge server buffers and reorders by sequence number (see the sketch below)
- Or accepts slight out-of-order (C, B) if within tolerance
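A minimal reorder buffer along these lines (a sketch only; a production edge would also bound the buffer and time out on persistent gaps):

```python
import heapq

class ReorderBuffer:
    """Release comments in sequence order despite mild network reordering."""
    def __init__(self):
        self.next_seq = 1
        self.pending = []  # min-heap of (seq, comment); seqs are unique per stream

    def push(self, seq: int, comment: dict) -> list:
        heapq.heappush(self.pending, (seq, comment))
        ready = []
        while self.pending and self.pending[0][0] == self.next_seq:
            ready.append(heapq.heappop(self.pending)[1])
            self.next_seq += 1
        return ready  # comments now safe to deliver, in order
```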
Single sequencer = single point of failure?
Yes, but it is per-stream. One stream failing does not affect others. For extreme HA, use a replicated log (Kafka) as the sequencer with exactly-once semantics.
What about different arrival times at viewers?
Two viewers might see comment #5 at different times:
- Viewer A (close to edge server): sees at T+100ms
- Viewer B (far from edge): sees at T+500ms
This is acceptable because:
- They still see the same order
- The delay is usually imperceptible
- Perfect synchronization would require distributed consensus (too slow)
| Consistency Level | How Achieved | Latency Cost |
|---|---|---|
| Same order for all | Single sequencer per stream | ~10ms at sequencer |
| Exactly same time | Would need distributed clock sync | Not practical |
| No duplicates | Dedup by comment ID at edge (sketch below) | Minimal |
| No gaps | Sequence numbers, client requests missing | Adds complexity |
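The no-duplicates row is cheap to implement; here is a sketch of edge-side dedup using a bounded LRU window (the window size is an assumption):

```python
from collections import OrderedDict

class Deduper:
    """Drop comments whose IDs were already delivered, within a bounded window."""
    def __init__(self, capacity: int = 10_000):
        self.seen = OrderedDict()  # comment_id -> True, in arrival order
        self.capacity = capacity

    def is_new(self, comment_id: int) -> bool:
        if comment_id in self.seen:
            return False  # duplicate: already pushed to this client/edge
        self.seen[comment_id] = True
        if len(self.seen) > self.capacity:
            self.seen.popitem(last=False)  # evict the oldest remembered ID
        return True
```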
What to say
We guarantee a total order per stream - all viewers see the same order. We do not guarantee simultaneous delivery (that would require distributed consensus, which is too slow for real-time chat).
Failure Modes & Resilience
Proactively discuss failures
Let me walk through what happens when components fail.
| Failure | Impact | Mitigation | Recovery |
|---|---|---|---|
| Edge server crash | Up to ~50K viewers disconnected | Client auto-reconnect to different edge | Reconnect + fetch missed comments by seq# |
| Sequencer down | New comments for that stream fail | Standby sequencer takes over | Clients retry, may see brief delay |
| Redis Pub/Sub down | Comments not delivered real-time | Fall back to polling recent cache | Restore pub/sub, clients catch up |
| Database slow | Comments still delivered (async persist) | Real-time unaffected, history delayed | Process backlog when DB recovers |
| Network partition | Some edges isolated | Isolated edges serve from local cache | Reconcile when partition heals |
Client-side resilience:
The client is crucial for reliability:
```javascript
class LiveCommentsClient {
  constructor(streamId) {
    this.streamId = streamId;
    this.lastSeq = 0;  // highest sequence number seen; drives the gap detection below
  }
}
```
Gap detection
If client sees seq 5, then seq 8, it knows it missed 6 and 7. It can request these specific comments from the API. This makes the system self-healing.
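The same logic as a sketch, shown in Python for consistency with the other snippets; `fetch_comments` and `render` are stand-in helpers:

```python
async def fetch_comments(stream_id: str, lo: int, hi: int) -> list: ...  # history API stub
def render(comment: dict): ...                                           # UI stub

async def on_comment(state: dict, comment: dict):
    """Apply one incoming comment; backfill anything we skipped over."""
    expected = state["last_seq"] + 1
    if comment["seq"] > expected:
        # Gap detected (e.g. saw 5 then 8): fetch 6..7 from the history API
        for missed in await fetch_comments(state["stream_id"], expected, comment["seq"] - 1):
            render(missed)
    state["last_seq"] = max(state["last_seq"], comment["seq"])
    render(comment)
```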
Graceful degradation for viral streams:
When a stream goes viral and exceeds capacity:
```python
import random

MAX_RATE = 100  # max comments/sec to fan out; threshold value assumed

async def maybe_sample_comment(stream_id: str, comment: dict) -> bool:
    """Return True if comment should be delivered"""
    rate = await get_comment_rate(stream_id)  # sliding-window rate; assumed helper
    # Under capacity: deliver all; over capacity: deliver a proportional random sample
    return rate <= MAX_RATE or random.random() < MAX_RATE / rate
```
Evolution & Scaling
What to say
This design works well for streams up to 500K viewers with single-region deployment. Let me discuss evolution for 10M+ concurrent and global deployment.
Evolution Path:
Stage 1: Single Region (up to 500K per stream)
- Single Redis cluster for pub/sub
- Edge servers in one region
- Works for most streams

Stage 2: Multi-Region (up to 2M per stream)
- Redis cluster per region
- Cross-region replication with Kafka
- Edge servers globally distributed

Stage 3: Extreme Scale (10M+ per stream)
- Custom pub/sub with hierarchical fan-out
- CDN integration for edge delivery
- Comment aggregation/sampling
Multi-Region Architecture
Global latency optimization:
- Comments are sequenced centrally (one region)
- But delivered via regional edge servers
- Viewers connect to nearest edge
- Cross-region latency only affects the sequencer hop
| Viewer Location | Sequencer in US | Total Latency |
|---|---|---|
| US West | ~20ms to sequencer | ~100ms end-to-end |
| US East | ~40ms to sequencer | ~150ms end-to-end |
| EU | ~100ms to sequencer | ~300ms end-to-end |
| Asia | ~200ms to sequencer | ~500ms end-to-end |
Alternative approach
For truly global streams, you could use regional sequencers with vector clocks for ordering. This is more complex but reduces cross-region latency. Only worth it for the top 0.1% of streams.
What I would do differently for...
Gaming/Esports (ultra-low latency): Use UDP-based protocol, accept some packet loss, optimize for <100ms
Moderated comments (news): Add moderation queue before sequencer, accept higher latency for quality
Reactions only (no text): Use aggregated counters instead of individual messages, much simpler