Design Walkthrough
Problem Statement
The Question: Design a live comments system for a live streaming platform. When someone comments on a live stream, all viewers should see it in real-time.
Core Requirements:
- Real-time delivery: Comments appear within 1-2 seconds for all viewers
- Ordering: Comments appear in the same order for all viewers
- Scale: Support streams with 100K+ concurrent viewers
- Reliability: No lost comments during normal operation
What to say first
This is a real-time fan-out problem. Let me clarify the scale, consistency requirements, and what happens during viral moments before designing.
What interviewers are testing:
- Do you understand WebSocket vs polling vs SSE tradeoffs?
- Can you handle the hot partition problem (viral streams)?
- Do you know pub/sub patterns?
- Can you reason about ordering and consistency in distributed systems?
Clarifying Questions
These questions shape fundamental architecture decisions.
Question 1: Scale per stream
What is the maximum number of concurrent viewers for a single stream? What about total platform concurrent viewers?
Why this matters: Determines fan-out strategy. 1K viewers vs 1M viewers requires different architectures.
Typical answer: Up to 500K concurrent per viral stream, 10M total platform.
Architecture impact: Need hierarchical fan-out; cannot push directly to all connections.
Question 2: Comment rate
How many comments per second during peak on a popular stream?
Why this matters: Determines if we need throttling/sampling.
Typical answer: 1000-5000 comments/second on viral streams.
Architecture impact: May need to sample/aggregate comments for extremely active streams.
Question 3: Ordering strictness
Must all viewers see comments in exactly the same order? What about slight delays between viewers?
Why this matters: Strong ordering requires coordination, which adds latency.
Typical answer: Same order for all, but a 100ms difference in arrival time is OK.
Architecture impact: Can use a single sequencer per stream without distributed consensus.
Question 4: Persistence requirements
Do we need to store comments permanently? Can viewers joining late see past comments?
Why this matters: Affects storage design and replay capability.
Typical answer: Store for replay; show the last 100 comments to new joiners.
Architecture impact: Need persistent storage + a recent comment cache.
Stating assumptions
I will assume: 500K max viewers per stream, 1000 comments/sec peak, a total order per stream (all viewers see the same order), and persisted comments with replay for late joiners.
The Hard Part
Say this out loud
The hard part here is fan-out to massive audiences. When a viral stream has 500K viewers, every single comment must be delivered to 500K connections within 2 seconds.
Why this is genuinely hard:
1. Hot Partition Problem: A viral stream concentrates all traffic on one logical partition. You cannot shard by stream ID when one stream is the problem.
2. Connection State: 500K persistent WebSocket connections require significant memory. Each connection is ~10KB of state = 5GB just for one stream.
3. Fan-out Amplification: 1000 comments/sec x 500K viewers = 500M messages/sec that need to be delivered.
4. Ordering at Scale: All 500K viewers must see the same order, but they are connected to different servers.
Viral stream scenario:
- Viewers: 500,000
- Comments/sec: 1,000
- Messages to deliver: 500,000 x 1,000 = 500,000,000/sec
If each WebSocket server handles 50K connections:
- Servers needed: 500,000 / 50,000 = 10 servers
- Each server delivers: 50K x 1000 = 50M messages/sec
This is why direct fan-out does not scale!
Common mistake
Candidates often design for average case (1K viewers) and forget that viral streams exist. Always design for the hot partition scenario.
The solution preview: Hierarchical fan-out with edge servers. Instead of one service pushing to 500K connections, use a tree structure where each level only fans out to ~100 children.
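To see why ~100 children per level is enough, here is a quick back-of-envelope sketch in Python (the branching factor of 100 is the assumption stated above):

```python
import math

def fanout_levels(viewers: int, branching: int = 100) -> int:
    """Tree depth needed so each node pushes to at most `branching` children."""
    return math.ceil(math.log(viewers, branching))

print(fanout_levels(500_000))  # 3: root -> 100 -> 10,000 -> up to 1,000,000 leaves
```

Three hops of ~100 each cover up to a million leaves, so 500K viewers fit with room to spare while no single node ever pushes to more than 100 children.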
Scale & Access Patterns
Let me estimate the scale and understand access patterns.
| Dimension | Value | Impact |
|---|---|---|
| Max viewers per stream | 500,000 | Hot partition problem - cannot shard by stream |
| Total concurrent viewers | 10,000,000 | Need distributed WebSocket infrastructure |
What to say
The access pattern is extremely write-heavy from a fan-out perspective. One write (comment) triggers 500K reads (deliveries). This is the opposite of typical read-heavy systems.
Access Pattern Analysis:
- Bursty traffic: Viral moments cause 100x normal load instantly
- Geographic distribution: Viewers are globally distributed
- Connection lifetime: Minutes to hours (duration of stream)
- Read pattern: All viewers read same comments (perfect for caching/broadcast)
- Write pattern: Comments come from anywhere, need single ordering point
WebSocket server capacity:
- RAM: 64GB server, 10KB per connection = 6.4M connections max
- But CPU-bound at ~50-100K active connections with message processing
High-Level Architecture
The key insight is hierarchical fan-out. Instead of one service pushing to all viewers, we create a tree of fan-out.
What to say
I will design a hierarchical fan-out architecture. Comments flow through: Ingestion -> Sequencer -> Pub/Sub -> Edge Servers -> Clients. Each layer only fans out to ~100 children.
Live Comments Architecture
Component Responsibilities:
1. Comment API (Ingestion)
   - Receives comment from client
   - Validates content (spam, length, auth)
   - Forwards to sequencer for ordering
2. Sequencer (Per-Stream)
   - Assigns sequence number to each comment
   - Single point of ordering per stream
   - Writes to storage and publishes to pub/sub
3. Pub/Sub Layer (Redis/Kafka)
   - Distributes comments to all edge servers
   - Each stream is a channel/topic
   - Edge servers subscribe to streams they have viewers for
4. Edge Servers (WebSocket)
   - Maintain persistent connections with viewers
   - Subscribe to pub/sub for relevant streams
   - Push comments to connected clients
5. Storage
   - Persistent storage for replay/history
   - Recent comment cache for new joiners
Real-world reference
This is similar to how Twitch and YouTube Live work. They use hierarchical edge distribution. Discord uses a similar pattern for large servers.
Data Model & Storage
We need two storage systems: persistent storage for history and a fast cache for real-time delivery.
What to say
The source of truth is the comments database, but real-time delivery uses pub/sub. New joiners get recent comments from cache, then subscribe for live updates.
```sql
CREATE TABLE comments (
    id BIGINT PRIMARY KEY,
    stream_id VARCHAR(64) NOT NULL,
    seq BIGINT NOT NULL,           -- per-stream sequence number (remaining columns assumed)
    user_id BIGINT NOT NULL,
    content TEXT NOT NULL,
    created_at TIMESTAMP NOT NULL
);

CREATE INDEX idx_stream_seq ON comments (stream_id, seq);  -- ordered replay per stream
```
Redis for Real-time:
```python
# Pub/Sub channel per stream
CHANNEL = f"comments:{stream_id}"
```
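For late joiners, a minimal sketch of the join path using redis-py's asyncio client, assuming recent comments are also cached in a Redis sorted set scored by sequence number (the `recent:{stream_id}` key is an assumption):

```python
import redis.asyncio as redis

async def join_stream(r: redis.Redis, stream_id: str):
    # 1. Backfill: fetch the most recent 100 comments from a sorted set (score = seq)
    recent = await r.zrange(f"recent:{stream_id}", -100, -1)
    # 2. Go live: subscribe to the stream's pub/sub channel for new comments
    pubsub = r.pubsub()
    await pubsub.subscribe(f"comments:{stream_id}")
    return recent, pubsub
```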
Sequencer Implementation:
```python
class StreamSequencer:
    def __init__(self, stream_id: str):
        self.stream_id = stream_id
        self.next_seq = 0  # monotonically increasing, single counter per stream

    def assign(self, comment: dict) -> dict:
        # Stamp the comment with the next sequence number (method name assumed)
        self.next_seq += 1
        comment["seq"] = self.next_seq
        return comment
```
Important detail
We publish to real-time before confirming database write. This prioritizes latency over durability. For critical messages, you might want the opposite.
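A minimal sketch of both orderings; `publish` and `persist_comment` are stand-ins for the real pub/sub and database clients:

```python
import asyncio

async def publish(channel: str, message: dict): ...   # stand-in for the pub/sub client
async def persist_comment(comment: dict): ...          # stand-in for the DB write

async def deliver_latency_first(stream_id: str, comment: dict):
    await publish(f"comments:{stream_id}", comment)    # real-time path first
    asyncio.create_task(persist_comment(comment))      # durability in the background

async def deliver_durability_first(stream_id: str, comment: dict):
    await persist_comment(comment)                     # confirm the write...
    await publish(f"comments:{stream_id}", comment)    # ...then fan out
```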
Real-time Delivery Deep Dive
Let me dive deep into the real-time delivery layer - this is the most complex part.
| Protocol | Pros | Cons | Best For |
|---|---|---|---|
| WebSocket | Bidirectional, low latency, efficient | Connection management complexity | Interactive real-time (our case) |
| Server-Sent Events | Simple, HTTP-based, auto-reconnect | Unidirectional only | Read-only streams |
| Long Polling | Works everywhere, simple | Higher latency, more overhead | Fallback only |
| HTTP/2 Push | No extra connection | Limited browser support | Not recommended |
Protocol choice
WebSocket is the right choice here. We need bidirectional (viewers can also comment), low latency, and efficient use of connections for long-lived streams.
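For concreteness, a minimal viewer-side sketch using the Python `websockets` library; the edge URL shape and message fields are assumptions:

```python
import asyncio
import json
import websockets

async def watch(stream_id: str):
    url = f"wss://edge.example.com/streams/{stream_id}"  # hypothetical endpoint
    async with websockets.connect(url) as ws:
        async for raw in ws:  # each message is one comment pushed by the edge
            comment = json.loads(raw)
            print(comment["seq"], comment["content"])

asyncio.run(watch("stream-123"))
```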
Edge Server Implementation:
```python
class EdgeServer:
    def __init__(self):
        # stream_id -> set of websocket connections
        self.subscriptions: dict[str, set] = {}

    async def on_comment(self, stream_id: str, comment: dict):
        # Pub/sub callback: fan out one message to every local viewer of the stream
        for ws in self.subscriptions.get(stream_id, set()):
            await ws.send_json(comment)  # send_json assumed on the connection wrapper
```
Message Flow
Connection management
Each edge server tracks which streams have local subscribers. It only subscribes to pub/sub channels for streams with active viewers. This prevents unnecessary message processing.
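A sketch of that bookkeeping, continuing the `EdgeServer` class above and assuming a Redis-style `self.pubsub` client:

```python
class EdgeServer:
    # ...continuing the class above

    async def add_viewer(self, stream_id: str, ws):
        viewers = self.subscriptions.setdefault(stream_id, set())
        if not viewers:
            # First local viewer: start receiving this stream's channel
            await self.pubsub.subscribe(f"comments:{stream_id}")
        viewers.add(ws)

    async def remove_viewer(self, stream_id: str, ws):
        viewers = self.subscriptions.get(stream_id, set())
        viewers.discard(ws)
        if not viewers:
            # Last viewer left: stop processing this stream entirely
            await self.pubsub.unsubscribe(f"comments:{stream_id}")
            self.subscriptions.pop(stream_id, None)
```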
Consistency & Invariants
System Invariants
All viewers must see comments in the same order. Out-of-order comments break conversation flow and confuse users.
How we guarantee ordering:
The key is a single sequencer per stream. All comments for a stream go through one sequencer that assigns monotonically increasing sequence numbers.
Comment A arrives at T=0 -> Sequencer assigns seq=1
Comment B arrives at T=0.1 -> Sequencer assigns seq=2
Comment C arrives at T=0.2 -> Sequencer assigns seq=3
All viewers receive: A(1), B(2), C(3) in that order
Even if network delays cause C to arrive before B at some edge:
- Edge server buffers and reorders by sequence number (see the sketch below)
- Or accepts slight out-of-order (C, B) if within tolerance
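A minimal reorder buffer along these lines (a sketch only; a production edge would also bound the buffer and time out on persistent gaps):

```python
import heapq

class ReorderBuffer:
    """Release comments in sequence order despite mild network reordering."""
    def __init__(self):
        self.next_seq = 1
        self.pending = []  # min-heap of (seq, comment); seqs are unique per stream

    def push(self, seq: int, comment: dict) -> list:
        heapq.heappush(self.pending, (seq, comment))
        ready = []
        while self.pending and self.pending[0][0] == self.next_seq:
            ready.append(heapq.heappop(self.pending)[1])
            self.next_seq += 1
        return ready  # comments now safe to deliver, in order
```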
Single sequencer = single point of failure?
Yes, but it is per-stream. One stream failing does not affect others. For extreme HA, use a replicated log (Kafka) as the sequencer with exactly-once semantics.
What about different arrival times at viewers?
Two viewers might see comment #5 at different times:
- Viewer A (close to edge server): sees at T+100ms
- Viewer B (far from edge): sees at T+500ms
This is acceptable because:
- They still see the same order
- The delay is usually imperceptible
- Perfect synchronization would require distributed consensus (too slow)
| Consistency Level | How Achieved | Latency Cost |
|---|---|---|
| Same order for all | Single sequencer per stream | ~10ms at sequencer |
| Exactly same time | Would need distributed clock sync | Not practical |
| No duplicates | Dedup by comment ID at edge (sketch below) | Minimal |
| No gaps | Sequence numbers, client requests missing | Adds complexity |
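The no-duplicates row is cheap to implement; here is a sketch of edge-side dedup using a bounded LRU window (the window size is an assumption):

```python
from collections import OrderedDict

class Deduper:
    """Drop comments whose IDs were already delivered, within a bounded window."""
    def __init__(self, capacity: int = 10_000):
        self.seen = OrderedDict()  # comment_id -> True, in arrival order
        self.capacity = capacity

    def is_new(self, comment_id: int) -> bool:
        if comment_id in self.seen:
            return False  # duplicate: already pushed to this client/edge
        self.seen[comment_id] = True
        if len(self.seen) > self.capacity:
            self.seen.popitem(last=False)  # evict the oldest remembered ID
        return True
```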
What to say
We guarantee a total order per stream - all viewers see the same order. We do not guarantee simultaneous delivery (that would require distributed consensus, which is too slow for real-time chat).
Failure Modes & Resilience
Proactively discuss failures
Let me walk through what happens when components fail.
| Failure | Impact | Mitigation | Recovery |
|---|---|---|---|
| Edge server crash | Up to ~50K viewers disconnected | Client auto-reconnect to different edge | Reconnect + fetch missed comments by seq# |
| Sequencer down | New comments for that stream fail | Standby sequencer takes over | Clients retry, may see brief delay |
| Redis Pub/Sub down | Comments not delivered real-time | Fall back to polling recent cache | Restore pub/sub, clients catch up |
| Database slow | Comments still delivered (async persist) | Real-time unaffected, history delayed | Process backlog when DB recovers |
| Network partition | Some edges isolated | Isolated edges serve from local cache | Reconcile when partition heals |
Client-side resilience:
The client is crucial for reliability:
```javascript
class LiveCommentsClient {
  constructor(streamId) {
    this.streamId = streamId;
    this.lastSeq = 0;  // highest sequence number seen; drives the gap detection below
  }
}
```
Gap detection
If client sees seq 5, then seq 8, it knows it missed 6 and 7. It can request these specific comments from the API. This makes the system self-healing.
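The same logic as a sketch, shown in Python for consistency with the other snippets; `fetch_comments` and `render` are stand-in helpers:

```python
async def fetch_comments(stream_id: str, lo: int, hi: int) -> list: ...  # history API stub
def render(comment: dict): ...                                           # UI stub

async def on_comment(state: dict, comment: dict):
    """Apply one incoming comment; backfill anything we skipped over."""
    expected = state["last_seq"] + 1
    if comment["seq"] > expected:
        # Gap detected (e.g. saw 5 then 8): fetch 6..7 from the history API
        for missed in await fetch_comments(state["stream_id"], expected, comment["seq"] - 1):
            render(missed)
    state["last_seq"] = max(state["last_seq"], comment["seq"])
    render(comment)
```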
Graceful degradation for viral streams:
When a stream goes viral and exceeds capacity:
```python
import random

MAX_RATE = 100  # max comments/sec to fan out; threshold value assumed

async def maybe_sample_comment(stream_id: str, comment: dict) -> bool:
    """Return True if comment should be delivered"""
    rate = await get_comment_rate(stream_id)  # sliding-window rate; assumed helper
    # Under capacity: deliver all; over capacity: deliver a proportional random sample
    return rate <= MAX_RATE or random.random() < MAX_RATE / rate
```
Evolution & Scaling
What to say
This design works well for streams up to 500K viewers with single-region deployment. Let me discuss evolution for 10M+ concurrent and global deployment.
Evolution Path:
Stage 1: Single Region (up to 500K per stream)
- Single Redis cluster for pub/sub
- Edge servers in one region
- Works for most streams

Stage 2: Multi-Region (up to 2M per stream)
- Redis cluster per region
- Cross-region replication with Kafka
- Edge servers globally distributed

Stage 3: Extreme Scale (10M+ per stream)
- Custom pub/sub with hierarchical fan-out
- CDN integration for edge delivery
- Comment aggregation/sampling
Multi-Region Architecture
Global latency optimization:
- Comments are sequenced centrally (one region)
- But delivered via regional edge servers
- Viewers connect to nearest edge
- Cross-region latency only affects the sequencer hop
| Viewer Location | Sequencer in US | Total Latency |
|---|---|---|
| US West | ~20ms to sequencer | ~100ms end-to-end |
| US East | ~40ms to sequencer | ~150ms end-to-end |
| EU | ~100ms to sequencer | ~300ms end-to-end |
| Asia | ~200ms to sequencer | ~500ms end-to-end |
Alternative approach
For truly global streams, you could use regional sequencers with vector clocks for ordering. This is more complex but reduces cross-region latency. Only worth it for the top 0.1% of streams.
What I would do differently for...
Gaming/Esports (ultra-low latency): Use UDP-based protocol, accept some packet loss, optimize for <100ms
Moderated comments (news): Add moderation queue before sequencer, accept higher latency for quality
Reactions only (no text): Use aggregated counters instead of individual messages, much simpler