Design Walkthrough
Problem Statement
The Question: Design a system that enforces concurrent streaming limits for a service like Netflix, where different subscription tiers allow different numbers of simultaneous streams.
Business Context:
- Basic Plan: 1 simultaneous stream
- Standard Plan: 2 simultaneous streams
- Premium Plan: 4 simultaneous streams
Why This Matters:
- Revenue Protection: Prevents password sharing abuse (one account, unlimited users)
- Capacity Planning: Limits help predict infrastructure needs
- Tiered Monetization: Drives upgrades to higher-tier plans
- Licensing Compliance: Content licenses often specify concurrent viewer limits
What to say first
Before I design, I want to clarify: what defines an active stream? Is it when the video starts playing, or when the user opens the app? And how quickly must we detect when a stream ends?
Hidden requirements interviewers test:
- Do you understand the difference between request rate limiting and session concurrency?
- Can you handle the messy reality of detecting stream end (crashes, network loss)?
- Do you consider the user experience when enforcing limits?
- Can you design for global scale with regional edge servers?
Clarifying Questions
These questions demonstrate you understand the nuances of session management.
Question 1: Stream Definition
What counts as an active stream? Opening the app? Starting playback? Or only while video is actively playing (not paused)?
Why this matters: Affects when we increment/decrement counters.
Typical answer: Active playback counts; paused for >5 minutes does not count.
Architecture impact: Need a heartbeat mechanism to detect pause vs play state.
Question 2: Enforcement Strictness
When limit is reached, do we block the new stream or kick off an existing one? What about brief overlaps during device switching?
Why this matters: Determines user experience and edge case handling.
Typical answer: Block the new stream, show a message to stop another device. Allow a 30-second grace period for device switching.
Architecture impact: Need to track which device started first, handle race conditions gracefully.
Question 3: Detection Latency
How quickly must we detect that a stream has ended? Seconds? Minutes?
Why this matters: Determines heartbeat frequency and TTL strategy.
Typical answer: Within 1-2 minutes of stream end.
Architecture impact: Heartbeats every 30 seconds with a 90-second TTL.
Question 4: Global Distribution
Are streams served from regional edge servers or a central location? Do we need global consistency?
Why this matters: A global CDN means session state must be coordinated across regions.
Typical answer: Edge servers worldwide, central session store.
Architecture impact: Edge servers report to a central session service; some latency acceptable.
Stating assumptions
I will assume: stream = active playback with heartbeats every 30s, block new streams when limit reached (do not kick existing), 90-second TTL for session expiry, global edge servers with central session store.
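These assumptions can be captured as a small configuration object so the numbers live in one place. A minimal sketch; the class and field names are illustrative, not part of the original design:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EnforcementConfig:
    heartbeat_interval_s: int = 30       # client reports playback every 30 seconds
    session_ttl_s: int = 90              # miss two heartbeats and the session expires
    device_switch_grace_s: int = 30      # brief overlap allowed when switching devices
    kick_existing_streams: bool = False  # block the new stream rather than kicking an old one
```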
The Hard Part
Say this out loud
The hard part here is reliably detecting when a stream ends. Starting a stream is easy - the user explicitly requests it. But ending? Users close laptops, lose WiFi, apps crash, phones die. There is no clean goodbye.
Why Stream End Detection is Genuinely Hard:
1. No Reliable End Signal: Unlike a web request with a response, streams often end without notification:
   - User closes the laptop lid (app suspended, no network)
   - Internet connection drops
   - App crashes
   - Phone battery dies
   - User switches to a different app
2. False Positives Are Costly: If we incorrectly mark a stream as ended:
   - User briefly loses WiFi, comes back, and is blocked from their own stream
   - Creates a terrible user experience
   - Support tickets and churn
3. False Negatives Enable Abuse: If we fail to detect ended streams:
   - Ghost sessions block legitimate new streams
   - Users wait for the old session to time out
   - Creates a terrible user experience
4. Global Distribution Complicates Detection:
   - User in Tokyo, session state in Virginia
   - Network partitions between edge and central
   - Clock skew between servers
Common mistake
Candidates often design for the happy path (user clicks stop). The real challenge is handling the unhappy path where the client disappears without saying goodbye.
The Fundamental Tradeoff:
Short TTL (30s) -> Fast detection -> More false positives (brief network blips kill the session)
Long TTL (5min) -> Fewer false positives -> Slow detection (ghost sessions block users)

We typically choose a 30-second heartbeat with a 90-second TTL as the balance:
- Miss 2 heartbeats = session expired
- Survives brief network hiccups
- Detects ended streams within ~2 minutes
Scale & Access Patterns
Let me estimate the scale for a Netflix-sized service.
| Dimension | Value | Derivation |
|---|---|---|
| Total Subscribers | 200 Million | Netflix actual subscriber count |
| Peak Concurrent Streams | 10 Million | ~5% of subscribers streaming at peak |
| Heartbeat Rate | ~330K/sec | 10M concurrent streams / 30-second heartbeat interval |
| Session Store Size | ~2 GB | 10M sessions x ~200 bytes per session record |
What to say
At 330K heartbeats per second, this is write-heavy and requires a system optimized for high write throughput. The data is small (2GB) so it fits in memory, which is good for latency.
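The headline numbers come from simple arithmetic. A quick back-of-envelope check, assuming roughly 200 bytes per session record:

```python
peak_streams = 10_000_000         # peak concurrent streams
heartbeat_interval_s = 30         # one heartbeat per stream every 30 seconds
bytes_per_session = 200           # rough size of one session record (assumption)

heartbeats_per_sec = peak_streams / heartbeat_interval_s  # ~333,000/sec
session_store_bytes = peak_streams * bytes_per_session    # ~2 GB
print(f"{heartbeats_per_sec:,.0f} heartbeats/sec, {session_store_bytes / 1e9:.1f} GB")
```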
Access Pattern Analysis:
- Stream Start: Read current count + write new session (must be atomic)
- Heartbeat: Update last_seen timestamp (very high frequency)
- Stream End: Delete session record (explicit end) or TTL expiry (implicit)
- Query: Get all sessions for an account (when user views active devices)
| Operation | Frequency | Latency Requirement |
|---|---|---|
| Stream Start | 50K/sec | < 500ms (user waiting) |
| Heartbeat | 330K/sec | < 100ms (background) |
| Stream End | 50K/sec | < 100ms (background) |
| Get Active Sessions | 10K/sec | < 200ms (UI display) |

Key Insight: Heartbeats dominate the workload. We need a system that can handle 330K writes/second, but most of these are simple timestamp updates, not complex transactions.
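These access patterns map onto a small session-store interface. A sketch of what that contract might look like; method names and return types are assumptions:

```python
from typing import Protocol

class SessionStore(Protocol):
    async def start_stream(self, account_id: str, device_id: str) -> bool:
        """Atomic read-count-and-write; returns False when the plan limit is hit."""
        ...

    async def heartbeat(self, account_id: str, device_id: str) -> bool:
        """Very high frequency: refresh last_seen and the session TTL."""
        ...

    async def end_stream(self, account_id: str, device_id: str) -> None:
        """Explicit end deletes the session; silent ends rely on TTL expiry."""
        ...

    async def get_active_sessions(self, account_id: str) -> list[str]:
        """Used when the user views their active devices."""
        ...
```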
High-Level Architecture
Let me walk through the architecture from client to storage.
What to say
The architecture has three main components: client-side heartbeat, session service for enforcement, and a distributed session store. I will partition by account_id so all sessions for one account are colocated.
Concurrency Enforcement Architecture
Component Responsibilities:
1. Client Apps (TV, Mobile, Web)
   - Send a heartbeat every 30 seconds while streaming
   - Include: account_id, device_id, stream_id, playback_state (payload sketched after this list)
   - Handle rejection gracefully (show upgrade prompt)
2. Edge Servers (CDN)
   - Serve video content
   - Forward session events to the Session Service
   - Cache authorization decisions briefly (5s) to reduce load
3. Session Service
   - Stateless servers behind a load balancer
   - Route by hash(account_id) for consistency
   - Enforce concurrency limits
   - Handle stream start/heartbeat/end
4. Redis Cluster (Session Store)
   - Store active sessions with TTL
   - Partitioned by account_id
   - High write throughput for heartbeats
5. Subscription Service
   - Returns plan limits for an account
   - Cached aggressively (plan changes are rare)
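The heartbeat payload from item 1 needs only a handful of small fields. A possible shape using the fields listed above; the exact wire format is an assumption:

```python
from typing import Literal, TypedDict

class Heartbeat(TypedDict):
    account_id: str
    device_id: str
    stream_id: str
    playback_state: Literal["playing", "paused"]  # paused >5 minutes stops counting toward the limit
```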
Real-world reference
Netflix uses a similar architecture with their Zuul edge service forwarding to backend session management. Disney+ rebuilt this after launch issues showed how critical robust session management is.
Data Model & Storage
Redis is ideal for session storage: in-memory for speed, TTL for automatic expiry, and atomic operations for safe counting.
What to say
I will use Redis with sessions stored as hash maps, keyed by account_id. Each session has a TTL that auto-expires if heartbeats stop. This handles the ghost session problem automatically.
# Session key pattern
sessions:{account_id}:{device_id}
Why This Key Design:
- account_id in key: Easy to find all sessions for an account with SCAN
- device_id in key: Prevents duplicate sessions from the same device
- Hash for session data: Atomic updates, can update a single field
- TTL on each key: Automatic cleanup of ghost sessions
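For example, one session might be written as a hash with its own TTL. A minimal sketch using redis-py's asyncio client; the field names are illustrative:

```python
import time
import redis.asyncio as redis

r = redis.Redis(decode_responses=True)

async def write_session(account_id: str, device_id: str, stream_id: str) -> None:
    key = f"sessions:{account_id}:{device_id}"
    # One hash per session: individual fields can be updated independently
    await r.hset(key, mapping={
        "stream_id": stream_id,
        "playback_state": "playing",
        "last_seen": int(time.time()),
    })
    # TTL on the key: if heartbeats stop, the ghost session cleans itself up
    await r.expire(key, 90)
```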
Lua script 1 - atomic stream start (count sessions, then register the new one):
-- KEYS[1] = sessions:{account_id}:* pattern for SCAN
-- KEYS[2] = sessions:{account_id}:{device_id} for new session
-- ARGV[1] = max_concurrent_streams (from subscription)

Lua script 2 - heartbeat (refresh last_seen and TTL):
-- KEYS[1] = sessions:{account_id}:{device_id}
-- ARGV[1] = current_timestamp
-- ARGV[2] = ttl_seconds

Important detail
The SCAN operation in Lua is not ideal for high-frequency calls. In production, maintain a separate SET of active device_ids per account for O(1) counting: active_devices:{account_id}
# Per-account device set (for fast counting)
active_devices:{account_id} = SET of device_ids
Example: {"device_tv_living", "device_phone_john"}

Stream Lifecycle Deep Dive
Let me trace through the complete lifecycle of a stream, including edge cases.
Stream Lifecycle State Machine
State Transitions Explained:
1. Requesting -> Authorized/Denied
async def start_stream(account_id: str, device_id: str, content_id: str) -> StreamResult:
    # 1. Get subscription limits (cached)
    plan = await subscription_service.get_plan(account_id)
    # 2. Atomically count sessions and admit or deny (full sketch at the end of this section)
    ...

2. Active -> Active (Heartbeat)
async def process_heartbeat(account_id: str, device_id: str, position: int) -> HeartbeatResult:
    # Update session with new heartbeat: refresh last_seen and the TTL atomically
    result = await redis.eval(...)  # Lua call sketched at the end of this section
    ...

3. Zombie -> Expired (TTL)
This is the magic that handles ungraceful shutdowns:
T+0s: User closes laptop (no goodbye signal)
T+30s: Heartbeat missed #1 (TTL still has 60s remaining)
T+60s: Heartbeat missed #2 (TTL still has 30s remaining)
T+90s: TTL expires - session automatically removed
T+91s: User on another device can now start stream
Key insight: No active cleanup job needed. Redis TTL does the work.

Why not active cleanup?
A background job scanning for expired sessions would add complexity and still have race conditions. Redis TTL is atomic, distributed, and battle-tested. Let the database do the work.
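Putting the pieces together, here is a minimal sketch of the stream-start and heartbeat paths using the per-device session hashes and the active_devices set from the data model. It assumes redis-py's asyncio client; the Lua scripts, field names, and constants are illustrative rather than the exact production scripts. (In a Redis Cluster deployment, the account_id portion of both keys would need a hash tag so the multi-key scripts stay on one slot.)

```python
import time
import redis.asyncio as redis

r = redis.Redis(decode_responses=True)

HEARTBEAT_TTL_SECONDS = 90  # miss two 30-second heartbeats => session expires

# Atomic check-and-register: deny if the account is already at its limit,
# otherwise create the session hash and add the device to the active set.
START_STREAM_LUA = """
-- KEYS[1] = active_devices:{account_id}
-- KEYS[2] = sessions:{account_id}:{device_id}
-- ARGV[1] = max_streams, ARGV[2] = device_id, ARGV[3] = now, ARGV[4] = ttl
if redis.call('SISMEMBER', KEYS[1], ARGV[2]) == 0 and
   redis.call('SCARD', KEYS[1]) >= tonumber(ARGV[1]) then
  return 0
end
redis.call('SADD', KEYS[1], ARGV[2])
redis.call('HSET', KEYS[2], 'last_seen', ARGV[3], 'state', 'playing')
redis.call('EXPIRE', KEYS[2], ARGV[4])
redis.call('EXPIRE', KEYS[1], ARGV[4])
return 1
"""

# Refresh the TTL while playback continues; if the session already expired,
# clean its device out of the active set so the O(1) count stays accurate.
HEARTBEAT_LUA = """
-- KEYS[1] = sessions:{account_id}:{device_id}
-- KEYS[2] = active_devices:{account_id}
-- ARGV[1] = now, ARGV[2] = ttl, ARGV[3] = device_id
if redis.call('EXISTS', KEYS[1]) == 0 then
  redis.call('SREM', KEYS[2], ARGV[3])
  return 0
end
redis.call('HSET', KEYS[1], 'last_seen', ARGV[1])
redis.call('EXPIRE', KEYS[1], ARGV[2])
return 1
"""

async def start_stream(account_id: str, device_id: str, max_streams: int) -> bool:
    """Admit the stream only if the account is under its concurrency limit."""
    allowed = await r.eval(
        START_STREAM_LUA, 2,
        f"active_devices:{account_id}",
        f"sessions:{account_id}:{device_id}",
        max_streams, device_id, int(time.time()), HEARTBEAT_TTL_SECONDS,
    )
    return allowed == 1

async def process_heartbeat(account_id: str, device_id: str) -> bool:
    """Keep the session alive; returns False if it has already expired."""
    alive = await r.eval(
        HEARTBEAT_LUA, 2,
        f"sessions:{account_id}:{device_id}",
        f"active_devices:{account_id}",
        int(time.time()), HEARTBEAT_TTL_SECONDS, device_id,
    )
    return alive == 1
```

Because both scripts run atomically inside Redis, the count-then-register step cannot interleave between two requests, and the heartbeat path also repairs the active set if a device reconnects after its session already expired.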
Consistency & Edge Cases
System Invariants
1. Never permanently block a paying customer from their first stream.
2. Never allow significantly more than limit (N+1 briefly acceptable, 2N is not).
3. Always show user which devices are active when blocking.
Edge Case 1: Race Condition on Stream Start
Scenario: User has 2-stream limit, 1 active. Two family members click Play simultaneously.
| Time | Device A | Device B | Redis Count |
|---|---|---|---|
| T1 | Read count=1 | Read count=1 | 1 |
| T2 | Write session | Write session | 3 |

Both devices see count=1, both pass the check against the 2-stream limit, and both write - leaving 3 active sessions. The fix is to make count-and-register a single atomic operation (the Lua script approach above), so the two requests serialize and the second one is denied.

Edge Case 2: Device Switching
Scenario: User watching on TV, wants to continue on phone (common use case).
async def switch_device(account_id: str, old_device: str, new_device: str, content_id: str):
    """
    Special endpoint for seamless device switching.
    """
    # Release the old device's session first, then admit the new device;
    # the 30-second grace period covers any brief overlap.
    ...

Edge Case 3: Plan Downgrade While Streaming
Scenario: User has 4 active streams, then downgrades to Basic (1 stream).
Business Decision Required
Do we: (A) Immediately kick 3 streams? (B) Let existing streams finish, enforce on next start? Most services choose B - do not interrupt active sessions, but do not allow new ones.
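Option B needs almost no special machinery: existing sessions keep their TTLs, and the lower limit simply applies to the next stream start. A sketch of the idea; plan_cache here is an illustrative in-memory stand-in for the aggressively cached plan limits:

```python
# In-memory stand-in for the cached plan limits (see the Subscription Service
# component); a real deployment would use a shared cache.
plan_cache: dict[str, int] = {}  # account_id -> max concurrent streams

def on_plan_changed(account_id: str, new_limit: int) -> None:
    # Option B: leave existing sessions alone - their TTLs expire naturally when
    # playback stops. Only refresh the cached limit so the next start_stream
    # call is checked against the new, lower value; no streams are interrupted.
    plan_cache[account_id] = new_limit
```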
Edge Case 4: Network Partition
Scenario: Edge server in Tokyo loses connection to central Redis in Virginia.
async def start_stream_with_fallback(account_id: str, device_id: str) -> StreamResult:
    try:
        # Try the central session check with a short timeout; on failure, fall
        # back to cached data (full sketch after the consistency note below)
        ...

What to say about consistency
We choose availability over strict consistency here. If Redis is down, we fail open for the first stream but are conservative when we have cached data showing the limit is reached. Brief over-limit is acceptable; blocking paying customers is not.
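A sketch of that policy, assuming asyncio and an illustrative per-edge cache of the last counts seen from the central store; check_central_limit is a hypothetical helper standing in for the real call:

```python
import asyncio

# Last counts this edge region saw from the central store (account_id -> active streams)
recent_counts: dict[str, int] = {}

async def check_central_limit(account_id: str, device_id: str) -> bool:
    # Placeholder for the real call into the central Redis-backed session service
    raise NotImplementedError

async def start_stream_with_fallback(account_id: str, device_id: str, limit: int) -> bool:
    try:
        # Normal path: ask the central session service, but don't keep the user waiting
        return await asyncio.wait_for(check_central_limit(account_id, device_id), timeout=0.5)
    except (asyncio.TimeoutError, ConnectionError):
        # Partition or central outage: be generous unless cached data says we're at the limit
        if recent_counts.get(account_id, 0) >= limit:
            return False  # conservative: cached data shows the account already at its cap
        return True       # fail open: never block a paying customer on our own outage
```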
Failure Modes & Resilience
Proactively discuss failures
Streaming is a 24/7 service. Let me walk through what happens when things break.
| Failure | Impact | Mitigation | Recovery |
|---|---|---|---|
| Redis primary down | Cannot start new streams | Redis Sentinel failover (30s) | Automatic promotion of replica |
| Session service down | Stream starts fail | Multiple replicas + health checks | Load balancer removes unhealthy nodes |
Redis High Availability Setup:
Redis HA with Sentinel
Graceful Degradation Ladder:
1. Healthy: Full session tracking, accurate counts
2. Degraded: Redis slow, using cached counts, may over-allow
3. Impaired: Cannot reach Redis, fail open for new users, deny for cached at-limit
4. Emergency: Disable enforcement entirely, allow all streams
class SessionServiceCircuitBreaker:
    def __init__(self):
        self.failure_count = 0  # consecutive Redis failures; trips the breaker past a threshold

Evolution & Scaling
What to say
This design handles Netflix scale (200M subscribers, 10M concurrent). Let me discuss how it evolves for different requirements and even larger scale.
Current Design Limits:
- 10M concurrent sessions in Redis Cluster (comfortable)
- 330K heartbeats/second (Redis can handle 1M+ ops/sec)
- Single-region session store with global edge
Evolution 1: Multi-Region Session Store
For truly global low-latency enforcement:
Multi-Region Architecture
Tradeoff: Regional consistency with eventual global consistency. A user could start streams in two regions simultaneously and briefly exceed limit until sync propagates (~1-2 seconds).
Evolution 2: Household Detection
Netflix's 2023 password-sharing crackdown requires knowing whether devices are in the same household:
class HouseholdDetector:
    """
    Determine if devices belong to the same household.
    """
    ...

Evolution 3: QoS-Based Limits
Instead of hard device limits, limit based on total bandwidth:
| Plan | Old Model | New Model |
|---|---|---|
| Basic | 1 stream | 5 Mbps total bandwidth |
| Standard | 2 streams | 15 Mbps total bandwidth |
| Premium | 4 streams | 50 Mbps total bandwidth |
Benefit: A user can have five low-quality streams or one 4K stream - more flexible.
Challenge: Real-time bandwidth tracking and enforcement is harder than counting streams.
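Enforcement then becomes a sum over active bitrates rather than a count of devices. A simplified sketch; the plan caps come from the table above, everything else is illustrative:

```python
PLAN_BANDWIDTH_MBPS = {"basic": 5, "standard": 15, "premium": 50}

def can_start_stream(plan: str, active_bitrates_mbps: list[float], requested_mbps: float) -> bool:
    """Admit the new stream only if total bandwidth stays within the plan's cap."""
    return sum(active_bitrates_mbps) + requested_mbps <= PLAN_BANDWIDTH_MBPS[plan]

# Example: a Basic (5 Mbps) account with one 3 Mbps stream can add a 1.5 Mbps
# mobile stream but not a 4 Mbps HD stream.
assert can_start_stream("basic", [3.0], 1.5) is True
assert can_start_stream("basic", [3.0], 4.0) is False
```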
Alternative approach to mention
If I had to optimize for absolute lowest latency, I would use local enforcement at the edge with periodic sync. Each edge server tracks its local sessions and syncs counts every few seconds. This allows sub-10ms enforcement but may briefly allow N+2 or N+3 during sync gaps.
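A rough sketch of that edge-local approach, under the assumption that counts from other regions may be a few seconds stale:

```python
import time

class EdgeSessionCounter:
    """Per-edge view: exact local counts plus a periodically synced remote total."""

    def __init__(self, sync_interval_s: float = 5.0):
        self.local: dict[str, set[str]] = {}     # account_id -> devices streaming via this edge
        self.remote_counts: dict[str, int] = {}  # account_id -> streams seen elsewhere (last sync)
        self.sync_interval_s = sync_interval_s
        self.last_sync = 0.0

    def try_start(self, account_id: str, device_id: str, limit: int) -> bool:
        # Sub-10ms decision using local + last-known remote counts; may briefly
        # allow N+2 or N+3 across regions until the next sync narrows the gap.
        devices = self.local.setdefault(account_id, set())
        known_total = len(devices) + self.remote_counts.get(account_id, 0)
        if device_id not in devices and known_total >= limit:
            return False
        devices.add(device_id)
        return True

    def apply_sync(self, counts_from_central: dict[str, int]) -> None:
        # Called every few seconds with the central store's per-account totals,
        # excluding this edge's own sessions.
        self.remote_counts = counts_from_central
        self.last_sync = time.time()
```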