Design Walkthrough
Problem Statement
The Question: Design a system that enforces concurrent streaming limits for a service like Netflix, where different subscription tiers allow different numbers of simultaneous streams.
Business Context:
- Basic Plan: 1 simultaneous stream
- Standard Plan: 2 simultaneous streams
- Premium Plan: 4 simultaneous streams
Why This Matters:
- Revenue Protection: Prevents password sharing abuse (one account, unlimited users)
- Capacity Planning: Limits help predict infrastructure needs
- Tiered Monetization: Drives upgrades to higher-tier plans
- Licensing Compliance: Content licenses often specify concurrent viewer limits
What to say first
Before I design, I want to clarify: what defines an active stream? Is it when the video starts playing, or when the user opens the app? And how quickly must we detect when a stream ends?
Hidden requirements interviewers test:
- Do you understand the difference between request rate limiting and session concurrency?
- Can you handle the messy reality of detecting stream end (crashes, network loss)?
- Do you consider the user experience when enforcing limits?
- Can you design for global scale with regional edge servers?
Clarifying Questions
These questions demonstrate you understand the nuances of session management.
Question 1: Stream Definition
What counts as an active stream? Opening the app? Starting playback? Or only while video is actively playing (not paused)?
Why this matters: Affects when we increment/decrement counters.
Typical answer: Active playback counts; paused for >5 minutes does not count.
Architecture impact: Need a heartbeat mechanism to detect pause vs play state.
Question 2: Enforcement Strictness
When limit is reached, do we block the new stream or kick off an existing one? What about brief overlaps during device switching?
Why this matters: Determines user experience and edge case handling.
Typical answer: Block the new stream, show a message to stop another device. Allow a 30-second grace period for device switching.
Architecture impact: Need to track which device started first, handle race conditions gracefully.
Question 3: Detection Latency
How quickly must we detect that a stream has ended? Seconds? Minutes?
Why this matters: Determines heartbeat frequency and TTL strategy.
Typical answer: Within 1-2 minutes of stream end.
Architecture impact: Heartbeats every 30 seconds with a 90-second TTL.
Question 4: Global Distribution
Are streams served from regional edge servers or a central location? Do we need global consistency?
Why this matters: A global CDN means session state must be coordinated across regions.
Typical answer: Edge servers worldwide, central session store.
Architecture impact: Edge servers report to a central session service; some latency acceptable.
Stating assumptions
I will assume: stream = active playback with heartbeats every 30s, block new streams when limit reached (do not kick existing), 90-second TTL for session expiry, global edge servers with central session store.
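These assumptions can be captured as a small configuration object so the numbers live in one place. A minimal sketch; the class and field names are illustrative, not part of the original design:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EnforcementConfig:
    heartbeat_interval_s: int = 30       # client reports playback every 30 seconds
    session_ttl_s: int = 90              # miss two heartbeats and the session expires
    device_switch_grace_s: int = 30      # brief overlap allowed when switching devices
    kick_existing_streams: bool = False  # block the new stream rather than kicking an old one
```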
The Hard Part
Say this out loud
The hard part here is reliably detecting when a stream ends. Starting a stream is easy - the user explicitly requests it. But ending? Users close laptops, lose WiFi, apps crash, phones die. There is no clean goodbye.
Why Stream End Detection is Genuinely Hard:
1. No Reliable End Signal: Unlike a web request with a response, streams often end without notification:
   - User closes the laptop lid (app suspended, no network)
   - Internet connection drops
   - App crashes
   - Phone battery dies
   - User switches to a different app
2. False Positives Are Costly: If we incorrectly mark a stream as ended:
   - User briefly loses WiFi, comes back, and is blocked from their own stream
   - Creates a terrible user experience
   - Support tickets and churn
3. False Negatives Enable Abuse: If we fail to detect ended streams:
   - Ghost sessions block legitimate new streams
   - Users wait for the old session to time out
   - Creates a terrible user experience
4. Global Distribution Complicates Detection:
   - User in Tokyo, session state in Virginia
   - Network partitions between edge and central
   - Clock skew between servers
Common mistake
Candidates often design for the happy path (user clicks stop). The real challenge is handling the unhappy path where the client disappears without saying goodbye.
The Fundamental Tradeoff:
Short TTL (30s) -> Fast detection -> More false positives (brief network blips kill the session)
Long TTL (5min) -> Fewer false positives -> Slow detection (ghost sessions block users)

We typically choose a 30-second heartbeat with a 90-second TTL as the balance:
- Miss 2 heartbeats = session expired
- Survives brief network hiccups
- Detects ended streams within ~2 minutes
Scale & Access Patterns
Let me estimate the scale for a Netflix-sized service.
| Dimension | Value | Derivation |
|---|---|---|
| Total Subscribers | 200 Million | Netflix actual subscriber count |
| Peak Concurrent Streams | 10 Million | ~5% of subscribers streaming at peak |
| Heartbeat Rate | ~330K/sec | 10M concurrent streams / 30-second heartbeat interval |
| Session Store Size | ~2 GB | 10M sessions x ~200 bytes per session record |
What to say
At 330K heartbeats per second, this is write-heavy and requires a system optimized for high write throughput. The data is small (2GB) so it fits in memory, which is good for latency.
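The headline numbers come from simple arithmetic. A quick back-of-envelope check, assuming roughly 200 bytes per session record:

```python
peak_streams = 10_000_000         # peak concurrent streams
heartbeat_interval_s = 30         # one heartbeat per stream every 30 seconds
bytes_per_session = 200           # rough size of one session record (assumption)

heartbeats_per_sec = peak_streams / heartbeat_interval_s  # ~333,000/sec
session_store_bytes = peak_streams * bytes_per_session    # ~2 GB
print(f"{heartbeats_per_sec:,.0f} heartbeats/sec, {session_store_bytes / 1e9:.1f} GB")
```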
Access Pattern Analysis:
- Stream Start: Read current count + write new session (must be atomic)
- Heartbeat: Update last_seen timestamp (very high frequency)
- Stream End: Delete session record (explicit end) or TTL expiry (implicit)
- Query: Get all sessions for an account (when user views active devices)
| Operation | Frequency | Latency Requirement |
|---|---|---|
| Stream Start | 50K/sec | < 500ms (user waiting) |
| Heartbeat | 330K/sec | < 100ms (background) |
| Stream End | 50K/sec | < 100ms (background) |
| Get Active Sessions | 10K/sec | < 200ms (UI display) |

Key Insight: Heartbeats dominate the workload. We need a system that can handle 330K writes/second, but most of these are simple timestamp updates, not complex transactions.
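These access patterns map onto a small session-store interface. A sketch of what that contract might look like; method names and return types are assumptions:

```python
from typing import Protocol

class SessionStore(Protocol):
    async def start_stream(self, account_id: str, device_id: str) -> bool:
        """Atomic read-count-and-write; returns False when the plan limit is hit."""
        ...

    async def heartbeat(self, account_id: str, device_id: str) -> bool:
        """Very high frequency: refresh last_seen and the session TTL."""
        ...

    async def end_stream(self, account_id: str, device_id: str) -> None:
        """Explicit end deletes the session; silent ends rely on TTL expiry."""
        ...

    async def get_active_sessions(self, account_id: str) -> list[str]:
        """Used when the user views their active devices."""
        ...
```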
High-Level Architecture
Let me walk through the architecture from client to storage.
What to say
The architecture has three main components: client-side heartbeat, session service for enforcement, and a distributed session store. I will partition by account_id so all sessions for one account are colocated.
Concurrency Enforcement Architecture
Component Responsibilities:
1. Client Apps (TV, Mobile, Web)
   - Send a heartbeat every 30 seconds while streaming
   - Include: account_id, device_id, stream_id, playback_state (payload sketched after this list)
   - Handle rejection gracefully (show upgrade prompt)
2. Edge Servers (CDN)
   - Serve video content
   - Forward session events to the Session Service
   - Cache authorization decisions briefly (5s) to reduce load
3. Session Service
   - Stateless servers behind a load balancer
   - Route by hash(account_id) for consistency
   - Enforce concurrency limits
   - Handle stream start/heartbeat/end
4. Redis Cluster (Session Store)
   - Store active sessions with TTL
   - Partitioned by account_id
   - High write throughput for heartbeats
5. Subscription Service
   - Returns plan limits for an account
   - Cached aggressively (plan changes are rare)
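The heartbeat payload from item 1 needs only a handful of small fields. A possible shape using the fields listed above; the exact wire format is an assumption:

```python
from typing import Literal, TypedDict

class Heartbeat(TypedDict):
    account_id: str
    device_id: str
    stream_id: str
    playback_state: Literal["playing", "paused"]  # paused >5 minutes stops counting toward the limit
```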
Real-world reference
Netflix uses a similar architecture with their Zuul edge service forwarding to backend session management. Disney+ rebuilt this after launch issues showed how critical robust session management is.
Data Model & Storage
Redis is ideal for session storage: in-memory for speed, TTL for automatic expiry, and atomic operations for safe counting.
What to say
I will use Redis with sessions stored as hash maps, keyed by account_id. Each session has a TTL that auto-expires if heartbeats stop. This handles the ghost session problem automatically.
# Session key pattern
sessions:{account_id}:{device_id}
Why This Key Design:
- account_id in key: Easy to find all sessions for an account with SCAN
- device_id in key: Prevents duplicate sessions from the same device
- Hash for session data: Atomic updates, can update a single field
- TTL on each key: Automatic cleanup of ghost sessions
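For example, one session might be written as a hash with its own TTL. A minimal sketch using redis-py's asyncio client; the field names are illustrative:

```python
import time
import redis.asyncio as redis

r = redis.Redis(decode_responses=True)

async def write_session(account_id: str, device_id: str, stream_id: str) -> None:
    key = f"sessions:{account_id}:{device_id}"
    # One hash per session: individual fields can be updated independently
    await r.hset(key, mapping={
        "stream_id": stream_id,
        "playback_state": "playing",
        "last_seen": int(time.time()),
    })
    # TTL on the key: if heartbeats stop, the ghost session cleans itself up
    await r.expire(key, 90)
```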
Lua script 1 - atomic stream start (count sessions, then register the new one):
-- KEYS[1] = sessions:{account_id}:* pattern for SCAN
-- KEYS[2] = sessions:{account_id}:{device_id} for new session
-- ARGV[1] = max_concurrent_streams (from subscription)

Lua script 2 - heartbeat (refresh last_seen and TTL):
-- KEYS[1] = sessions:{account_id}:{device_id}
-- ARGV[1] = current_timestamp
-- ARGV[2] = ttl_seconds

Important detail
The SCAN operation in Lua is not ideal for high-frequency calls. In production, maintain a separate SET of active device_ids per account for O(1) counting: active_devices:{account_id}
# Per-account device set (for fast counting)
active_devices:{account_id} = SET of device_ids
Example: {"device_tv_living", "device_phone_john"}

Stream Lifecycle Deep Dive
Let me trace through the complete lifecycle of a stream, including edge cases.
Stream Lifecycle State Machine
State Transitions Explained:
1. Requesting -> Authorized/Denied
async def start_stream(account_id: str, device_id: str, content_id: str) -> StreamResult:
    # 1. Get subscription limits (cached)
    plan = await subscription_service.get_plan(account_id)
    # 2. Atomically count sessions and admit or deny (full sketch at the end of this section)
    ...

2. Active -> Active (Heartbeat)
async def process_heartbeat(account_id: str, device_id: str, position: int) -> HeartbeatResult:
    # Update session with new heartbeat: refresh last_seen and the TTL atomically
    result = await redis.eval(...)  # Lua call sketched at the end of this section
    ...

3. Zombie -> Expired (TTL)
This is the magic that handles ungraceful shutdowns:
T+0s: User closes laptop (no goodbye signal)
T+30s: Heartbeat missed #1 (TTL still has 60s remaining)
T+60s: Heartbeat missed #2 (TTL still has 30s remaining)
T+90s: TTL expires - session automatically removed
T+91s: User on another device can now start stream
Key insight: No active cleanup job needed. Redis TTL does the work.

Why not active cleanup?
A background job scanning for expired sessions would add complexity and still have race conditions. Redis TTL is atomic, distributed, and battle-tested. Let the database do the work.
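Putting the pieces together, here is a minimal sketch of the stream-start and heartbeat paths using the per-device session hashes and the active_devices set from the data model. It assumes redis-py's asyncio client; the Lua scripts, field names, and constants are illustrative rather than the exact production scripts. (In a Redis Cluster deployment, the account_id portion of both keys would need a hash tag so the multi-key scripts stay on one slot.)

```python
import time
import redis.asyncio as redis

r = redis.Redis(decode_responses=True)

HEARTBEAT_TTL_SECONDS = 90  # miss two 30-second heartbeats => session expires

# Atomic check-and-register: deny if the account is already at its limit,
# otherwise create the session hash and add the device to the active set.
START_STREAM_LUA = """
-- KEYS[1] = active_devices:{account_id}
-- KEYS[2] = sessions:{account_id}:{device_id}
-- ARGV[1] = max_streams, ARGV[2] = device_id, ARGV[3] = now, ARGV[4] = ttl
if redis.call('SISMEMBER', KEYS[1], ARGV[2]) == 0 and
   redis.call('SCARD', KEYS[1]) >= tonumber(ARGV[1]) then
  return 0
end
redis.call('SADD', KEYS[1], ARGV[2])
redis.call('HSET', KEYS[2], 'last_seen', ARGV[3], 'state', 'playing')
redis.call('EXPIRE', KEYS[2], ARGV[4])
redis.call('EXPIRE', KEYS[1], ARGV[4])
return 1
"""

# Refresh the TTL while playback continues; if the session already expired,
# clean its device out of the active set so the O(1) count stays accurate.
HEARTBEAT_LUA = """
-- KEYS[1] = sessions:{account_id}:{device_id}
-- KEYS[2] = active_devices:{account_id}
-- ARGV[1] = now, ARGV[2] = ttl, ARGV[3] = device_id
if redis.call('EXISTS', KEYS[1]) == 0 then
  redis.call('SREM', KEYS[2], ARGV[3])
  return 0
end
redis.call('HSET', KEYS[1], 'last_seen', ARGV[1])
redis.call('EXPIRE', KEYS[1], ARGV[2])
return 1
"""

async def start_stream(account_id: str, device_id: str, max_streams: int) -> bool:
    """Admit the stream only if the account is under its concurrency limit."""
    allowed = await r.eval(
        START_STREAM_LUA, 2,
        f"active_devices:{account_id}",
        f"sessions:{account_id}:{device_id}",
        max_streams, device_id, int(time.time()), HEARTBEAT_TTL_SECONDS,
    )
    return allowed == 1

async def process_heartbeat(account_id: str, device_id: str) -> bool:
    """Keep the session alive; returns False if it has already expired."""
    alive = await r.eval(
        HEARTBEAT_LUA, 2,
        f"sessions:{account_id}:{device_id}",
        f"active_devices:{account_id}",
        int(time.time()), HEARTBEAT_TTL_SECONDS, device_id,
    )
    return alive == 1
```

Because both scripts run atomically inside Redis, the count-then-register step cannot interleave between two requests, and the heartbeat path also repairs the active set if a device reconnects after its session already expired.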
Consistency & Edge Cases
System Invariants
1. Never permanently block a paying customer from their first stream.
2. Never allow significantly more than limit (N+1 briefly acceptable, 2N is not).
3. Always show user which devices are active when blocking.
Edge Case 1: Race Condition on Stream Start
Scenario: User has 2-stream limit, 1 active. Two family members click Play simultaneously.
| Time | Device A | Device B | Redis Count |
|---|---|---|---|
| T1 | Read count=1 | Read count=1 | 1 |
| T2 | Write session | Write session | 3 |

Both devices see count=1, both pass the check against the 2-stream limit, and both write - leaving 3 active sessions. The fix is to make count-and-register a single atomic operation (the Lua script approach above), so the two requests serialize and the second one is denied.

Edge Case 2: Device Switching
Scenario: User watching on TV, wants to continue on phone (common use case).
async def switch_device(account_id: str, old_device: str, new_device: str, content_id: str):
    """
    Special endpoint for seamless device switching.
    """
    # Release the old device's session first, then admit the new device;
    # the 30-second grace period covers any brief overlap.
    ...

Edge Case 3: Plan Downgrade While Streaming
Scenario: User has 4 active streams, then downgrades to Basic (1 stream).
Business Decision Required
Do we: (A) Immediately kick 3 streams? (B) Let existing streams finish, enforce on next start? Most services choose B - do not interrupt active sessions, but do not allow new ones.
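Option B needs almost no special machinery: existing sessions keep their TTLs, and the lower limit simply applies to the next stream start. A sketch of the idea; plan_cache here is an illustrative in-memory stand-in for the aggressively cached plan limits:

```python
# In-memory stand-in for the cached plan limits (see the Subscription Service
# component); a real deployment would use a shared cache.
plan_cache: dict[str, int] = {}  # account_id -> max concurrent streams

def on_plan_changed(account_id: str, new_limit: int) -> None:
    # Option B: leave existing sessions alone - their TTLs expire naturally when
    # playback stops. Only refresh the cached limit so the next start_stream
    # call is checked against the new, lower value; no streams are interrupted.
    plan_cache[account_id] = new_limit
```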
Edge Case 4: Network Partition
Scenario: Edge server in Tokyo loses connection to central Redis in Virginia.
async def start_stream_with_fallback(account_id: str, device_id: str) -> StreamResult:
    try:
        # Try the central session check with a short timeout; on failure, fall
        # back to cached data (full sketch after the consistency note below)
        ...

What to say about consistency
We choose availability over strict consistency here. If Redis is down, we fail open for the first stream but are conservative when we have cached data showing the limit is reached. Brief over-limit is acceptable; blocking paying customers is not.
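A sketch of that policy, assuming asyncio and an illustrative per-edge cache of the last counts seen from the central store; check_central_limit is a hypothetical helper standing in for the real call:

```python
import asyncio

# Last counts this edge region saw from the central store (account_id -> active streams)
recent_counts: dict[str, int] = {}

async def check_central_limit(account_id: str, device_id: str) -> bool:
    # Placeholder for the real call into the central Redis-backed session service
    raise NotImplementedError

async def start_stream_with_fallback(account_id: str, device_id: str, limit: int) -> bool:
    try:
        # Normal path: ask the central session service, but don't keep the user waiting
        return await asyncio.wait_for(check_central_limit(account_id, device_id), timeout=0.5)
    except (asyncio.TimeoutError, ConnectionError):
        # Partition or central outage: be generous unless cached data says we're at the limit
        if recent_counts.get(account_id, 0) >= limit:
            return False  # conservative: cached data shows the account already at its cap
        return True       # fail open: never block a paying customer on our own outage
```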
Failure Modes & Resilience
Proactively discuss failures
Streaming is a 24/7 service. Let me walk through what happens when things break.
| Failure | Impact | Mitigation | Recovery |
|---|---|---|---|
| Redis primary down | Cannot start new streams | Redis Sentinel failover (30s) | Automatic promotion of replica |
| Session service down | Stream starts fail | Multiple replicas + health checks | Load balancer removes unhealthy nodes |
Redis High Availability Setup:
Redis HA with Sentinel
Graceful Degradation Ladder:
1. Healthy: Full session tracking, accurate counts
2. Degraded: Redis slow, using cached counts, may over-allow
3. Impaired: Cannot reach Redis, fail open for new users, deny for cached at-limit
4. Emergency: Disable enforcement entirely, allow all streams
class SessionServiceCircuitBreaker:
    def __init__(self):
        self.failure_count = 0  # consecutive Redis failures; trips the breaker past a threshold

Evolution & Scaling
What to say
This design handles Netflix scale (200M subscribers, 10M concurrent). Let me discuss how it evolves for different requirements and even larger scale.
Current Design Limits:
- 10M concurrent sessions in Redis Cluster (comfortable)
- 330K heartbeats/second (Redis can handle 1M+ ops/sec)
- Single-region session store with global edge
Evolution 1: Multi-Region Session Store
For truly global low-latency enforcement:
Multi-Region Architecture
Tradeoff: Regional consistency with eventual global consistency. A user could start streams in two regions simultaneously and briefly exceed limit until sync propagates (~1-2 seconds).
Evolution 2: Household Detection
Netflix's 2023 password-sharing crackdown requires knowing whether devices are in the same household:
class HouseholdDetector:
    """
    Determine if devices belong to the same household.
    """
    ...

Evolution 3: QoS-Based Limits
Instead of hard device limits, limit based on total bandwidth:
| Plan | Old Model | New Model |
|---|---|---|
| Basic | 1 stream | 5 Mbps total bandwidth |
| Standard | 2 streams | 15 Mbps total bandwidth |
| Premium | 4 streams | 50 Mbps total bandwidth |
Benefit: A user can have five low-quality streams or one 4K stream - more flexible.
Challenge: Real-time bandwidth tracking and enforcement is harder than counting streams.
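Enforcement then becomes a sum over active bitrates rather than a count of devices. A simplified sketch; the plan caps come from the table above, everything else is illustrative:

```python
PLAN_BANDWIDTH_MBPS = {"basic": 5, "standard": 15, "premium": 50}

def can_start_stream(plan: str, active_bitrates_mbps: list[float], requested_mbps: float) -> bool:
    """Admit the new stream only if total bandwidth stays within the plan's cap."""
    return sum(active_bitrates_mbps) + requested_mbps <= PLAN_BANDWIDTH_MBPS[plan]

# Example: a Basic (5 Mbps) account with one 3 Mbps stream can add a 1.5 Mbps
# mobile stream but not a 4 Mbps HD stream.
assert can_start_stream("basic", [3.0], 1.5) is True
assert can_start_stream("basic", [3.0], 4.0) is False
```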
Alternative approach to mention
If I had to optimize for absolute lowest latency, I would use local enforcement at the edge with periodic sync. Each edge server tracks its local sessions and syncs counts every few seconds. This allows sub-10ms enforcement but may briefly allow N+2 or N+3 during sync gaps.
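A rough sketch of that edge-local approach, under the assumption that counts from other regions may be a few seconds stale:

```python
import time

class EdgeSessionCounter:
    """Per-edge view: exact local counts plus a periodically synced remote total."""

    def __init__(self, sync_interval_s: float = 5.0):
        self.local: dict[str, set[str]] = {}     # account_id -> devices streaming via this edge
        self.remote_counts: dict[str, int] = {}  # account_id -> streams seen elsewhere (last sync)
        self.sync_interval_s = sync_interval_s
        self.last_sync = 0.0

    def try_start(self, account_id: str, device_id: str, limit: int) -> bool:
        # Sub-10ms decision using local + last-known remote counts; may briefly
        # allow N+2 or N+3 across regions until the next sync narrows the gap.
        devices = self.local.setdefault(account_id, set())
        known_total = len(devices) + self.remote_counts.get(account_id, 0)
        if device_id not in devices and known_total >= limit:
            return False
        devices.add(device_id)
        return True

    def apply_sync(self, counts_from_central: dict[str, int]) -> None:
        # Called every few seconds with the central store's per-account totals,
        # excluding this edge's own sessions.
        self.remote_counts = counts_from_central
        self.last_sync = time.time()
```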