System Design Masterclass
E-Commerce | real-time | websockets | presence | pub-sub | counting | intermediate

Design Real-time Active Viewers

Design a system showing live viewer counts like Booking.com

Millions of concurrent users, millions of pages | Similar to Booking.com, Airbnb, Amazon, Ticketmaster, StubHub | 45 min read

Summary

Booking.com shows "15 people are looking at this property right now" to create urgency. The core challenge is tracking millions of concurrent user sessions across millions of pages with real-time updates. This tests presence systems, pub/sub architecture, efficient counting, and WebSocket scaling.

Key Takeaways

Core Problem

This is fundamentally a distributed presence system - tracking which users are on which pages and aggregating counts in real-time.

The Hard Part

Users do not explicitly leave pages - they close tabs, lose connection, or go idle. We must infer departure through heartbeats and timeouts.

Scaling Axis

Scale by partitioning pages across presence servers. Each server owns a shard of pages and tracks viewers for those pages.

Critical Invariant

Counts must never show significantly more viewers than reality (would erode trust). Slightly fewer is acceptable (user just left).

Performance Requirement

Count updates should reach all viewers within 1-2 seconds. Stale counts defeat the purpose of real-time urgency.

Key Tradeoff

We trade perfect accuracy for scalability by using approximate counting and batched updates instead of per-viewer broadcasts.

Design Walkthrough

Problem Statement

The Question: Design a system that shows real-time viewer counts on pages, like Booking.com showing "15 people are looking at this property right now."

This feature serves multiple purposes:

  • Creates urgency - Fear of missing out drives conversions
  • Social proof - Popular items seem more desirable
  • Trust signal - Active site feels more legitimate
  • Inventory pressure - Combined with "only 2 rooms left" messaging

What to say first

This is a distributed presence tracking problem. Before designing, I need to understand the scale, accuracy requirements, and what constitutes an active viewer.

Hidden requirements interviewers test:

  • Can you handle users who leave without explicit disconnect?
  • How do you scale WebSocket connections?
  • What happens with extremely popular pages (hot partitions)?
  • How do you balance accuracy vs performance?

Clarifying Questions

Ask these questions to shape your architecture decisions.

Question 1: Scale

How many concurrent users and how many distinct pages are we tracking?

Why this matters: Determines whether a simple in-memory solution suffices or a distributed architecture is needed.
Typical answer: 10M concurrent users, 5M distinct pages.
Architecture impact: Need distributed presence servers; all state cannot fit on one machine.

Question 2: Active Definition

What defines an active viewer? Page open? Recent interaction? How long until they are considered gone?

Why this matters: Affects heartbeat frequency and timeout logic.
Typical answer: Active = page open with a heartbeat in the last 30 seconds.
Architecture impact: Need a heartbeat mechanism and session timeout handling.

Question 3: Update Frequency

How real-time does the count need to be? Instant? Within seconds?

Why this matters: True real-time requires WebSockets; near-real-time can use polling.
Typical answer: Updates within 2-3 seconds are acceptable.
Architecture impact: Can batch updates, which reduces WebSocket message volume.

Question 4: Accuracy Requirements

Does the count need to be exact or can it be approximate?

Why this matters: Exact counting at scale is expensive.
Typical answer: Approximate is fine, within 10% accuracy.
Architecture impact: Can use probabilistic counting and sampling for hot pages.

Stating assumptions

I will assume: 10M concurrent users, 5M pages, 30-second activity timeout, updates within 2 seconds, approximate counts acceptable. Most pages have 0-10 viewers, some hot pages have thousands.

The Hard Part

Say this out loud

The hard part here is detecting when users leave. Unlike chat where users log out, viewers just close tabs or lose connection. We must infer departure through timeouts.

Why this is genuinely hard:

  1. No Explicit Departure: Users close browser tabs, phones go to sleep, networks die. No goodbye message is sent.
  2. Heartbeat Overhead: Sending heartbeats from millions of clients every few seconds creates massive traffic.
  3. Hot Pages: A viral listing might have 100,000 concurrent viewers. Broadcasting every join/leave would overwhelm clients.
  4. Distributed Counting: With users connecting to different servers, how do we get accurate global counts?
  5. Timeout Coordination: If the heartbeat timeout is 30 seconds, we need distributed timer management for millions of sessions.

Common mistake

Candidates often design for explicit join/leave events and forget the implicit departure problem. Always ask: what happens when a user just closes their browser?

The fundamental tradeoff:

Shorter heartbeat interval means:

  • More accurate counts (faster detection of departed users)
  • More server load (more heartbeats to process)
  • More battery drain on mobile

Longer heartbeat interval means:

  • Less accurate counts (ghost viewers linger)
  • Lower server load
  • Better mobile experience

10M concurrent users
30-second heartbeat interval
= 10M / 30 = 333K heartbeats/second

10-second heartbeat interval  
= 10M / 10 = 1M heartbeats/second (3x more load)

Trade: 3x server cost for 20 seconds faster ghost detection

Scale and Access Patterns

Let me estimate the scale and understand access patterns.

Dimension | Value | Impact
Concurrent users | 10,000,000 | Need distributed WebSocket servers
Distinct pages | 5,000,000 | Cannot track all in a single machine
+ 5 more rows...

What to say

The distribution is highly skewed - most pages have 0-2 viewers, but hot pages can have 100K+. This bimodal distribution affects our design significantly.

Access Pattern Analysis:

  • Write-heavy for heartbeats: Every active user sends a heartbeat every 30s
  • Read on page load: User joins a page, needs the current count immediately
  • Subscription for updates: User wants count updates while on the page
  • Bursty hot pages: Viral content creates sudden spikes
  • Long-tail cold pages: Most pages have 0 viewers most of the time

Memory for session tracking:
- 10M sessions x 100 bytes = 1 GB (fits in memory)
+ 9 more lines...

High-Level Architecture

Let me design the system in layers.

What to say

I will separate connection handling from presence tracking. WebSocket servers handle connections, presence servers track who is where, and we use pub/sub to fan out updates.

Real-time Viewers Architecture

Component Responsibilities:

1. WebSocket Servers - Maintain persistent connections with browsers - Handle heartbeats from clients - Subscribe to updates for pages their clients are viewing - Push count updates to connected clients

2. Presence Servers - Track which users are on which pages - Process heartbeats and manage timeouts - Calculate aggregated counts per page - Publish count changes to pub/sub

3. Redis Cluster - Store session state (user -> page mapping) - Store page counts (page -> viewer count) - Handle TTL-based session expiry

4. Pub/Sub Layer - Fan out count updates to relevant WebSocket servers - Decouple presence calculation from delivery

Why separate WebSocket and Presence servers?

WebSocket servers are stateful (hold connections) but presence logic is stateless. Separating them allows independent scaling - add WS servers for more connections, add presence servers for more processing.
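
To make that boundary concrete, here is a minimal sketch of the publish/subscribe hand-off using Redis pub/sub. The per-page channel name (count_updates:{page_id}) is our own illustrative convention, not a fixed part of the design:

import json
import time

import redis

r = redis.Redis()

def publish_count_update(page_id: str, count: int) -> None:
    # Presence server side: publish the latest count for one page.
    # Only WebSocket servers with viewers on this page are subscribed,
    # so fan-out is limited to interested servers.
    payload = json.dumps({"page_id": page_id, "count": count, "ts": time.time()})
    r.publish(f"count_updates:{page_id}", payload)

def subscribe_to_page(page_id: str):
    # WebSocket server side: subscribe when the first local client opens
    # the page, unsubscribe when the last one leaves.
    pubsub = r.pubsub()
    pubsub.subscribe(f"count_updates:{page_id}")
    return pubsub  # iterate pubsub.listen() to receive count updates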

Data Model and Storage

Redis is ideal for this use case due to: - In-memory speed for real-time operations - TTL support for automatic session expiry - Sorted sets for efficient counting - Pub/sub for broadcasting updates

What to say

I will use Redis sorted sets to track viewers per page. The score is the heartbeat timestamp, allowing efficient removal of expired sessions.

# Track viewers per page using sorted sets
# Key: viewers:{page_id}
# Members: session_id
+ 17 more lines...

Heartbeat Processing Flow:

def handle_heartbeat(session_id: str, page_id: str):
    now = time.time()
    pipe = redis.pipeline()
+ 21 more lines...

Efficient Count Retrieval:

def get_viewer_count(page_id: str) -> int:
    now = time.time()
    cutoff = now - 30  # 30 second timeout
+ 21 more lines...

Important optimization

Do not run cleanup on every request. Run it periodically (every few seconds) or lazily when count is requested. Cleanup is O(n) for removed items.
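
A minimal sketch of such a periodic sweep, assuming the viewers:{page_id} sorted-set schema above; how the set of currently tracked pages is obtained (get_tracked_pages) is left as a hypothetical helper:

import time

import redis

r = redis.Redis()
TIMEOUT_SECONDS = 30

def cleanup_expired_sessions(tracked_pages) -> None:
    # Remove members whose heartbeat timestamp (the sorted-set score)
    # is older than the timeout. ZREMRANGEBYSCORE costs O(log N + M),
    # where M is the number of removed members, so frequent small
    # sweeps stay cheap.
    cutoff = time.time() - TIMEOUT_SECONDS
    pipe = r.pipeline()
    for page_id in tracked_pages:
        pipe.zremrangebyscore(f"viewers:{page_id}", "-inf", cutoff)
    pipe.execute()

# Example scheduling (inside the presence server's worker loop):
# while True:
#     cleanup_expired_sessions(get_tracked_pages())  # get_tracked_pages() is hypothetical
#     time.sleep(5)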

Real-time Update Distribution

Getting count updates to all viewers efficiently is critical. We cannot broadcast every join/leave to every viewer.

What to say

I will use a pub/sub pattern with batching. Instead of broadcasting every event, we batch updates and send count changes every 1-2 seconds.

Update Distribution Flow

Batching Strategy:

Instead of publishing on every heartbeat:

class CountUpdateBatcher:
    def __init__(self):
        self.pending_updates = {}  # page_id -> latest_count
+ 37 more lines...

WebSocket Server Subscription:

class WebSocketServer:
    def __init__(self):
        # page_id -> set of WebSocket connections
+ 30 more lines...

Handling Hot Pages

Say this proactively

Hot pages with 100K+ viewers need special handling. We cannot process 100K heartbeats per 30 seconds for a single page without overwhelming one Redis key.

Hot Page Problems:

  1. Single key bottleneck: All 100K sessions updating one sorted set
  2. Pub/sub fan-out: Broadcasting to 100K connections on every change
  3. Count accuracy: With so many joins/leaves, the count is constantly changing

Solution 1: Sampling for Hot Pages

def handle_heartbeat_with_sampling(session_id: str, page_id: str):
    # Check if this is a hot page
    is_hot = redis.get(f"hot:{page_id}") is not None
+ 19 more lines...

Solution 2: Sharded Counting

NUM_SHARDS = 10

def get_shard(session_id: str, page_id: str) -> int:
+ 18 more lines...

Solution 3: Approximate Updates for Hot Pages

class HotPageHandler:
    def __init__(self):
        self.update_intervals = {
+ 18 more lines...

Real-world approach

Booking.com likely shows rounded numbers for hot listings (15 people vs 14 people) and updates less frequently. Users do not notice if count updates every 5 seconds instead of every second for popular listings.
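
A minimal sketch of that presentation-side smoothing; the thresholds and helper name are illustrative, not taken from Booking.com:

def smooth_display_count(raw_count: int) -> int:
    # Round hot-page counts down so normal join/leave churn does not
    # produce a visibly flickering number. Rounding down (not up) also
    # preserves the invariant that the displayed count should not
    # exceed the real number of viewers.
    if raw_count < 50:
        return raw_count                  # small counts shown exactly
    if raw_count < 1000:
        return (raw_count // 10) * 10     # e.g. 237 -> 230
    return (raw_count // 100) * 100       # e.g. 12431 -> 12400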

Consistency and Invariants

System Invariants

Displayed count must not significantly exceed actual viewers (would be misleading). Count can lag behind reality by a few seconds (acceptable).

Why eventual consistency is acceptable:

  • Users do not verify counts independently
  • Approximate social proof is still effective
  • Exact counts are impossible anyway (users leave between count and display)
  • Business goal is urgency, not auditing

Scenario | User Experience | Acceptable?
Count shows 10, actual is 12 | Slightly understated | Yes - users who just joined
Count shows 12, actual is 10 | Slightly overstated | Borderline - users who just left
Count shows 50, actual is 10 | Significantly overstated | No - misleading
Count updates 3s late | Slight lag | Yes - unnoticeable
Count stuck for 30s | Stale data | No - defeats real-time purpose

Sources of Inconsistency:

  1. Heartbeat delay: User active but heartbeat not processed yet
  2. Timeout delay: User left but session not expired yet
  3. Broadcast delay: Count changed but update not sent yet
  4. Network delay: Update sent but not received yet

User closes browser tab at T=0

T=0:    User gone, but system does not know
+ 9 more lines...

What to say

The system is eventually consistent with bounded staleness. Counts reflect reality within 30-35 seconds. For social proof urgency features, this is acceptable.

Failure Modes and Resilience

Proactively discuss failures

Let me walk through what happens when components fail.

Failure | Impact | Mitigation | Why It Works
Redis down | Cannot track presence | Show cached counts, hide feature | Stale count better than error
WebSocket server crash | Users lose connection | Clients auto-reconnect to another server | Stateless servers, load balancer reroutes
+ 4 more rows...

Graceful Degradation Strategy:

class ViewerCountService:
    def __init__(self):
        self.local_cache = {}  # Fallback cache
+ 25 more lines...

Client-side Resilience:

class ViewerCountClient {
  constructor() {
    this.reconnectDelay = 1000;
+ 34 more lines...

Evolution and Scaling

What to say

This design handles 10M concurrent users with single-region Redis. Let me discuss how it evolves for 100M users and global deployment.

Evolution Path:

Stage 1: Single Redis (up to 10M concurrent)

  • Simple architecture as designed
  • Single-region deployment
  • All presence in one Redis cluster

Stage 2: Regional Deployment (10-50M concurrent)

  • Redis cluster per region
  • Regional counts shown (acceptable for most cases)
  • Optional: async aggregation for global counts

Stage 3: Edge Presence (50M+ concurrent)

  • Presence tracking at edge nodes
  • Local counts with periodic sync
  • Probabilistic counting for global aggregates

Global Architecture Evolution
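
For Stage 2, a minimal sketch of the optional async aggregation across regions; the region names, hostnames, and helper are placeholders, and the result would typically be cached rather than computed per request:

import time

import redis

TIMEOUT_SECONDS = 30

# One client per regional Redis cluster; hostnames are placeholders.
REGIONAL_REDIS = {
    "us-east": redis.Redis(host="redis.us-east.example.internal"),
    "eu-west": redis.Redis(host="redis.eu-west.example.internal"),
    "ap-south": redis.Redis(host="redis.ap-south.example.internal"),
}

def global_viewer_count(page_id: str) -> int:
    # Sum per-region counts of sessions with a recent heartbeat.
    # Regional counts stay the low-latency source of truth; this global
    # figure is refreshed asynchronously every few seconds.
    cutoff = time.time() - TIMEOUT_SECONDS
    return sum(
        client.zcount(f"viewers:{page_id}", cutoff, "+inf")
        for client in REGIONAL_REDIS.values()
    )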

Alternative Approaches:

Approach | When to Use | Tradeoff
Server-Sent Events instead of WebSocket | Simpler infrastructure, one-way updates | No heartbeat channel, need separate endpoint
Polling instead of push | Simpler, works everywhere | Higher latency, more server load
HyperLogLog for counting | Extreme scale, approximate counts OK | Cannot list who is viewing, only count
Edge workers (Cloudflare) | Ultra-low latency, global presence | Complex state management at edge
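
For the HyperLogLog row, a minimal sketch using Redis's built-in PFADD/PFCOUNT commands; the key scheme and time-bucketing are our own illustration. Counts are approximate (roughly 0.8% standard error) and individual viewers cannot be listed:

import time

import redis

r = redis.Redis()
BUCKET_SECONDS = 30  # one HLL per 30-second window

def record_view(page_id: str, session_id: str) -> None:
    # Add the session to the HLL for the current time bucket; memory per
    # key stays around 12 KB no matter how many distinct sessions are added.
    bucket = int(time.time()) // BUCKET_SECONDS
    key = f"hll:viewers:{page_id}:{bucket}"
    r.pfadd(key, session_id)
    r.expire(key, BUCKET_SECONDS * 4)  # let old buckets age out

def approx_viewer_count(page_id: str) -> int:
    # Approximate distinct viewers over the last two buckets (~60s window).
    bucket = int(time.time()) // BUCKET_SECONDS
    keys = [f"hll:viewers:{page_id}:{b}" for b in (bucket, bucket - 1)]
    return r.pfcount(*keys)

This trades the ability to expire individual sessions for fixed memory per page, which is why it fits the "extreme scale, approximate counts OK" column.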

If requirements were different

If we needed to show WHO is viewing (not just count), the design changes significantly - we would need to store and sync user identities, handle privacy, and fan-out becomes much more expensive.

Cost Optimization at Scale:

  1. Reduce heartbeat frequency for idle users (increase from 30s to 60s if the tab is not focused)
  2. Sample hot pages instead of tracking every viewer
  3. Use connection multiplexing - one WebSocket per browser, not per tab
  4. Compress updates - send deltas, not full counts
  5. Tier storage - hot pages in Redis, cold pages computed on-demand

Design Trade-offs

Advantages

  • True real-time updates
  • Bidirectional communication
  • Efficient for long-lived connections

Disadvantages

  • Complex infrastructure
  • Stateful connections harder to scale
  • Mobile battery drain

When to use

When real-time updates are critical and users stay on page