Design Walkthrough
Problem Statement
The Question: Design a system that shows real-time viewer counts on pages, like Booking.com showing "15 people are looking at this property right now."
This feature serves multiple purposes:
- Creates urgency - Fear of missing out drives conversions
- Social proof - Popular items seem more desirable
- Trust signal - Active site feels more legitimate
- Inventory pressure - Combined with "only 2 rooms left" messaging
What to say first
This is a distributed presence tracking problem. Before designing, I need to understand the scale, accuracy requirements, and what constitutes an active viewer.
Hidden requirements interviewers test:
- Can you handle users who leave without explicit disconnect?
- How do you scale WebSocket connections?
- What happens with extremely popular pages (hot partitions)?
- How do you balance accuracy vs performance?
Clarifying Questions
Ask these questions to shape your architecture decisions.
Question 1: Scale
How many concurrent users and how many distinct pages are we tracking?
Why this matters: Determines whether a simple in-memory solution suffices or a distributed architecture is needed.
Typical answer: 10M concurrent users, 5M distinct pages
Architecture impact: Need distributed presence servers; the state cannot fit on one machine
Question 2: Active Definition
What defines an active viewer? Page open? Recent interaction? How long until they are considered gone?
Why this matters: Affects heartbeat frequency and timeout logic.
Typical answer: Active = page open with heartbeat in last 30 seconds
Architecture impact: Need heartbeat mechanism, session timeout handling
Question 3: Update Frequency
How real-time does the count need to be? Instant? Within seconds?
Why this matters: True real-time requires WebSockets; near-real-time can use polling.
Typical answer: Updates within 2-3 seconds acceptable
Architecture impact: Can batch updates, reducing WebSocket message volume
Question 4: Accuracy Requirements
Does the count need to be exact or can it be approximate?
Why this matters: Exact counting at scale is expensive.
Typical answer: Approximate is fine, within 10% accuracy
Architecture impact: Can use probabilistic counting, sampling for hot pages
Stating assumptions
I will assume: 10M concurrent users, 5M pages, 30-second activity timeout, updates within 2 seconds, approximate counts acceptable. Most pages have 0-10 viewers, some hot pages have thousands.
The Hard Part
Say this out loud
The hard part here is detecting when users leave. Unlike chat where users log out, viewers just close tabs or lose connection. We must infer departure through timeouts.
Why this is genuinely hard:
1. No Explicit Departure: Users close browser tabs, phones go to sleep, networks die. No goodbye message is sent.
2. Heartbeat Overhead: Sending heartbeats from millions of clients every few seconds creates massive traffic.
3. Hot Pages: A viral listing might have 100,000 concurrent viewers. Broadcasting every join/leave would overwhelm clients.
4. Distributed Counting: With users connecting to different servers, how do we get accurate global counts?
5. Timeout Coordination: If the heartbeat timeout is 30 seconds, we need distributed timer management for millions of sessions.
Common mistake
Candidates often design for explicit join/leave events and forget the implicit departure problem. Always ask: what happens when a user just closes their browser?
The fundamental tradeoff:
Shorter heartbeat interval means:
- More accurate counts (faster detection of departed users)
- More server load (more heartbeats to process)
- More battery drain on mobile

Longer heartbeat interval means:
- Less accurate counts (ghost viewers linger)
- Lower server load
- Better mobile experience
10M concurrent users

With a 30-second heartbeat interval:
10M / 30 = ~333K heartbeats/second

With a 10-second heartbeat interval:
10M / 10 = 1M heartbeats/second (3x more load)

Trade: 3x server cost for 20 seconds faster ghost detection.

Scale and Access Patterns
Let me estimate the scale and understand access patterns.
| Dimension | Value | Impact |
|---|---|---|
| Concurrent users | 10,000,000 | Need distributed WebSocket servers |
| Distinct pages | 5,000,000 | Cannot track all in single machine |
What to say
The distribution is highly skewed - most pages have 0-2 viewers, but hot pages can have 100K+. This long-tailed distribution shapes the design significantly.
Access Pattern Analysis:
- Write-heavy for heartbeats: Every active user sends heartbeat every 30s
- Read on page load: User joins page, needs current count immediately
- Subscription for updates: User wants count updates while on page
- Bursty hot pages: Viral content creates sudden spikes
- Long-tail cold pages: Most pages have 0 viewers most of the time
Memory for session tracking:
- 10M sessions x 100 bytes = 1 GB (fits in memory)
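A quick back-of-envelope check of these numbers, using the figures above (10M concurrent users, 100 bytes per session record, 30-second heartbeat interval):

```python
CONCURRENT_USERS = 10_000_000
HEARTBEAT_INTERVAL_S = 30
SESSION_RECORD_BYTES = 100  # session_id, page_id, last heartbeat timestamp

heartbeats_per_second = CONCURRENT_USERS / HEARTBEAT_INTERVAL_S       # ~333K/s
session_memory_gb = CONCURRENT_USERS * SESSION_RECORD_BYTES / 1e9     # ~1 GB

print(f"{heartbeats_per_second:,.0f} heartbeats/s, {session_memory_gb:.1f} GB of session state")
```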
High-Level Architecture
Let me design the system in layers.
What to say
I will separate connection handling from presence tracking. WebSocket servers handle connections, presence servers track who is where, and we use pub/sub to fan out updates.
Real-time Viewers Architecture
Component Responsibilities:
1. WebSocket Servers
- Maintain persistent connections with browsers
- Handle heartbeats from clients
- Subscribe to updates for pages their clients are viewing
- Push count updates to connected clients

2. Presence Servers
- Track which users are on which pages
- Process heartbeats and manage timeouts
- Calculate aggregated counts per page
- Publish count changes to pub/sub

3. Redis Cluster
- Store session state (user -> page mapping)
- Store page counts (page -> viewer count)
- Handle TTL-based session expiry

4. Pub/Sub Layer
- Fan out count updates to relevant WebSocket servers
- Decouple presence calculation from delivery
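To make the component boundaries concrete, here is a minimal sketch of the two messages that cross them; the field names and shapes are illustrative assumptions, not a fixed protocol:

```python
from typing import TypedDict

class Heartbeat(TypedDict):
    # Client -> WebSocket server -> presence server, roughly every 30 seconds
    session_id: str
    page_id: str
    sent_at: float  # client timestamp, for debugging; the server clock is authoritative

class CountUpdate(TypedDict):
    # Presence server -> pub/sub -> WebSocket servers -> every client on that page
    page_id: str
    viewer_count: int
    as_of: float  # server timestamp when the count was computed
```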
Why separate WebSocket and Presence servers?
WebSocket servers are stateful (hold connections) but presence logic is stateless. Separating them allows independent scaling - add WS servers for more connections, add presence servers for more processing.
Data Model and Storage
Redis is ideal for this use case due to:
- In-memory speed for real-time operations
- TTL support for automatic session expiry
- Sorted sets for efficient counting
- Pub/sub for broadcasting updates
What to say
I will use Redis sorted sets to track viewers per page. The score is the heartbeat timestamp, allowing efficient removal of expired sessions.
```python
import time
import redis as redis_lib

redis = redis_lib.Redis()  # shared client; a cluster-aware client in production

# Track viewers per page using sorted sets
# Key: viewers:{page_id}
# Members: session_id, scored by the last heartbeat timestamp
```

Heartbeat Processing Flow:

```python
def handle_heartbeat(session_id: str, page_id: str):
    now = time.time()
    pipe = redis.pipeline()
    pipe.zadd(f"viewers:{page_id}", {session_id: now})  # upsert with heartbeat time as score
    pipe.expire(f"viewers:{page_id}", 120)              # let abandoned pages expire entirely
    pipe.execute()
```

Efficient Count Retrieval:

```python
def get_viewer_count(page_id: str) -> int:
    now = time.time()
    cutoff = now - 30  # 30 second timeout
    # Count only sessions whose last heartbeat falls inside the activity window
    return redis.zcount(f"viewers:{page_id}", cutoff, "+inf")
```

Important optimization

Do not run cleanup on every request. Run it periodically (every few seconds) or lazily when the count is requested. With ZREMRANGEBYSCORE, cleanup costs O(log N + M), where M is the number of sessions removed.
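A minimal sketch of that periodic cleanup, assuming the viewers:{page_id} sorted sets above and a redis-py client; a production job would iterate only over recently active pages rather than scanning every key:

```python
import time

def cleanup_expired_sessions(r, timeout_s: int = 30):
    cutoff = time.time() - timeout_s
    # Drop members whose score (last heartbeat time) is older than the cutoff
    for key in r.scan_iter(match="viewers:*", count=1000):
        r.zremrangebyscore(key, "-inf", cutoff)
```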
Real-time Update Distribution
Getting count updates to all viewers efficiently is critical. We cannot broadcast every join/leave to every viewer.
What to say
I will use a pub/sub pattern with batching. Instead of broadcasting every event, we batch updates and send count changes every 1-2 seconds.
Update Distribution Flow
Batching Strategy:
Instead of publishing on every heartbeat:
```python
class CountUpdateBatcher:
    def __init__(self):
        self.pending_updates = {}  # page_id -> latest_count, coalesced between flushes

    def flush(self):  # run on a 1-2 second timer
        for page_id, count in self.pending_updates.items():
            redis.publish(f"count_updates:{page_id}", count)
        self.pending_updates.clear()
```

WebSocket Server Subscription:

```python
class WebSocketServer:
    def __init__(self):
        self.subscriptions = {}  # page_id -> set of WebSocket connections

    def on_count_update(self, page_id: str, count: int):
        # Fan a published count out to every connection viewing that page
        for conn in self.subscriptions.get(page_id, set()):
            conn.send(str(count))
```

Handling Hot Pages
Say this proactively
Hot pages with 100K+ viewers need special handling. We cannot funnel 100K sessions' heartbeats every 30 seconds into a single Redis key without creating a hot-key bottleneck.
Hot Page Problems:
1. Single key bottleneck: All 100K sessions updating one sorted set
2. Pub/sub fan-out: Broadcasting to 100K connections on every change
3. Count accuracy: With so many joins/leaves, count is constantly changing
Solution 1: Sampling for Hot Pages
```python
def handle_heartbeat_with_sampling(session_id: str, page_id: str):
    # Check if this is a hot page
    is_hot = redis.get(f"hot:{page_id}") is not None
    # NOTE: use a stable hash (e.g. crc32) in production; built-in hash() varies per process
    if is_hot and hash(session_id) % 10 != 0:
        return  # track ~10% of sessions and scale the displayed count up accordingly
    handle_heartbeat(session_id, page_id)
```

Solution 2: Sharded Counting

```python
NUM_SHARDS = 10

def get_shard(session_id: str, page_id: str) -> int:
    # Route each session to one of NUM_SHARDS keys: viewers:{page_id}:{shard}
    return hash(session_id) % NUM_SHARDS
```

Solution 3: Approximate Updates for Hot Pages

```python
class HotPageHandler:
    def __init__(self):
        # Viewer-count threshold -> seconds between published updates (illustrative values)
        self.update_intervals = {100: 1, 1_000: 2, 10_000: 5}
```

Real-world approach
Booking.com likely shows rounded numbers for hot listings (15 people vs 14 people) and updates less frequently. Users do not notice if count updates every 5 seconds instead of every second for popular listings.
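For the sharded-counting option (Solution 2) above, the read path sums the shards; a minimal sketch, assuming shard keys of the form viewers:{page_id}:{shard}, the NUM_SHARDS constant defined earlier, and the same 30-second activity window:

```python
import time

def get_sharded_viewer_count(r, page_id: str, timeout_s: int = 30) -> int:
    cutoff = time.time() - timeout_s
    pipe = r.pipeline()
    for shard in range(NUM_SHARDS):
        # One ZCOUNT per shard, batched into a single round trip
        pipe.zcount(f"viewers:{page_id}:{shard}", cutoff, "+inf")
    return sum(pipe.execute())
```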
Consistency and Invariants
System Invariants
Displayed count must not significantly exceed actual viewers (would be misleading). Count can lag behind reality by a few seconds (acceptable).
Why eventual consistency is acceptable:
- Users do not verify counts independently
- Approximate social proof is still effective
- Exact counts are impossible anyway (users leave between count and display)
- Business goal is urgency, not auditing
| Scenario | User Experience | Acceptable? |
|---|---|---|
| Count shows 10, actual is 12 | Slightly understated | Yes - users who just joined |
| Count shows 12, actual is 10 | Slightly overstated | Borderline - users who just left |
| Count shows 50, actual is 10 | Significantly overstated | No - misleading |
| Count updates 3s late | Slight lag | Yes - unnoticeable |
| Count stuck for 30s | Stale data | No - defeats real-time purpose |
Sources of Inconsistency:
1. Heartbeat delay: User active but heartbeat not processed yet
2. Timeout delay: User left but session not expired yet
3. Broadcast delay: Count changed but update not sent yet
4. Network delay: Update sent but not received yet
User closes browser tab at T=0:
- T=0: User gone, but the system does not know
- T=0 to T=30: No new heartbeats arrive; the last heartbeat's timestamp ages toward the cutoff
- T=30: The session falls outside the activity window and is dropped from the count
- T=30 to ~T=35: The next batched update propagates the corrected count to remaining viewers

What to say
The system is eventually consistent with bounded staleness. Counts reflect reality within 30-35 seconds. For social proof urgency features, this is acceptable.
Failure Modes and Resilience
Proactively discuss failures
Let me walk through what happens when components fail.
| Failure | Impact | Mitigation | Why It Works |
|---|---|---|---|
| Redis down | Cannot track presence | Show cached counts, hide feature | Stale count better than error |
| WebSocket server crash | Users lose connection | Clients auto-reconnect to another server | Stateless servers, load balancer reroutes |
Graceful Degradation Strategy:
```python
class ViewerCountService:
    def __init__(self):
        self.local_cache = {}  # Fallback cache: page_id -> last known count

    def get_count(self, page_id: str) -> int:
        try:
            self.local_cache[page_id] = get_viewer_count(page_id)
        except Exception:
            pass  # Redis unavailable: serve the last cached (stale) count instead of an error
        return self.local_cache.get(page_id, 0)
```

Client-side Resilience:

```javascript
class ViewerCountClient {
  constructor() {
    this.reconnectDelay = 1000;  // initial backoff in milliseconds
  }

  onDisconnect(reconnect) {
    // Reconnect with exponential backoff, capped at 30s, so a recovering server is not hammered
    setTimeout(reconnect, this.reconnectDelay);
    this.reconnectDelay = Math.min(this.reconnectDelay * 2, 30000);
  }
}
```

Evolution and Scaling
What to say
This design handles 10M concurrent users with single-region Redis. Let me discuss how it evolves for 100M users and global deployment.
Evolution Path:
Stage 1: Single Redis (up to 10M concurrent)
- Simple architecture as designed
- Single-region deployment
- All presence in one Redis cluster

Stage 2: Regional Deployment (10-50M concurrent)
- Redis cluster per region
- Regional counts shown (acceptable for most cases)
- Optional: async aggregation for global counts

Stage 3: Edge Presence (50M+ concurrent)
- Presence tracking at edge nodes
- Local counts with periodic sync
- Probabilistic counting for global aggregates
Global Architecture Evolution
Alternative Approaches:
| Approach | When to Use | Tradeoff |
|---|---|---|
| Server-Sent Events instead of WebSocket | Simpler infrastructure, one-way updates | No heartbeat channel, need separate endpoint |
| Polling instead of push | Simpler, works everywhere | Higher latency, more server load |
| HyperLogLog for counting | Extreme scale, approximate counts OK | Cannot list who is viewing, only count |
| Edge workers (Cloudflare) | Ultra-low latency, global presence | Complex state management at edge |
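For the HyperLogLog row in the table above, Redis supports this natively with PFADD/PFCOUNT; a minimal sketch, assuming one HLL key per page per 30-second time bucket so departed viewers age out with their bucket:

```python
import time

def record_view_hll(r, page_id: str, session_id: str, bucket_s: int = 30):
    bucket = int(time.time()) // bucket_s
    key = f"hll:viewers:{page_id}:{bucket}"
    r.pfadd(key, session_id)      # ~12 KB per key regardless of how many viewers it holds
    r.expire(key, bucket_s * 4)   # keep only a few recent buckets around

def approx_viewer_count(r, page_id: str, bucket_s: int = 30) -> int:
    bucket = int(time.time()) // bucket_s
    # Union of the current and previous bucket approximates the recent activity window
    return r.pfcount(f"hll:viewers:{page_id}:{bucket}", f"hll:viewers:{page_id}:{bucket - 1}")
```

As the table notes, this gives only a count; it cannot tell you who is viewing.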
If requirements were different
If we needed to show WHO is viewing (not just count), the design changes significantly - we would need to store and sync user identities, handle privacy, and fan-out becomes much more expensive.
Cost Optimization at Scale:
1. Reduce heartbeat frequency for idle users (increase from 30s to 60s if tab not focused)
2. Sample hot pages instead of tracking every viewer
3. Use connection multiplexing - one WebSocket per browser, not per tab
4. Compress updates - send deltas, not full counts
5. Tier storage - hot pages in Redis, cold pages computed on-demand