Design Walkthrough
Problem Statement
The Question: Design a system that identifies the top K most shared articles across different time windows (last hour, last day, last week) with millions of share events per day.
This is a classic problem seen at:
- Twitter: Trending topics and tweets
- Reddit: Hot posts algorithm
- Hacker News: Front page ranking
- Medium: Popular articles
- News sites: Most read stories
What to say first
Before I design, I want to clarify: What are the time windows we need to support? How fresh must the results be? Is approximate counting acceptable for very popular articles?
What interviewers are testing:
- Do you understand the memory challenges of counting millions of items?
- Can you apply streaming algorithms (Count-Min Sketch, Heavy Hitters)?
- How do you handle time-windowed aggregation?
- Can you design for both real-time updates and fast queries?
Clarifying Questions
Ask these questions to shape your architecture. Each answer has significant design implications.
Question 1: Scale
How many share events per day? How many unique articles are we tracking?
Why this matters: Determines if exact counting is feasible or if we need approximate algorithms.
Typical answer: 10M shares/day, 1M unique articles
Impact: At this scale, storing exact counts for all articles is feasible (tens of MB of raw counters, ~100MB with hash-map overhead), but we will design for 10x growth
Question 2: Time Windows
What time windows do we need? Last hour? Day? Week? Custom ranges?
Why this matters: Multiple windows require different aggregation strategies.
Typical answer: Last 1 hour, 24 hours, 7 days, 30 days
Impact: Need to maintain counts at different granularities (minute buckets for the hourly window, hour buckets for the daily window)
Question 3: Freshness
How fresh must results be? Real-time or is 1-minute delay acceptable?
Why this matters: Real-time requires streaming; batch is simpler.
Typical answer: Within 1 minute of actual shares
Impact: Need streaming aggregation, cannot rely on batch processing alone
Question 4: K value
What is K? Top 10? Top 100? Top 1000?
Why this matters: Larger K requires more memory for candidate tracking.
Typical answer: Top 100, but support up to 1000
Impact: We need to track roughly 10x candidates (top 10K) to ensure we have an accurate top 1K
State your assumptions
I will assume: 10M shares/day (~115/second average, peaks around 1K/second), 1M articles, time windows of 1h/24h/7d/30d, results fresh within 1 minute, top 100 with support for top 1000.
The Hard Part
Say this out loud
The hard part is maintaining accurate counts for potentially millions of articles while using bounded memory, and supporting multiple overlapping time windows efficiently.
Why this is genuinely hard:
1. Memory vs Accuracy Tradeoff
- Exact counting of 1M articles needs only ~12 MB of raw counters (more with hash-map overhead), which is feasible
- At 100M articles this grows to 1GB+ just for counts
- We need bounded memory regardless of article count

2. Multiple Time Windows
- Cannot just keep one counter per article
- Need to know counts for the last hour AND the last day AND the last week
- Naive approach: a separate counter set per window multiplies memory by the number of windows

3. Time Decay
- Old shares should fall off the window
- Keeping all timestamps = unbounded memory
- Need efficient windowing without storing every event (see the bucketed-window sketch after this list)

4. Hot Article Spikes
- A viral article might get 100K shares in minutes
- Must handle uneven distribution
- Most articles get 0-10 shares, a few get millions
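To make point 3 concrete, here is a minimal sketch of bucketed windowing. The class name, one-minute bucket size, and method names are illustrative assumptions, not from the original design:

```python
import time
from collections import defaultdict, deque

class BucketedWindowCounter:
    """Sliding-window counts using per-minute buckets instead of storing every event."""

    def __init__(self, window_minutes: int = 60):
        self.window_minutes = window_minutes
        self.buckets = deque()  # (minute, {article_id: count}) pairs, oldest first

    def record(self, article_id: str) -> None:
        minute = int(time.time() // 60)
        if not self.buckets or self.buckets[-1][0] != minute:
            self.buckets.append((minute, defaultdict(int)))
        self.buckets[-1][1][article_id] += 1
        # Drop buckets that have slid out of the window: memory stays bounded
        while self.buckets and self.buckets[0][0] <= minute - self.window_minutes:
            self.buckets.popleft()

    def count(self, article_id: str) -> int:
        return sum(bucket[1].get(article_id, 0) for _, bucket in [(m, b) for m, b in self.buckets])
```

Memory is bounded by (number of buckets) x (articles active in the window), regardless of how many raw events arrive.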
Common mistake
Candidates often propose a simple hash map of article_id to count. This works for small scale but fails to address time windows and memory bounds.
The key insight:
We do not need exact counts for ALL articles. We only need accurate counts for the TOP articles. An article with 5 shares will never make the top 100 when hundreds of other articles have thousands of shares.
This leads us to Heavy Hitters algorithms that focus memory on high-count items.
Scale and Access Patterns
Let me quantify the scale and understand our access patterns.
| Dimension | Value | Impact |
|---|---|---|
| Shares per day | 10 million | ~115 events/second average, 1000/sec peak |
| Unique articles | 1 million | Long tail distribution - most have few shares |
What to say
This is a read-heavy system with bursty writes. Queries are frequent (every page load), writes are spiky (viral content). The distribution follows a power law - 1% of articles get 50% of shares.
Access Pattern Analysis:
Writes (Share Events):
- Bursty: viral content causes spikes
- Append-only: shares are immutable events
- High volume: ~1K/second at peak, with short viral bursts above that

Reads (Top-K Queries):
- Very frequent: every feed load
- Same query issued by many users
- Perfect for caching

Distribution:
- Power law: a few articles dominate
- Long tail: most articles get minimal shares
- Temporal locality: popular now likely means popular soon
Memory for exact counting:
- 1M articles x (8 byte ID + 4 byte count) = 12 MB per window
- 4 windows = 48 MB (feasible for single node)
High-Level Architecture
Let me start with a simple design and evolve it.
What to say
I will design a streaming aggregation pipeline. Share events flow through Kafka, are aggregated by a streaming processor, and results are cached in Redis for fast queries.
Top-K Articles Architecture
Component Responsibilities:
1. Share API
- Receives share events from clients
- Validates and publishes to Kafka
- Returns immediately (async processing)

2. Kafka
- Durable event log
- Partitioned by article_id for ordering
- Enables replay for reprocessing

3. Stream Processors (Flink/Spark Streaming)
- Consume from Kafka partitions
- Maintain windowed counts
- Update Redis with the current top-k

4. Redis
- Stores current top-k per time window
- Sorted sets for efficient top-k retrieval
- Sub-millisecond query latency

5. TimescaleDB
- Historical time-series data
- Minute/hour granularity buckets
- For analytics and recomputation

6. Query Service
- Serves top-k requests
- Reads from Redis (hot path)
- Falls back to recomputation if needed
Real-world reference
Twitter Trending uses a similar architecture with Storm/Heron for stream processing and Redis for serving. Reddit Hot uses a time-decay formula computed periodically.
Data Model and Storage
We need two storage patterns: real-time (Redis) and historical (time-series DB).
What to say
Redis sorted sets are perfect for top-k. The score is share count, and ZREVRANGE gives us top-k in O(log N + K) time.
# Sorted Sets for Top-K per window
Key: top_articles:{window}
Value: Sorted Set with article_id as member, count as score
Redis Operations:
import redis

redis_client = redis.Redis()  # client setup assumed

# Increment share count (called by stream processor)
def record_share(article_id: str, window: str):
    # Increment in sorted set - O(log N)
    redis_client.zincrby(f"top_articles:{window}", 1, article_id)
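The read side is symmetric. A minimal query sketch under the same assumptions (the function name is illustrative):

```python
import redis

r = redis.Redis()  # client setup assumed

def get_top_k(window: str, k: int = 100) -> list[tuple[str, int]]:
    # ZREVRANGE returns the k highest-scored members: O(log N + K)
    raw = r.zrevrange(f"top_articles:{window}", 0, k - 1, withscores=True)
    return [(member.decode(), int(score)) for member, score in raw]
```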
Time-Series Storage (for historical analysis):
-- Hypertable for share events (auto-partitioned by time)
CREATE TABLE share_events (
    time       TIMESTAMPTZ NOT NULL,
    article_id TEXT        NOT NULL  -- additional columns assumed
);
SELECT create_hypertable('share_events', 'time');
Algorithm Deep Dive
For top-k problems, we have several algorithmic approaches. Let me explain when to use each.
| Algorithm | Memory | Accuracy | Best For |
|---|---|---|---|
| Exact Counting (HashMap) | O(n) | Perfect | Small item count (<1M) |
| Min-Heap of size K | O(k) | Perfect for top-k | When you only need top-k |
| Count-Min Sketch | O(w*d) | Approximate | Very large item count, bounded memory |
| Space-Saving/Heavy Hitters | O(k) | Approximate top-k | Finding frequent items |
| Lossy Counting | O(1/ε) | Approximate | Streaming with error bound |
Approach 1: Exact Counting with Min-Heap
For our scale (1M articles), exact counting is feasible.
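A minimal sketch of the exact-counting approach. The class and method names are illustrative, not from the original:

```python
import heapq
from collections import defaultdict

class ExactTopK:
    """Exact per-article counters plus a heap-based top-k query."""

    def __init__(self):
        self.counts = defaultdict(int)  # article_id -> share count

    def record_share(self, article_id: str) -> None:
        self.counts[article_id] += 1

    def top_k(self, k: int = 100) -> list[tuple[str, int]]:
        # heapq.nlargest keeps a size-k heap internally: O(n log k)
        return heapq.nlargest(k, self.counts.items(), key=lambda kv: kv[1])
```

At 1M articles this stays well within a single node's memory, matching the estimate above.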
Approach 2: Count-Min Sketch for Approximate Counting
When exact counting uses too much memory, use probabilistic counting.
Count-Min Sketch Structure
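A minimal Count-Min Sketch. The width, depth, and MurmurHash seeding below are illustrative choices, not from the original:

```python
import mmh3  # MurmurHash3
import numpy as np

class CountMinSketch:
    """Approximate counts in bounded (depth x width) memory; may overestimate, never underestimates."""

    def __init__(self, width: int = 2000, depth: int = 5):
        self.width = width
        self.depth = depth
        self.table = np.zeros((depth, width), dtype=np.int64)

    def add(self, item: str, count: int = 1) -> None:
        for row in range(self.depth):
            col = mmh3.hash(item, seed=row) % self.width
            self.table[row, col] += count

    def estimate(self, item: str) -> int:
        # The minimum across rows bounds the overcount caused by hash collisions
        return int(min(self.table[row, mmh3.hash(item, seed=row) % self.width]
                       for row in range(self.depth)))
```

A wider table tightens the overestimate; more rows reduce the probability of a bad estimate.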
Approach 3: Space-Saving Algorithm (Heavy Hitters)
Best for finding top-k with bounded memory.
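A minimal sketch of the Space-Saving counter. The capacity and method names are illustrative, not from the original:

```python
class SpaceSaving:
    """Track at most `capacity` counters; evict the minimum when a new item arrives."""

    def __init__(self, capacity: int = 1000):
        self.capacity = capacity
        self.counters = {}  # item -> (over)estimated count

    def add(self, item: str) -> None:
        if item in self.counters:
            self.counters[item] += 1
        elif len(self.counters) < self.capacity:
            self.counters[item] = 1
        else:
            # Replace the current minimum and inherit its count (classic Space-Saving step)
            min_item = min(self.counters, key=self.counters.get)
            min_count = self.counters.pop(min_item)
            self.counters[item] = min_count + 1

    def top_k(self, k: int) -> list[tuple[str, int]]:
        return sorted(self.counters.items(), key=lambda kv: kv[1], reverse=True)[:k]
```

The `min()` scan makes eviction O(capacity); production implementations typically use the Stream-Summary structure to make it O(1).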
Which algorithm to choose?
For 1M articles: Use exact counting with Redis sorted sets. For 100M+ articles: Use Count-Min Sketch for counting combined with Space-Saving for top-k tracking.
Consistency and Invariants
System Invariants
Articles that are truly in the top-k MUST appear in our results. We can tolerate false positives (showing #101 as #100) but NOT false negatives (missing a truly popular article).
Why eventual consistency is acceptable:
For trending/popular content:
- Users expect approximate, not exact rankings
- The difference between #8 and #9 is meaningless to users
- Freshness matters more than precision
- No financial impact from slight inaccuracy
| Scenario | Acceptable? | Reason |
|---|---|---|
| Article with 5000 shares shows as 4998 | Yes | 2 share difference is noise |
| #8 and #9 articles swapped | Yes | Distinction is meaningless |
| Article with 10000 shares missing from top 10 | NO | False negative - major bug |
| Results 30 seconds stale | Yes | Well within the 1-minute freshness target |
| Different users see different top 10 | Yes | Eventual consistency across regions |
What to say
We optimize for availability over consistency. Users seeing slightly different trending lists is acceptable. Missing a truly viral article is not.
Consistency Model:
Write Path: At-least-once delivery
- Kafka provides durable writes
- Duplicate shares cause slight overcounting, which is acceptable for trending
- Stream processor may reprocess on failure

Read Path: Eventual consistency
- Redis replicas may lag by milliseconds
- CDN cache adds seconds of staleness
- Acceptable for this use case
Problem: Kafka at-least-once means potential duplicate events
Option 1: Accept duplicates (simple) - slight overcounting does not change the ranking meaningfully
Option 2: Deduplicate by event ID in the stream processor (exactly-once) - more state and complexity than this use case needs
Failure Modes and Resilience
Proactively discuss failures
Let me walk through what happens when components fail. This system should degrade gracefully, not fail completely.
| Failure | Impact | Mitigation |
|---|---|---|
| Kafka broker down | Events queue in producers | Multi-broker cluster, producer retries |
| Stream processor crash | Counting paused | Kafka consumer groups auto-rebalance |
Handling Hot Keys (Viral Articles):
When an article goes viral, it creates a hot key problem:
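A sketch of local pre-aggregation for hot keys. The flush interval and method names are illustrative assumptions, not from the original:

```python
import threading
from collections import defaultdict

class HotKeyHandler:
    """Buffer increments locally and flush in batches so one viral article
    does not hammer a single Redis key with every individual event."""

    def __init__(self, flush_interval_s: float = 1.0):
        self.local_buffer = defaultdict(int)  # Local aggregation
        self.lock = threading.Lock()
        self.flush_interval_s = flush_interval_s

    def record(self, article_id: str) -> None:
        with self.lock:
            self.local_buffer[article_id] += 1

    def flush(self, redis_client, window: str) -> None:
        # Called every flush_interval_s: one ZINCRBY per hot article, not one per event
        with self.lock:
            buffered, self.local_buffer = self.local_buffer, defaultdict(int)
        for article_id, count in buffered.items():
            redis_client.zincrby(f"top_articles:{window}", count, article_id)
```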
Disaster Recovery:
Scenario: Redis data lost
Recovery options:
- Replay recent events from Kafka to rebuild the windowed counts
- Recompute top-k from the TimescaleDB minute/hour buckets
Graceful degradation
If the real-time pipeline fails, serve stale cached results. Trending from 5 minutes ago is better than no trending at all.
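For example, the query path can fall back to a last-known-good snapshot when Redis is unreachable. The names and in-process cache below are illustrative assumptions:

```python
import redis

r = redis.Redis()  # client setup assumed
_last_known_good = {}  # window -> most recent successful result

def get_trending(window: str, k: int = 100) -> list[tuple[str, int]]:
    try:
        raw = r.zrevrange(f"top_articles:{window}", 0, k - 1, withscores=True)
        results = [(member.decode(), int(score)) for member, score in raw]
        if results:
            _last_known_good[window] = results  # refresh the stale fallback
            return results
    except redis.RedisError:
        pass
    # Degraded path: trending from a few minutes ago beats an empty response
    return _last_known_good.get(window, [])
```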
Evolution and Scaling
What to say
This design handles 10M shares/day well. Let me discuss how it evolves for 100x scale and additional features like personalized trending.
Evolution Path:
Stage 1: Single Region (10M shares/day)
- Single Kafka cluster
- One stream processor group
- Redis primary with replicas
- Works for most applications

Stage 2: Multi-Region (100M shares/day)
- Regional Kafka clusters
- Regional aggregation + global merge (see the merge sketch below)
- Redis cluster per region
- CDN for query caching

Stage 3: Personalized Trending (1B+ events)
- Per-user or per-segment trending
- ML-based relevance scoring
- Approximate algorithms required
Multi-Region Architecture
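A sketch of the Stage 2 global merge: each region ships a candidate list larger than K, and counts are summed and re-ranked. The function name and candidate sizing are illustrative assumptions:

```python
from collections import defaultdict

def merge_regional_topk(regional_lists: list[list[tuple[str, int]]], k: int = 100) -> list[tuple[str, int]]:
    """Merge per-region top-k candidate lists into a global top-k.

    Each region should contribute a candidate list larger than k (e.g. its top 10*k)
    so an article that is moderately popular in several regions is not missed.
    """
    totals = defaultdict(int)
    for candidates in regional_lists:
        for article_id, count in candidates:
            totals[article_id] += count
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:k]
```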
Advanced Features:
| Feature | Approach | Complexity |
|---|---|---|
| Trending by category | Separate sorted sets per category | Low |
| Trending by location | Geo-partitioned aggregation | Medium |
| Personalized trending | User embedding similarity | High |
| Velocity-based trending | Track rate of change, not just count | Medium |
| Anti-gaming | Account age, diversity of sharers | High |
Velocity-Based Trending (Like Twitter):
Instead of just counting shares, weight recent shares more heavily, for example with exponential time decay (the half-life below is an illustrative assumption):
import math

def calculate_trending_score(share_count: int, age_hours: float) -> float:
    # Assumed 6-hour half-life: an article's score weight halves every 6 hours
    return share_count * math.pow(0.5, age_hours / 6.0)

Alternative approach
If I needed sub-second freshness at 100x scale, I would use Apache Druid or ClickHouse for real-time OLAP instead of Redis. They handle high-cardinality aggregations natively.