Design Walkthrough
Problem Statement
The Question: Design a system that identifies the top K most shared articles across different time windows (last hour, last day, last week) with millions of share events per day.
This is a classic problem seen at:
- Twitter: Trending topics and tweets
- Reddit: Hot posts algorithm
- Hacker News: Front page ranking
- Medium: Popular articles
- News sites: Most read stories
What to say first
Before I design, I want to clarify: What are the time windows we need to support? How fresh must the results be? Is approximate counting acceptable for very popular articles?
What interviewers are testing:
- Do you understand the memory challenges of counting millions of items?
- Can you apply streaming algorithms (Count-Min Sketch, Heavy Hitters)?
- How do you handle time-windowed aggregation?
- Can you design for both real-time updates and fast queries?
Clarifying Questions
Ask these questions to shape your architecture. Each answer has significant design implications.
Question 1: Scale
How many share events per day? How many unique articles are we tracking?
Why this matters: Determines if exact counting is feasible or if we need approximate algorithms.
Typical answer: 10M shares/day, 1M unique articles
Impact: At this scale, storing exact counts for all articles is feasible (tens of MB of raw counters, ~100MB with hash-map overhead), but we will design for 10x growth
Question 2: Time Windows
What time windows do we need? Last hour? Day? Week? Custom ranges?
Why this matters: Multiple windows require different aggregation strategies.
Typical answer: Last 1 hour, 24 hours, 7 days, 30 days
Impact: Need to maintain counts at different granularities (minute buckets for the hourly window, hour buckets for the daily window)
Question 3: Freshness
How fresh must results be? Real-time or is 1-minute delay acceptable?
Why this matters: Real-time requires streaming; batch is simpler.
Typical answer: Within 1 minute of actual shares
Impact: Need streaming aggregation, cannot rely on batch processing alone
Question 4: K value
What is K? Top 10? Top 100? Top 1000?
Why this matters: Larger K requires more memory for candidate tracking.
Typical answer: Top 100, but support up to 1000
Impact: We need to track roughly 10x candidates (top 10K) to ensure we have an accurate top 1K
State your assumptions
I will assume: 10M shares/day (~115/second average, peaks around 1K/second), 1M articles, time windows of 1h/24h/7d/30d, results fresh within 1 minute, top 100 with support for top 1000.
The Hard Part
Say this out loud
The hard part is maintaining accurate counts for potentially millions of articles while using bounded memory, and supporting multiple overlapping time windows efficiently.
Why this is genuinely hard:
1. Memory vs Accuracy Tradeoff
- Exact counting of 1M articles needs only ~12 MB of raw counters (more with hash-map overhead), which is feasible
- At 100M articles this grows to 1GB+ just for counts
- We need bounded memory regardless of article count

2. Multiple Time Windows
- Cannot just keep one counter per article
- Need to know counts for the last hour AND the last day AND the last week
- Naive approach: a separate counter set per window multiplies memory by the number of windows

3. Time Decay
- Old shares should fall off the window
- Keeping all timestamps = unbounded memory
- Need efficient windowing without storing every event (see the bucketed-window sketch after this list)

4. Hot Article Spikes
- A viral article might get 100K shares in minutes
- Must handle uneven distribution
- Most articles get 0-10 shares, a few get millions
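To make point 3 concrete, here is a minimal sketch of bucketed windowing. The class name, one-minute bucket size, and method names are illustrative assumptions, not from the original design:

```python
import time
from collections import defaultdict, deque

class BucketedWindowCounter:
    """Sliding-window counts using per-minute buckets instead of storing every event."""

    def __init__(self, window_minutes: int = 60):
        self.window_minutes = window_minutes
        self.buckets = deque()  # (minute, {article_id: count}) pairs, oldest first

    def record(self, article_id: str) -> None:
        minute = int(time.time() // 60)
        if not self.buckets or self.buckets[-1][0] != minute:
            self.buckets.append((minute, defaultdict(int)))
        self.buckets[-1][1][article_id] += 1
        # Drop buckets that have slid out of the window: memory stays bounded
        while self.buckets and self.buckets[0][0] <= minute - self.window_minutes:
            self.buckets.popleft()

    def count(self, article_id: str) -> int:
        return sum(bucket[1].get(article_id, 0) for _, bucket in [(m, b) for m, b in self.buckets])
```

Memory is bounded by (number of buckets) x (articles active in the window), regardless of how many raw events arrive.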
Common mistake
Candidates often propose a simple hash map of article_id to count. This works for small scale but fails to address time windows and memory bounds.
The key insight:
We do not need exact counts for ALL articles. We only need accurate counts for the TOP articles. An article with 5 shares will never make the top 100 when hundreds of other articles have thousands of shares.
This leads us to Heavy Hitters algorithms that focus memory on high-count items.
Scale and Access Patterns
Let me quantify the scale and understand our access patterns.
| Dimension | Value | Impact |
|---|---|---|
| Shares per day | 10 million | ~115 events/second average, 1000/sec peak |
| Unique articles | 1 million | Long tail distribution - most have few shares |
What to say
This is a read-heavy system with bursty writes. Queries are frequent (every page load), writes are spiky (viral content). The distribution follows a power law - 1% of articles get 50% of shares.
Access Pattern Analysis:
Writes (Share Events):
- Bursty: viral content causes spikes
- Append-only: shares are immutable events
- High volume: ~1K/second at peak, with short viral bursts above that

Reads (Top-K Queries):
- Very frequent: every feed load
- Same query issued by many users
- Perfect for caching

Distribution:
- Power law: a few articles dominate
- Long tail: most articles get minimal shares
- Temporal locality: popular now likely means popular soon
Memory for exact counting:
- 1M articles x (8 byte ID + 4 byte count) = 12 MB per window
- 4 windows = 48 MB (feasible for single node)
High-Level Architecture
Let me start with a simple design and evolve it.
What to say
I will design a streaming aggregation pipeline. Share events flow through Kafka, are aggregated by a streaming processor, and results are cached in Redis for fast queries.
Top-K Articles Architecture
Component Responsibilities:
1. Share API
- Receives share events from clients
- Validates and publishes to Kafka
- Returns immediately (async processing)

2. Kafka
- Durable event log
- Partitioned by article_id for ordering
- Enables replay for reprocessing

3. Stream Processors (Flink/Spark Streaming)
- Consume from Kafka partitions
- Maintain windowed counts
- Update Redis with the current top-k

4. Redis
- Stores current top-k per time window
- Sorted sets for efficient top-k retrieval
- Sub-millisecond query latency

5. TimescaleDB
- Historical time-series data
- Minute/hour granularity buckets
- For analytics and recomputation

6. Query Service
- Serves top-k requests
- Reads from Redis (hot path)
- Falls back to recomputation if needed
Real-world reference
Twitter Trending uses a similar architecture with Storm/Heron for stream processing and Redis for serving. Reddit Hot uses a time-decay formula computed periodically.
Data Model and Storage
We need two storage patterns: real-time (Redis) and historical (time-series DB).
What to say
Redis sorted sets are perfect for top-k. The score is share count, and ZREVRANGE gives us top-k in O(log N + K) time.
# Sorted Sets for Top-K per window
Key: top_articles:{window}
Value: Sorted Set with article_id as member, count as score
Redis Operations:
import redis

redis_client = redis.Redis()  # client setup assumed

# Increment share count (called by stream processor)
def record_share(article_id: str, window: str):
    # Increment in sorted set - O(log N)
    redis_client.zincrby(f"top_articles:{window}", 1, article_id)
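The read side is symmetric. A minimal query sketch under the same assumptions (the function name is illustrative):

```python
import redis

r = redis.Redis()  # client setup assumed

def get_top_k(window: str, k: int = 100) -> list[tuple[str, int]]:
    # ZREVRANGE returns the k highest-scored members: O(log N + K)
    raw = r.zrevrange(f"top_articles:{window}", 0, k - 1, withscores=True)
    return [(member.decode(), int(score)) for member, score in raw]
```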
Time-Series Storage (for historical analysis):
-- Hypertable for share events (auto-partitioned by time)
CREATE TABLE share_events (
    time       TIMESTAMPTZ NOT NULL,
    article_id TEXT        NOT NULL  -- additional columns assumed
);
SELECT create_hypertable('share_events', 'time');
Algorithm Deep Dive
For top-k problems, we have several algorithmic approaches. Let me explain when to use each.
| Algorithm | Memory | Accuracy | Best For |
|---|---|---|---|
| Exact Counting (HashMap) | O(n) | Perfect | Small item count (<1M) |
| Min-Heap of size K | O(k) | Perfect for top-k | When you only need top-k |
| Count-Min Sketch | O(w*d) | Approximate | Very large item count, bounded memory |
| Space-Saving/Heavy Hitters | O(k) | Approximate top-k | Finding frequent items |
| Lossy Counting | O(1/ε) | Approximate | Streaming with error bound |
Approach 1: Exact Counting with Min-Heap
For our scale (1M articles), exact counting is feasible.
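A minimal sketch of the exact-counting approach. The class and method names are illustrative, not from the original:

```python
import heapq
from collections import defaultdict

class ExactTopK:
    """Exact per-article counters plus a heap-based top-k query."""

    def __init__(self):
        self.counts = defaultdict(int)  # article_id -> share count

    def record_share(self, article_id: str) -> None:
        self.counts[article_id] += 1

    def top_k(self, k: int = 100) -> list[tuple[str, int]]:
        # heapq.nlargest keeps a size-k heap internally: O(n log k)
        return heapq.nlargest(k, self.counts.items(), key=lambda kv: kv[1])
```

At 1M articles this stays well within a single node's memory, matching the estimate above.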
Approach 2: Count-Min Sketch for Approximate Counting
When exact counting uses too much memory, use probabilistic counting.
Count-Min Sketch Structure
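A minimal Count-Min Sketch. The width, depth, and MurmurHash seeding below are illustrative choices, not from the original:

```python
import mmh3  # MurmurHash3
import numpy as np

class CountMinSketch:
    """Approximate counts in bounded (depth x width) memory; may overestimate, never underestimates."""

    def __init__(self, width: int = 2000, depth: int = 5):
        self.width = width
        self.depth = depth
        self.table = np.zeros((depth, width), dtype=np.int64)

    def add(self, item: str, count: int = 1) -> None:
        for row in range(self.depth):
            col = mmh3.hash(item, seed=row) % self.width
            self.table[row, col] += count

    def estimate(self, item: str) -> int:
        # The minimum across rows bounds the overcount caused by hash collisions
        return int(min(self.table[row, mmh3.hash(item, seed=row) % self.width]
                       for row in range(self.depth)))
```

A wider table tightens the overestimate; more rows reduce the probability of a bad estimate.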
Approach 3: Space-Saving Algorithm (Heavy Hitters)
Best for finding top-k with bounded memory.
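A minimal sketch of the Space-Saving counter. The capacity and method names are illustrative, not from the original:

```python
class SpaceSaving:
    """Track at most `capacity` counters; evict the minimum when a new item arrives."""

    def __init__(self, capacity: int = 1000):
        self.capacity = capacity
        self.counters = {}  # item -> (over)estimated count

    def add(self, item: str) -> None:
        if item in self.counters:
            self.counters[item] += 1
        elif len(self.counters) < self.capacity:
            self.counters[item] = 1
        else:
            # Replace the current minimum and inherit its count (classic Space-Saving step)
            min_item = min(self.counters, key=self.counters.get)
            min_count = self.counters.pop(min_item)
            self.counters[item] = min_count + 1

    def top_k(self, k: int) -> list[tuple[str, int]]:
        return sorted(self.counters.items(), key=lambda kv: kv[1], reverse=True)[:k]
```

The `min()` scan makes eviction O(capacity); production implementations typically use the Stream-Summary structure to make it O(1).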
Which algorithm to choose?
For 1M articles: Use exact counting with Redis sorted sets. For 100M+ articles: Use Count-Min Sketch for counting combined with Space-Saving for top-k tracking.
Consistency and Invariants
System Invariants
Articles that are truly in the top-k MUST appear in our results. We can tolerate false positives (showing #101 as #100) but NOT false negatives (missing a truly popular article).
Why eventual consistency is acceptable:
For trending/popular content:
- Users expect approximate, not exact rankings
- The difference between #8 and #9 is meaningless to users
- Freshness matters more than precision
- No financial impact from slight inaccuracy
| Scenario | Acceptable? | Reason |
|---|---|---|
| Article with 5000 shares shows as 4998 | Yes | 2 share difference is noise |
| #8 and #9 articles swapped | Yes | Distinction is meaningless |
| Article with 10000 shares missing from top 10 | NO | False negative - major bug |
| Results 30 seconds stale | Yes | Well within the 1-minute freshness target |
| Different users see different top 10 | Yes | Eventual consistency across regions |
What to say
We optimize for availability over consistency. Users seeing slightly different trending lists is acceptable. Missing a truly viral article is not.
Consistency Model:
Write Path: At-least-once delivery
- Kafka provides durable writes
- Duplicate shares cause slight overcounting, which is acceptable for trending
- Stream processor may reprocess on failure

Read Path: Eventual consistency
- Redis replicas may lag by milliseconds
- CDN cache adds seconds of staleness
- Acceptable for this use case
Problem: Kafka at-least-once means potential duplicate events
Option 1: Accept duplicates (simple) - slight overcounting does not change the ranking meaningfully
Option 2: Deduplicate by event ID in the stream processor (exactly-once) - more state and complexity than this use case needs
Failure Modes and Resilience
Proactively discuss failures
Let me walk through what happens when components fail. This system should degrade gracefully, not fail completely.
| Failure | Impact | Mitigation |
|---|---|---|
| Kafka broker down | Events queue in producers | Multi-broker cluster, producer retries |
| Stream processor crash | Counting paused | Kafka consumer groups auto-rebalance |
Handling Hot Keys (Viral Articles):
When an article goes viral, it creates a hot key problem:
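A sketch of local pre-aggregation for hot keys. The flush interval and method names are illustrative assumptions, not from the original:

```python
import threading
from collections import defaultdict

class HotKeyHandler:
    """Buffer increments locally and flush in batches so one viral article
    does not hammer a single Redis key with every individual event."""

    def __init__(self, flush_interval_s: float = 1.0):
        self.local_buffer = defaultdict(int)  # Local aggregation
        self.lock = threading.Lock()
        self.flush_interval_s = flush_interval_s

    def record(self, article_id: str) -> None:
        with self.lock:
            self.local_buffer[article_id] += 1

    def flush(self, redis_client, window: str) -> None:
        # Called every flush_interval_s: one ZINCRBY per hot article, not one per event
        with self.lock:
            buffered, self.local_buffer = self.local_buffer, defaultdict(int)
        for article_id, count in buffered.items():
            redis_client.zincrby(f"top_articles:{window}", count, article_id)
```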
Disaster Recovery:
Scenario: Redis data lost
Recovery options:
- Replay recent events from Kafka to rebuild the windowed counts
- Recompute top-k from the TimescaleDB minute/hour buckets
Graceful degradation
If the real-time pipeline fails, serve stale cached results. Trending from 5 minutes ago is better than no trending at all.
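For example, the query path can fall back to a last-known-good snapshot when Redis is unreachable. The names and in-process cache below are illustrative assumptions:

```python
import redis

r = redis.Redis()  # client setup assumed
_last_known_good = {}  # window -> most recent successful result

def get_trending(window: str, k: int = 100) -> list[tuple[str, int]]:
    try:
        raw = r.zrevrange(f"top_articles:{window}", 0, k - 1, withscores=True)
        results = [(member.decode(), int(score)) for member, score in raw]
        if results:
            _last_known_good[window] = results  # refresh the stale fallback
            return results
    except redis.RedisError:
        pass
    # Degraded path: trending from a few minutes ago beats an empty response
    return _last_known_good.get(window, [])
```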
Evolution and Scaling
What to say
This design handles 10M shares/day well. Let me discuss how it evolves for 100x scale and additional features like personalized trending.
Evolution Path:
Stage 1: Single Region (10M shares/day)
- Single Kafka cluster
- One stream processor group
- Redis primary with replicas
- Works for most applications

Stage 2: Multi-Region (100M shares/day)
- Regional Kafka clusters
- Regional aggregation + global merge (see the merge sketch below)
- Redis cluster per region
- CDN for query caching

Stage 3: Personalized Trending (1B+ events)
- Per-user or per-segment trending
- ML-based relevance scoring
- Approximate algorithms required
Multi-Region Architecture
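A sketch of the Stage 2 global merge: each region ships a candidate list larger than K, and counts are summed and re-ranked. The function name and candidate sizing are illustrative assumptions:

```python
from collections import defaultdict

def merge_regional_topk(regional_lists: list[list[tuple[str, int]]], k: int = 100) -> list[tuple[str, int]]:
    """Merge per-region top-k candidate lists into a global top-k.

    Each region should contribute a candidate list larger than k (e.g. its top 10*k)
    so an article that is moderately popular in several regions is not missed.
    """
    totals = defaultdict(int)
    for candidates in regional_lists:
        for article_id, count in candidates:
            totals[article_id] += count
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:k]
```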
Advanced Features:
| Feature | Approach | Complexity |
|---|---|---|
| Trending by category | Separate sorted sets per category | Low |
| Trending by location | Geo-partitioned aggregation | Medium |
| Personalized trending | User embedding similarity | High |
| Velocity-based trending | Track rate of change, not just count | Medium |
| Anti-gaming | Account age, diversity of sharers | High |
Velocity-Based Trending (Like Twitter):
Instead of just counting shares, weight recent shares more heavily, for example with exponential time decay (the half-life below is an illustrative assumption):
import math

def calculate_trending_score(share_count: int, age_hours: float) -> float:
    # Assumed 6-hour half-life: an article's score weight halves every 6 hours
    return share_count * math.pow(0.5, age_hours / 6.0)

Alternative approach
If I needed sub-second freshness at 100x scale, I would use Apache Druid or ClickHouse for real-time OLAP instead of Redis. They handle high-cardinality aggregations natively.