Design Walkthrough
Problem Statement
The Question: Design a news feed system like Facebook or Twitter where users see posts from people they follow, sorted by time or relevance.
What the system needs to do (most important first):
1. Show the feed - When a user opens the app, show recent posts from everyone they follow. This is the #1 feature - it must be fast.
2. Create posts - Let users write posts with text, photos, or videos. Posts go to all their followers.
3. Follow and unfollow - Users can follow other users. When you follow someone, their posts start appearing in your feed.
4. Rank posts - Show the most interesting posts first, not just the newest. Consider likes, comments, and how close you are to the poster.
5. Real-time updates - When someone you follow posts, it should appear in your feed within seconds (not minutes).
6. Infinite scroll - Users can scroll down to see older posts. Keep loading more as they scroll.
What to say first
Let me first understand the scale of this system. How many users do we have? How many posts per day? What is the average number of followers? I will also ask about features - do we need just chronological feed or ranked feed? Do we need real-time updates or is a slight delay okay?
What the interviewer really wants to see:
- Do you understand the fan-out problem? (One post going to millions of followers)
- Can you explain push vs pull and when to use each?
- Do you know how to use caching to make feeds fast?
- Can you handle celebrities with millions of followers differently from regular users?
Clarifying Questions
Before you start designing, ask questions to understand what you are building. Good questions show the interviewer you think before you code.
Question 1: How big is this?
How many users do we have? How many posts are created per day? What is the average number of people someone follows? Are there celebrities with millions of followers?
Why ask this: The design changes completely based on scale. A system for 1 million users is different from one for 1 billion users.
What interviewers usually say: 500 million daily active users, 100 million new posts per day, average user follows 200 people, some celebrities have 100 million followers.
How this changes your design: With celebrities having 100 million followers, we cannot use a simple push model - pushing one post to 100 million feeds would take too long and use too much storage.
Question 2: Chronological or ranked feed?
Should posts be shown in order of time (newest first) or ranked by importance (most interesting first)?
Why ask this: Chronological feeds are simpler to build. Ranked feeds need machine learning and are more complex.
What interviewers usually say: Start with chronological, but design so we can add ranking later.
How this changes your design: For ranking, we need to store signals like likes, comments, and user engagement. We also need a ranking service that can score posts quickly.
Question 3: How fast should new posts appear?
When someone I follow posts, how quickly should it show in my feed? Instantly, a few seconds, or a few minutes?
Why ask this: Real-time (instant) needs WebSockets or long polling. A few seconds delay is much simpler.
What interviewers usually say: A few seconds delay is fine for most users. Real-time is nice to have but not required.
How this changes your design: We can use a simpler pull-based refresh instead of maintaining millions of real-time connections.
Question 4: What types of content?
Is this text only, or do we support photos and videos too? Do we need to support resharing (retweets)?
Why ask this: Photos and videos need a CDN and different storage. Resharing adds complexity to the data model.
What interviewers usually say: Support text, photos, and videos. Resharing is a nice-to-have.
How this changes your design: We will use a CDN for media files and store only URLs in the post data. This keeps posts small and fast to load.
Summarize your assumptions
Let me summarize: 500 million daily users, 100 million posts per day, average user follows 200 people, some users have 100 million followers. We need chronological feed first with ranking later. A few seconds delay for new posts is okay. We support text, photos, and videos.
The Hard Part
Say this to the interviewer
The hardest part of a news feed is the fan-out problem. When someone posts, we need to show it to all their followers. If a celebrity has 100 million followers, how do we update 100 million feeds quickly? This is the core challenge.
The Fan-Out Problem (explained simply):
Imagine Taylor Swift posts a photo. She has 100 million followers. Now we need to either:
- Immediately add this post to 100 million different feeds (push model), or
- Wait until each follower opens their app and then fetch it (pull model).
Both have problems:
- Push: Writing to 100 million feeds takes time (even at 10,000 writes/second, that is nearly 3 hours). It also wastes storage if many followers never check their feed.
- Pull: When a user opens the app, we need to check posts from everyone they follow. If they follow 500 people, that is 500 lookups before we can show the feed.
Common mistake candidates make
Many candidates say: Just use push for everyone - when someone posts, add it to all follower feeds. This breaks down for celebrities. At 100 million followers and 10,000 writes/second, one post takes 3 hours to fan out. By then, the post is old news!
The Solution: Hybrid Push-Pull (Fan-out on Write + Fan-out on Read)
We treat users differently based on their follower count:
Regular users (under 10,000 followers): Use PUSH
- When they post, immediately add the post to all follower feeds
- 10,000 writes is fast (under 1 second)
- Followers see the post instantly in their pre-built feed
Celebrities (over 10,000 followers): Use PULL
- When they post, just save the post - do not fan out
- When a follower opens their app, we check: do they follow any celebrities?
- If yes, fetch recent celebrity posts and mix them into the feed
- This is slower but avoids the 100-million-write problem
Hybrid Fan-Out Strategy
Why 10,000 as the threshold?
10,000 is a common threshold because: (1) Writing to 10,000 feeds takes about 1 second - acceptable latency, (2) Most users have under 10,000 followers so most posts use the fast push model, (3) Only about 1% of users are celebrities but they create disproportionate fan-out load.
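To make the split concrete, here is a minimal sketch of how the post path could branch on follower count. The threshold constant, the queue object, and the function names are illustrative assumptions, not a prescribed API.

```python
CELEBRITY_THRESHOLD = 10_000  # assumed cutoff between push and pull

def is_celebrity(follower_count: int) -> bool:
    return follower_count >= CELEBRITY_THRESHOLD

def handle_new_post(author_id: str, post_id: str, follower_count: int, fanout_queue) -> None:
    """Branch between fan-out-on-write (push) and fan-out-on-read (pull)."""
    if is_celebrity(follower_count):
        # Pull: just store the post; followers fetch this author's recent posts
        # at read time and merge them into their pre-built feed.
        return
    # Push: background workers copy the post ID into every follower's feed.
    fanout_queue.put({"post_id": post_id, "author_id": author_id})
```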
Scale and Access Patterns
Before designing, let me calculate the scale. This helps us choose the right tools and identify bottlenecks.
| What we are measuring | Number | What this means for our design |
|---|---|---|
| Daily active users | 500 million | Huge read load - need lots of caching |
| Posts per day | 100 million | About 1,150 posts per second - manageable write load |
What to tell the interviewer
The key insight is that reads far outnumber writes - roughly 50 to 1. Users check their feed many times a day but post only occasionally, so we should optimize for reading: pre-compute feeds when possible and cache aggressively. The quick estimate below shows where that ratio comes from.
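A back-of-envelope sketch of those numbers, assuming each active user opens their feed about 10 times per day (an assumption for illustration - the interviewer did not specify it):

```python
DAU = 500_000_000            # daily active users (given)
POSTS_PER_DAY = 100_000_000  # new posts per day (given)
CHECKS_PER_USER = 10         # assumed feed opens per user per day
SECONDS_PER_DAY = 86_400

write_qps = POSTS_PER_DAY / SECONDS_PER_DAY          # ~1,157 posts/second
read_qps = DAU * CHECKS_PER_USER / SECONDS_PER_DAY   # ~57,870 feed loads/second
print(f"writes: {write_qps:,.0f}/s, reads: {read_qps:,.0f}/s, ratio: {read_qps / write_qps:.0f}x")
# -> writes: 1,157/s, reads: 57,870/s, ratio: 50x
```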
How much space does one post need?
- Post ID, user ID, timestamp: 50 bytes
- Text content (average): 300 bytes

How people use the news feed (from most common to least common):
1. View their feed - Open the app and scroll through posts. This is 90% of all requests. Must be super fast.
2. Load more posts - Scroll down to see older posts. Called pagination. Should feel instant.
3. Create a post - Write something and publish it. Much less frequent than reading.
4. Like or comment - Interact with posts. These update counts that affect ranking.
5. Follow someone - Start seeing their posts. Happens rarely but changes what appears in the feed.
High-Level Architecture
Now let me draw the big picture of how all the pieces fit together. I will explain what each part does and why we need it.
What to tell the interviewer
I will split this into separate services: one for creating posts, one for building feeds, one for serving feeds to users, and one for ranking. This separation lets us scale each part independently.
News Feed System - The Big Picture
What each service does and WHY it is separate:
| Service | What it does | Why it is separate |
|---|---|---|
| Post Service | Saves new posts to database. Uploads media to CDN. Sends message to fan-out queue. | Creating posts and delivering posts to feeds are different problems. Post Service can stay simple and fast. |
| Fan-out Workers | Read from queue. For each new post, find followers and add post ID to their feeds. | This is CPU-intensive work that can be slow. Separate workers mean posting feels instant to the user - the fan-out happens in background. |
| Feed Service | When user opens app, return their pre-built feed from cache. | This must be FAST. Keeping it separate means we can add many servers just for serving feeds. |
| Feed Mixer | Combines pre-built feed with celebrity posts fetched on demand. | Celebrity posts are not in the pre-built feed. Mixer fetches them and merges everything together. |
| Ranking Service | Scores posts by relevance - considers likes, comments, recency, and user preferences. | Ranking logic is complex and changes often. Separate service lets data scientists update it without touching other code. |
Common interview question: Why use a message queue?
Interviewers often ask: Why not just do fan-out directly when a post is created? Answer: If fan-out fails or is slow, we do not want the user to wait or see an error. The queue decouples posting from fan-out. User gets instant success confirmation, and workers process fan-out reliably in background.
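As a hedged illustration of that decoupling, here is one way the handoff could look using Kafka with the kafka-python client. The topic name fanout_jobs, the consumer group name, and the message shape are assumptions made for this sketch.

```python
import json
from kafka import KafkaProducer, KafkaConsumer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

def on_post_created(post_id: str, author_id: str) -> None:
    # Post Service: enqueue the job and return immediately; the user never waits on fan-out.
    producer.send("fanout_jobs", {"post_id": post_id, "author_id": author_id})

def run_fanout_worker() -> None:
    # Fan-out worker: consumes jobs at its own pace; a backlog just means the queue grows.
    consumer = KafkaConsumer(
        "fanout_jobs",
        bootstrap_servers="localhost:9092",
        group_id="fanout-workers",
        value_deserializer=lambda b: json.loads(b.decode("utf-8")),
    )
    for message in consumer:
        job = message.value
        print("fan out", job["post_id"], "from", job["author_id"])  # real worker updates follower feeds here
```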
Technology Choices - Why we picked these tools:
Post Database: MySQL or PostgreSQL
- Why: Posts are structured data (user, text, time). SQL databases handle this well and are easy to query.
- Partitioning: Partition by user_id so one user's posts are stored together, and by time so old posts can be archived.

Feed Cache: Redis
- Why: A feed is a list of post IDs. Redis has built-in list operations (add to front, trim to size, get a range) and is very fast - millions of operations per second.
- Structure: Each user has a list in Redis. Key = feed:{user_id}, Value = list of recent post IDs.

Social Graph: Separate graph database or MySQL
- Why: We need to quickly answer "who does user X follow?" and "who follows user X?". MySQL with good indexes works; a graph database like Neo4j helps for more complex queries.

Message Queue: Kafka or RabbitMQ
- Why: Handles millions of messages per second. If workers fall behind, messages queue up instead of being lost.
- Kafka: Better for high throughput; messages can be replayed.
- RabbitMQ: Simpler, and good enough for most cases.
How real companies do it
Twitter uses a mix of push and pull. Facebook uses a ranked feed with machine learning. Instagram pre-computes feeds for active users. LinkedIn uses a pull model with heavy caching. All of them use Redis or similar in-memory stores for feed caching.
Data Model and Storage
Now let me show how we organize the data. Think of tables like spreadsheets - each one stores a different type of information.
What to tell the interviewer
I will use three main storage systems: SQL database for posts and users (structured data), Redis for pre-built feeds (fast cache), and a CDN for media files (photos and videos). Each tool is best at its specific job.
Table 1: Users - Information about each person
This stores basic info about users including whether they are a celebrity (affects fan-out strategy).
| Column | What it stores | Example |
|---|---|---|
| id | Unique ID for this user | user_123 |
| username | Their handle | @johndoe |
Table 2: Posts - The actual content people create
This stores every post. We partition by user_id so each user's posts are stored together.
| Column | What it stores | Example |
|---|---|---|
| id | Unique ID for this post | post_789 |
| user_id | Who created it | user_123 |
Why partition by user_id?
When we need to fetch celebrity posts (pull model), we ask: get recent posts from user X. Partitioning by user_id means all of user X's posts are on the same database shard - one fast query instead of asking every shard.
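One simple (assumed) way to realize that placement is hash-based sharding on user_id, so every post by the same author maps to the same shard. The shard count and hash choice below are illustrative.

```python
import zlib

NUM_POST_SHARDS = 64  # assumed shard count

def shard_for_user(user_id: str) -> int:
    """All of a user's posts live on the shard this returns,
    so "recent posts from user X" is a single-shard query."""
    # Use a stable hash (not Python's built-in hash(), which varies per process).
    return zlib.crc32(user_id.encode("utf-8")) % NUM_POST_SHARDS
```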
Table 3: Follows - Who follows whom (the social graph)
This is a simple table with two columns. Each row means: follower follows followee.
| Column | What it stores | Example |
|---|---|---|
| follower_id | The person who clicked follow | user_456 |
| followee_id | The person being followed | user_123 |
| created_at | When they started following | 2024-02-10 |
Why we need two indexes:
- Index on follower_id: quickly find everyone that user_456 follows. Used when building their feed.
- Index on followee_id: quickly find everyone who follows user_123. Used for fan-out when they post.
Feed Cache in Redis - The pre-built feeds
This is not a SQL table. It is stored in Redis (fast in-memory storage) as a list.
Key: feed:{user_id}
Value: List of post IDs, newest first
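A small sketch of how fan-out workers and the Feed Service could manipulate this list with redis-py. The 500-post cap, key naming, and 30-day expiry follow the design described in this walkthrough; the helper names are illustrative.

```python
import redis

r = redis.Redis(host="localhost", port=6379)
FEED_MAX_LEN = 500  # keep only the most recent post IDs per user

def push_to_feed(follower_id: str, post_id: str) -> None:
    key = f"feed:{follower_id}"
    r.lpush(key, post_id)               # newest post ID goes to the front
    r.ltrim(key, 0, FEED_MAX_LEN - 1)   # drop anything beyond the cap
    r.expire(key, 30 * 24 * 3600)       # reclaim memory for users inactive ~30 days

def read_feed_page(user_id: str, page: int, page_size: int = 50) -> list[bytes]:
    start = page * page_size
    return r.lrange(f"feed:{user_id}", start, start + page_size - 1)
```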
Table 4: Celebrity Follows - Which celebrities each user follows
We keep a separate small table of just celebrity follows. This makes the pull step fast.
| Column | What it stores | Example |
|---|---|---|
| user_id | The regular user | user_456 |
| celebrity_id | The celebrity they follow | user_taylor_swift |
| created_at | When they followed | 2024-01-05 |
Why separate celebrity follows?
When loading a feed, we need to pull celebrity posts. Instead of scanning all 200 follows to find which ones are celebrities, we just look at this small table. Most users follow only a few celebrities, so this table is tiny and fast to query.
How Posting Works (Write Path)
Let me explain step by step what happens when someone creates a post. This is called the write path.
What to tell the interviewer
When a user posts, we do the minimum work synchronously (save the post, upload media) and return success. The expensive fan-out work happens asynchronously in background workers. This keeps posting fast.
What happens when you create a post
FUNCTION create_post(user_id, content, media_files):
    STEP 1: Upload media files to CDN (if any) and get back their URLs
    STEP 2: Save the post (text + media URLs) to the posts database
    STEP 3: If the author is not a celebrity, put a fan-out job on the message queue
    STEP 4: Return success to the user right away - fan-out happens in the background

FUNCTION fan_out_worker():
    // This runs continuously, processing jobs from the queue
    LOOP forever:
        job = take the next job from the queue
        followers = get followers of job.author_id from the social graph
        FOR each follower:
            add job.post_id to the front of feed:{follower_id} in Redis
            trim the list to the most recent 500 post IDs
What about failures?
If a fan-out worker crashes halfway through, we have not updated all followers. Solution: Use a reliable queue (like Kafka) that tracks which jobs are done. If a worker crashes, another worker picks up the unfinished job and continues.
Handling large fan-outs efficiently:
Even for non-celebrities, fan-out to 10,000 followers is 10,000 Redis writes. We can batch these:
1. Group followers by which Redis server their feed lives on
2. Send a batch write to each Redis server (1 network call for 1,000 writes instead of 1,000 calls)
3. Use Redis pipelining - send many commands and wait for all responses at once (see the sketch below)
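A minimal sketch of points 2-3 using a redis-py pipeline; the batch function and its callers are assumptions for illustration.

```python
import redis

r = redis.Redis(host="localhost", port=6379)

def fan_out_batch(post_id: str, follower_ids: list[str]) -> None:
    """Write one post ID into many follower feeds with a single round trip per batch."""
    pipe = r.pipeline(transaction=False)  # no atomicity needed, just batching
    for follower_id in follower_ids:
        key = f"feed:{follower_id}"
        pipe.lpush(key, post_id)
        pipe.ltrim(key, 0, 499)
    pipe.execute()  # all queued commands are sent together
```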
How Reading Feed Works (Read Path)
Now let me explain what happens when someone opens the app to see their feed. This is the read path and it must be FAST.
What to tell the interviewer
Reading the feed has two parts: (1) Get the pre-built feed from Redis cache - this has posts from regular users, (2) Pull recent posts from celebrities the user follows. Then we merge and rank everything.
What happens when you open your feed
FUNCTION get_feed(user_id, page_number):
    // page_number 0 = first 50 posts, page 1 = next 50, etc.
    STEP 1: Get a page of post IDs from the pre-built feed in Redis (feed:{user_id})
    STEP 2: Look up which celebrities this user follows (celebrity_follows table)
    STEP 3: Fetch those celebrities' recent posts (from cache when possible)
    STEP 4: Merge the two lists and sort (by time, or by ranking score)
    STEP 5: Fetch full post details for the merged IDs (get_post_details below)
    STEP 6: Return the page of posts to the client
Making it fast with caching:
The feed must load in under 500 milliseconds. Here is how we achieve that:
1. Redis is in memory - getting the pre-built feed takes 1-2 milliseconds
2. Celebrity posts are cached - we cache recent celebrity posts in Redis too
3. Post details are cached - full post data is cached so we rarely hit the database
4. Parallel fetches - we fetch the pre-built feed and celebrity posts at the same time (see the sketch below)
5. Pre-compute ranking scores - some ranking signals are pre-calculated, not computed on every request
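Point 4 could look like this with a thread pool; fetch_prebuilt_feed and fetch_celebrity_posts are hypothetical helpers standing in for the Redis lookups described above.

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_prebuilt_feed(user_id: str) -> list[str]:
    return []  # stand-in for LRANGE on feed:{user_id}

def fetch_celebrity_posts(user_id: str) -> list[str]:
    return []  # stand-in for reading cached recent posts of followed celebrities

def load_feed_sources(user_id: str) -> tuple[list[str], list[str]]:
    """Issue both lookups concurrently so total latency is the max, not the sum."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        feed_future = pool.submit(fetch_prebuilt_feed, user_id)
        celeb_future = pool.submit(fetch_celebrity_posts, user_id)
        return feed_future.result(), celeb_future.result()
```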
FUNCTION get_post_details(post_ids):
    // Try to get from cache first (point 3 above: cache-aside for full post data)
    cached = fetch post:{id} entries from Redis for all post_ids
    missing = post_ids that were not found in the cache
    IF missing is not empty:
        rows = fetch the missing posts from the database in one query
        write each row back into the cache with a TTL
    RETURN the posts in the original post_ids order, skipping any marked deleted
Cache hit rate matters a lot
If 95% of requests hit cache, we only need to handle 5% from the database. With 50,000 feed requests per second, that is only 2,500 database requests per second - very manageable. If cache hit rate drops to 80%, database load jumps to 10,000 per second - could be a problem.
Feed Ranking
Users want to see interesting posts, not just the newest ones. Let me explain how we rank posts.
What to tell the interviewer
For a basic system, we can rank by a simple formula combining recency, engagement, and relationship. For advanced ranking, companies use machine learning to predict which posts each user will engage with.
Simple ranking formula (good for interviews):
Each post gets a score. Higher score = shown earlier in feed.
Score = Base Score + Engagement Boost + Relationship Boost + Recency Boost
- Base Score: all posts start at 100 points
- Engagement Boost: likes x 0.5 + comments x 2 + shares x 3 (comments and shares are more valuable)
- Relationship Boost: +50 if the poster is a close friend (someone you interact with often)
- Recency Boost: posts lose 10 points per hour (older posts rank lower)
FUNCTION calculate_score(post, viewer_user_id):
    // Base score
    score = 100
    // Engagement boost
    score = score + (post.likes * 0.5) + (post.comments * 2) + (post.shares * 3)
    // Relationship boost
    IF viewer_user_id interacts often with post.author_id: score = score + 50
    // Recency: lose 10 points per hour of age
    hours_old = hours since post.created_at
    score = score - (hours_old * 10)
    RETURN score

How real companies rank (for bonus points):
Facebook, Instagram, and TikTok use machine learning:
1. Collect signals - thousands of features: who posted, when, what type of content, the viewer's past behavior
2. Train a model - predict the probability that this viewer will like/comment/share this post
3. Score each post - the model outputs a score for each post for each viewer
4. A/B test - try different ranking approaches and measure which keeps users engaged longer
You do not need to explain ML in the interview, but knowing it exists shows depth.
Ranking at scale is expensive
Calculating scores for 500 posts for 50,000 users per second = 25 million calculations per second. Solutions: (1) Pre-calculate and cache scores, update every few minutes. (2) Only rank the top 200 posts, not all 500. (3) Use approximate scores for first page, more accurate scores for later pages.
Handling Edge Cases
Tell the interviewer about edge cases
Good engineers think about what can go wrong and unusual situations. Let me walk through the tricky cases and how we handle them.
Edge Case 1: User follows/unfollows someone
When you follow someone:
- Add them to your celebrity_follows table (if they are a celebrity) or the follows table
- Their future posts will appear in your feed
- Do we show their past posts too? Usually yes - backfill their last 10-20 posts into your feed
When you unfollow someone:
- Remove the row from the follows table
- Their posts stay in your feed cache until they naturally scroll off
- Or we can proactively remove them (more complex)
FUNCTION follow_user(follower_id, followee_id):
    STEP 1: Add a row to the follows table
    STEP 2: IF followee is a celebrity, also add a row to celebrity_follows
    STEP 3: Backfill the followee's last 10-20 post IDs into feed:{follower_id}

Edge Case 2: User becomes a celebrity (crosses 10K followers)
This is rare but needs handling:
1. Mark the user as is_celebrity = true
2. Stop fanning out their new posts
3. Their old posts are already in follower feeds - that is fine
4. Future posts will be pulled instead of pushed
Edge Case 3: Post is deleted
When someone deletes their post:
1. Mark the post as deleted in the database (soft delete - keep the record)
2. Do NOT try to remove it from all follower feeds (too expensive)
3. When rendering a feed, skip posts marked as deleted (see the sketch below)
4. Eventually the post ID scrolls out of feeds naturally
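The read-time filter can be as simple as the following; the post dictionaries and the deleted flag are assumptions about the cached post format.

```python
def visible_posts(posts: list[dict]) -> list[dict]:
    """Drop soft-deleted posts while rendering a feed page.

    Stale IDs left behind in Redis are harmless: they resolve to a
    deleted post and are filtered out here.
    """
    return [post for post in posts if not post.get("deleted", False)]
```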
Edge Case 4: Inactive users
Users who have not logged in for months:
- Their Redis feed cache wastes memory
- Solution: set an expiry on the Redis keys (expire after 30 days of inactivity)
- When they return, rebuild their feed from scratch (takes a few seconds, but saves memory for millions of inactive users)
Edge Case 5: Empty feed (new users)
New users follow nobody, so their feed is empty. Solutions:
1. Suggest popular accounts to follow
2. Show trending posts from public accounts
3. Show posts from suggested friends (based on phone contacts or mutual connections)
Interview tip
You do not need to solve all edge cases in detail. Mentioning them shows you think about real-world complexity. Say something like: There are edge cases like user becoming celebrity or deleting posts - I would handle deleted posts by soft-deleting and filtering at read time.
What Can Go Wrong and How We Handle It
Tell the interviewer about failures
Good engineers think about what can break. Let me walk through failures and our defenses.
| What breaks | What happens to users | How we fix it |
|---|---|---|
| Redis cache goes down | Feed loading becomes very slow | Use Redis cluster with replicas. If one node dies, others take over. Also have fallback: pull posts directly from database (slower but works) |
| Fan-out workers are overloaded | New posts take longer to appear in feeds | Add more workers. Use autoscaling - spin up more workers when queue grows. Celebrity posts are not affected since they use pull model. |
Preventing thundering herd (a common problem):
Imagine a celebrity with 100 million followers has their feed cache expire. Now 100 million users request their posts at the same time - the database is crushed.
Solution: Cache stampede prevention
1. Add jitter to cache expiry (a random offset so not all keys expire at once)
2. Lock pattern: when the cache misses, one request rebuilds the cache while the others wait
3. Warm caches proactively: a background job refreshes popular caches before they expire
FUNCTION get_celebrity_posts_with_lock(celebrity_id):
    cache_key = "celeb_posts:" + celebrity_id
    lock_key = "lock:" + cache_key
    posts = read cache_key from Redis
    IF posts exist: RETURN posts
    // Cache miss - only one request rebuilds the cache
    IF acquire lock_key (SET with NX and a short TTL):
        posts = fetch the celebrity's recent posts from the database
        store posts at cache_key with a TTL (plus jitter)
        release lock_key
        RETURN posts
    ELSE:
        wait briefly, then retry reading cache_key

Monitoring is crucial
Set up alerts for: (1) Cache hit rate drops below 90%, (2) Feed load time exceeds 500ms, (3) Fan-out queue grows beyond 1 million, (4) Database CPU exceeds 70%. Catching problems early prevents outages.
Growing the System Over Time
What to tell the interviewer
This design works for a few hundred million users. Let me explain how we would scale further and add more features as the product grows.
How we grow step by step:
Stage 1: Starting out (up to 100 million users)
- Single-region deployment
- One Redis cluster for feeds
- One database cluster with read replicas
- This handles most startups and mid-size companies

Stage 2: Scaling (100-500 million users)
- Shard Redis by user_id (split feeds across multiple clusters)
- Shard the database by user_id
- Add more fan-out workers
- Consider multi-region for latency

Stage 3: Global scale (1 billion+ users)
- Multiple data centers worldwide
- Eventually consistent feeds across regions
- Sophisticated caching at edge locations
- Machine learning for personalized ranking
Multi-region deployment
Cool features we can add later:
1. Stories / ephemeral content - Content that disappears after 24 hours. Similar to the feed but with a TTL (time to live) on all data.
2. Notifications - Tell users when their post gets lots of engagement, or when a close friend posts.
3. Search - Search posts by keyword. Needs a separate search index (Elasticsearch).
4. Trending topics - Track what is being talked about right now. Aggregate and count hashtags in real time.
5. Ads in feed - Insert sponsored posts between organic posts. Needs a separate ad-serving system.
// For users who want instant updates, use WebSocket
FUNCTION subscribe_to_feed_updates(user_id, websocket_connection):
    register websocket_connection under user_id
    WHEN a new post is fanned out to feed:{user_id}:
        push a "new posts available" event over websocket_connection

Interview tip: Do not over-design
Start simple. Explain the basic push-pull hybrid first. Only add complexity (multi-region, ML ranking, real-time) if the interviewer asks about scaling further or if you have time. A clear simple design beats a confusing complex one.