Design Walkthrough
Problem Statement
The Question: Design a news feed system like Facebook or Twitter where users see posts from people they follow, sorted by time or relevance.
What the system needs to do (most important first):
1. Show the feed - When a user opens the app, show recent posts from everyone they follow. This is the #1 feature - it must be fast.
2. Create posts - Let users write posts with text, photos, or videos. Posts go to all their followers.
3. Follow and unfollow - Users can follow other users. When you follow someone, their posts start appearing in your feed.
4. Rank posts - Show the most interesting posts first, not just the newest. Consider likes, comments, and how close you are to the poster.
5. Real-time updates - When someone you follow posts, it should appear in your feed within seconds (not minutes).
6. Infinite scroll - Users can scroll down to see older posts. Keep loading more as they scroll.
What to say first
Let me first understand the scale of this system. How many users do we have? How many posts per day? What is the average number of followers? I will also ask about features - do we need just chronological feed or ranked feed? Do we need real-time updates or is a slight delay okay?
What the interviewer really wants to see:
- Do you understand the fan-out problem? (One post going to millions of followers)
- Can you explain push vs pull and when to use each?
- Do you know how to use caching to make feeds fast?
- Can you handle celebrities with millions of followers differently from regular users?
Clarifying Questions
Before you start designing, ask questions to understand what you are building. Good questions show the interviewer you think before you code.
Question 1: How big is this?
How many users do we have? How many posts are created per day? What is the average number of people someone follows? Are there celebrities with millions of followers?
Why ask this: The design changes completely based on scale. A system for 1 million users is different from one for 1 billion users.
What interviewers usually say: 500 million daily active users, 100 million new posts per day, average user follows 200 people, some celebrities have 100 million followers.
How this changes your design: With celebrities having 100 million followers, we cannot use a simple push model - pushing one post to 100 million feeds would take too long and use too much storage.
Question 2: Chronological or ranked feed?
Should posts be shown in order of time (newest first) or ranked by importance (most interesting first)?
Why ask this: Chronological feeds are simpler to build. Ranked feeds need machine learning and are more complex.
What interviewers usually say: Start with chronological, but design so we can add ranking later.
How this changes your design: For ranking, we need to store signals like likes, comments, and user engagement. We also need a ranking service that can score posts quickly.
Question 3: How fast should new posts appear?
When someone I follow posts, how quickly should it show in my feed? Instantly, a few seconds, or a few minutes?
Why ask this: Real-time (instant) needs WebSockets or long polling. A few seconds delay is much simpler.
What interviewers usually say: A few seconds delay is fine for most users. Real-time is nice to have but not required.
How this changes your design: We can use a simpler pull-based refresh instead of maintaining millions of real-time connections.
Question 4: What types of content?
Is this text only, or do we support photos and videos too? Do we need to support resharing (retweets)?
Why ask this: Photos and videos need a CDN and different storage. Resharing adds complexity to the data model.
What interviewers usually say: Support text, photos, and videos. Resharing is a nice-to-have.
How this changes your design: We will use a CDN for media files and store only URLs in the post data. This keeps posts small and fast to load.
Summarize your assumptions
Let me summarize: 500 million daily users, 100 million posts per day, average user follows 200 people, some users have 100 million followers. We need chronological feed first with ranking later. A few seconds delay for new posts is okay. We support text, photos, and videos.
The Hard Part
Say this to the interviewer
The hardest part of a news feed is the fan-out problem. When someone posts, we need to show it to all their followers. If a celebrity has 100 million followers, how do we update 100 million feeds quickly? This is the core challenge.
The Fan-Out Problem (explained simply):
Imagine Taylor Swift posts a photo. She has 100 million followers. Now we need to either:
- Immediately add this post to 100 million different feeds (push model), or
- Wait until each follower opens their app and then fetch it (pull model).
Both have problems:
- Push: Writing to 100 million feeds takes time (even at 10,000 writes/second, that is nearly 3 hours). It also wastes storage if many followers never check their feed.
- Pull: When a user opens the app, we need to check posts from everyone they follow. If they follow 500 people, that is 500 lookups before we can show the feed.
Common mistake candidates make
Many candidates say: Just use push for everyone - when someone posts, add it to all follower feeds. This breaks down for celebrities. At 100 million followers and 10,000 writes/second, one post takes 3 hours to fan out. By then, the post is old news!
The Solution: Hybrid Push-Pull (Fan-out on Write + Fan-out on Read)
We treat users differently based on their follower count:
Regular users (under 10,000 followers): Use PUSH
- When they post, immediately add the post to all follower feeds
- 10,000 writes is fast (under 1 second)
- Followers see the post instantly in their pre-built feed
Celebrities (over 10,000 followers): Use PULL
- When they post, just save the post - do not fan out
- When a follower opens their app, we check: do they follow any celebrities?
- If yes, fetch recent celebrity posts and mix them into the feed
- This is slower but avoids the 100-million-write problem
Hybrid Fan-Out Strategy
Why 10,000 as the threshold?
10,000 is a common threshold because: (1) Writing to 10,000 feeds takes about 1 second - acceptable latency, (2) Most users have under 10,000 followers so most posts use the fast push model, (3) Only about 1% of users are celebrities but they create disproportionate fan-out load.
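To make the split concrete, here is a minimal sketch of how the post path could branch on follower count. The threshold constant, the queue object, and the function names are illustrative assumptions, not a prescribed API.

```python
CELEBRITY_THRESHOLD = 10_000  # assumed cutoff between push and pull

def is_celebrity(follower_count: int) -> bool:
    return follower_count >= CELEBRITY_THRESHOLD

def handle_new_post(author_id: str, post_id: str, follower_count: int, fanout_queue) -> None:
    """Branch between fan-out-on-write (push) and fan-out-on-read (pull)."""
    if is_celebrity(follower_count):
        # Pull: just store the post; followers fetch this author's recent posts
        # at read time and merge them into their pre-built feed.
        return
    # Push: background workers copy the post ID into every follower's feed.
    fanout_queue.put({"post_id": post_id, "author_id": author_id})
```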
Scale and Access Patterns
Before designing, let me calculate the scale. This helps us choose the right tools and identify bottlenecks.
| What we are measuring | Number | What this means for our design |
|---|---|---|
| Daily active users | 500 million | Huge read load - need lots of caching |
| Posts per day | 100 million | About 1,150 posts per second - manageable write load |
What to tell the interviewer
The key insight is that reads far outnumber writes - roughly 50 to 1. Users check their feed many times a day but post only occasionally, so we should optimize for reading: pre-compute feeds when possible and cache aggressively. The quick estimate below shows where that ratio comes from.
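A back-of-envelope sketch of those numbers, assuming each active user opens their feed about 10 times per day (an assumption for illustration - the interviewer did not specify it):

```python
DAU = 500_000_000            # daily active users (given)
POSTS_PER_DAY = 100_000_000  # new posts per day (given)
CHECKS_PER_USER = 10         # assumed feed opens per user per day
SECONDS_PER_DAY = 86_400

write_qps = POSTS_PER_DAY / SECONDS_PER_DAY          # ~1,157 posts/second
read_qps = DAU * CHECKS_PER_USER / SECONDS_PER_DAY   # ~57,870 feed loads/second
print(f"writes: {write_qps:,.0f}/s, reads: {read_qps:,.0f}/s, ratio: {read_qps / write_qps:.0f}x")
# -> writes: 1,157/s, reads: 57,870/s, ratio: 50x
```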
How much space does one post need?
- Post ID, user ID, timestamp: 50 bytes
- Text content (average): 300 bytes

How people use the news feed (from most common to least common):
1. View their feed - Open the app and scroll through posts. This is 90% of all requests. Must be super fast.
2. Load more posts - Scroll down to see older posts. Called pagination. Should feel instant.
3. Create a post - Write something and publish it. Much less frequent than reading.
4. Like or comment - Interact with posts. These update counts that affect ranking.
5. Follow someone - Start seeing their posts. Happens rarely but changes what appears in the feed.
High-Level Architecture
Now let me draw the big picture of how all the pieces fit together. I will explain what each part does and why we need it.
What to tell the interviewer
I will split this into separate services: one for creating posts, one for building feeds, one for serving feeds to users, and one for ranking. This separation lets us scale each part independently.
News Feed System - The Big Picture
What each service does and WHY it is separate:
| Service | What it does | Why it is separate |
|---|---|---|
| Post Service | Saves new posts to database. Uploads media to CDN. Sends message to fan-out queue. | Creating posts and delivering posts to feeds are different problems. Post Service can stay simple and fast. |
| Fan-out Workers | Read from queue. For each new post, find followers and add post ID to their feeds. | This is CPU-intensive work that can be slow. Separate workers mean posting feels instant to the user - the fan-out happens in background. |
| Feed Service | When user opens app, return their pre-built feed from cache. | This must be FAST. Keeping it separate means we can add many servers just for serving feeds. |
| Feed Mixer | Combines pre-built feed with celebrity posts fetched on demand. | Celebrity posts are not in the pre-built feed. Mixer fetches them and merges everything together. |
| Ranking Service | Scores posts by relevance - considers likes, comments, recency, and user preferences. | Ranking logic is complex and changes often. Separate service lets data scientists update it without touching other code. |
Common interview question: Why use a message queue?
Interviewers often ask: Why not just do fan-out directly when a post is created? Answer: If fan-out fails or is slow, we do not want the user to wait or see an error. The queue decouples posting from fan-out. User gets instant success confirmation, and workers process fan-out reliably in background.
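As a hedged illustration of that decoupling, here is one way the handoff could look using Kafka with the kafka-python client. The topic name fanout_jobs, the consumer group name, and the message shape are assumptions made for this sketch.

```python
import json
from kafka import KafkaProducer, KafkaConsumer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

def on_post_created(post_id: str, author_id: str) -> None:
    # Post Service: enqueue the job and return immediately; the user never waits on fan-out.
    producer.send("fanout_jobs", {"post_id": post_id, "author_id": author_id})

def run_fanout_worker() -> None:
    # Fan-out worker: consumes jobs at its own pace; a backlog just means the queue grows.
    consumer = KafkaConsumer(
        "fanout_jobs",
        bootstrap_servers="localhost:9092",
        group_id="fanout-workers",
        value_deserializer=lambda b: json.loads(b.decode("utf-8")),
    )
    for message in consumer:
        job = message.value
        print("fan out", job["post_id"], "from", job["author_id"])  # real worker updates follower feeds here
```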
Technology Choices - Why we picked these tools:
Post Database: MySQL or PostgreSQL
- Why: Posts are structured data (user, text, time). SQL databases handle this well and are easy to query.
- Partitioning: Partition by user_id so one user's posts are stored together, and by time so old posts can be archived.

Feed Cache: Redis
- Why: A feed is a list of post IDs. Redis has built-in list operations (add to front, trim to size, get a range) and is very fast - millions of operations per second.
- Structure: Each user has a list in Redis. Key = feed:{user_id}, Value = list of recent post IDs.

Social Graph: Separate graph database or MySQL
- Why: We need to quickly answer "who does user X follow?" and "who follows user X?". MySQL with good indexes works; a graph database like Neo4j helps for more complex queries.

Message Queue: Kafka or RabbitMQ
- Why: Handles millions of messages per second. If workers fall behind, messages queue up instead of being lost.
- Kafka: Better for high throughput; messages can be replayed.
- RabbitMQ: Simpler, and good enough for most cases.
How real companies do it
Twitter uses a mix of push and pull. Facebook uses a ranked feed with machine learning. Instagram pre-computes feeds for active users. LinkedIn uses a pull model with heavy caching. All of them use Redis or similar in-memory stores for feed caching.
Data Model and Storage
Now let me show how we organize the data. Think of tables like spreadsheets - each one stores a different type of information.
What to tell the interviewer
I will use three main storage systems: SQL database for posts and users (structured data), Redis for pre-built feeds (fast cache), and a CDN for media files (photos and videos). Each tool is best at its specific job.
Table 1: Users - Information about each person
This stores basic info about users including whether they are a celebrity (affects fan-out strategy).
| Column | What it stores | Example |
|---|---|---|
| id | Unique ID for this user | user_123 |
| username | Their handle | @johndoe |
Table 2: Posts - The actual content people create
This stores every post. We partition by user_id so each user's posts are stored together.
| Column | What it stores | Example |
|---|---|---|
| id | Unique ID for this post | post_789 |
| user_id | Who created it | user_123 |
Why partition by user_id?
When we need to fetch celebrity posts (pull model), we ask: get recent posts from user X. Partitioning by user_id means all of user X's posts are on the same database shard - one fast query instead of asking every shard.
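One simple (assumed) way to realize that placement is hash-based sharding on user_id, so every post by the same author maps to the same shard. The shard count and hash choice below are illustrative.

```python
import zlib

NUM_POST_SHARDS = 64  # assumed shard count

def shard_for_user(user_id: str) -> int:
    """All of a user's posts live on the shard this returns,
    so "recent posts from user X" is a single-shard query."""
    # Use a stable hash (not Python's built-in hash(), which varies per process).
    return zlib.crc32(user_id.encode("utf-8")) % NUM_POST_SHARDS
```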
Table 3: Follows - Who follows whom (the social graph)
This is a simple table with two columns. Each row means: follower follows followee.
| Column | What it stores | Example |
|---|---|---|
| follower_id | The person who clicked follow | user_456 |
| followee_id | The person being followed | user_123 |
| created_at | When they started following | 2024-02-10 |
Why we need two indexes:
- Index on follower_id: quickly find everyone that user_456 follows. Used when building their feed.
- Index on followee_id: quickly find everyone who follows user_123. Used for fan-out when they post.
Feed Cache in Redis - The pre-built feeds
This is not a SQL table. It is stored in Redis (fast in-memory storage) as a list.
Key: feed:{user_id}
Value: List of post IDs, newest first
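A small sketch of how fan-out workers and the Feed Service could manipulate this list with redis-py. The 500-post cap, key naming, and 30-day expiry follow the design described in this walkthrough; the helper names are illustrative.

```python
import redis

r = redis.Redis(host="localhost", port=6379)
FEED_MAX_LEN = 500  # keep only the most recent post IDs per user

def push_to_feed(follower_id: str, post_id: str) -> None:
    key = f"feed:{follower_id}"
    r.lpush(key, post_id)               # newest post ID goes to the front
    r.ltrim(key, 0, FEED_MAX_LEN - 1)   # drop anything beyond the cap
    r.expire(key, 30 * 24 * 3600)       # reclaim memory for users inactive ~30 days

def read_feed_page(user_id: str, page: int, page_size: int = 50) -> list[bytes]:
    start = page * page_size
    return r.lrange(f"feed:{user_id}", start, start + page_size - 1)
```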
Table 4: Celebrity Follows - Which celebrities each user follows
We keep a separate small table of just celebrity follows. This makes the pull step fast.
| Column | What it stores | Example |
|---|---|---|
| user_id | The regular user | user_456 |
| celebrity_id | The celebrity they follow | user_taylor_swift |
| created_at | When they followed | 2024-01-05 |
Why separate celebrity follows?
When loading a feed, we need to pull celebrity posts. Instead of scanning all 200 follows to find which ones are celebrities, we just look at this small table. Most users follow only a few celebrities, so this table is tiny and fast to query.
How Posting Works (Write Path)
Let me explain step by step what happens when someone creates a post. This is called the write path.
What to tell the interviewer
When a user posts, we do the minimum work synchronously (save the post, upload media) and return success. The expensive fan-out work happens asynchronously in background workers. This keeps posting fast.
What happens when you create a post
FUNCTION create_post(user_id, content, media_files):
    STEP 1: Upload media files to CDN (if any) and get back their URLs
    STEP 2: Save the post (text + media URLs) to the posts database
    STEP 3: If the author is not a celebrity, put a fan-out job on the message queue
    STEP 4: Return success to the user right away - fan-out happens in the background

FUNCTION fan_out_worker():
    // This runs continuously, processing jobs from the queue
    LOOP forever:
        job = take the next job from the queue
        followers = get followers of job.author_id from the social graph
        FOR each follower:
            add job.post_id to the front of feed:{follower_id} in Redis
            trim the list to the most recent 500 post IDs
What about failures?
If a fan-out worker crashes halfway through, we have not updated all followers. Solution: Use a reliable queue (like Kafka) that tracks which jobs are done. If a worker crashes, another worker picks up the unfinished job and continues.
Handling large fan-outs efficiently:
Even for non-celebrities, fan-out to 10,000 followers is 10,000 Redis writes. We can batch these:
1. Group followers by which Redis server their feed lives on
2. Send a batch write to each Redis server (1 network call for 1,000 writes instead of 1,000 calls)
3. Use Redis pipelining - send many commands and wait for all responses at once (see the sketch below)
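A minimal sketch of points 2-3 using a redis-py pipeline; the batch function and its callers are assumptions for illustration.

```python
import redis

r = redis.Redis(host="localhost", port=6379)

def fan_out_batch(post_id: str, follower_ids: list[str]) -> None:
    """Write one post ID into many follower feeds with a single round trip per batch."""
    pipe = r.pipeline(transaction=False)  # no atomicity needed, just batching
    for follower_id in follower_ids:
        key = f"feed:{follower_id}"
        pipe.lpush(key, post_id)
        pipe.ltrim(key, 0, 499)
    pipe.execute()  # all queued commands are sent together
```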
How Reading Feed Works (Read Path)
Now let me explain what happens when someone opens the app to see their feed. This is the read path and it must be FAST.
What to tell the interviewer
Reading the feed has two parts: (1) Get the pre-built feed from Redis cache - this has posts from regular users, (2) Pull recent posts from celebrities the user follows. Then we merge and rank everything.
What happens when you open your feed
FUNCTION get_feed(user_id, page_number):
    // page_number 0 = first 50 posts, page 1 = next 50, etc.
    STEP 1: Get a page of post IDs from the pre-built feed in Redis (feed:{user_id})
    STEP 2: Look up which celebrities this user follows (celebrity_follows table)
    STEP 3: Fetch those celebrities' recent posts (from cache when possible)
    STEP 4: Merge the two lists and sort (by time, or by ranking score)
    STEP 5: Fetch full post details for the merged IDs (get_post_details below)
    STEP 6: Return the page of posts to the client
Making it fast with caching:
The feed must load in under 500 milliseconds. Here is how we achieve that:
1. Redis is in memory - getting the pre-built feed takes 1-2 milliseconds
2. Celebrity posts are cached - we cache recent celebrity posts in Redis too
3. Post details are cached - full post data is cached so we rarely hit the database
4. Parallel fetches - we fetch the pre-built feed and celebrity posts at the same time (see the sketch below)
5. Pre-compute ranking scores - some ranking signals are pre-calculated, not computed on every request
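Point 4 could look like this with a thread pool; fetch_prebuilt_feed and fetch_celebrity_posts are hypothetical helpers standing in for the Redis lookups described above.

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_prebuilt_feed(user_id: str) -> list[str]:
    return []  # stand-in for LRANGE on feed:{user_id}

def fetch_celebrity_posts(user_id: str) -> list[str]:
    return []  # stand-in for reading cached recent posts of followed celebrities

def load_feed_sources(user_id: str) -> tuple[list[str], list[str]]:
    """Issue both lookups concurrently so total latency is the max, not the sum."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        feed_future = pool.submit(fetch_prebuilt_feed, user_id)
        celeb_future = pool.submit(fetch_celebrity_posts, user_id)
        return feed_future.result(), celeb_future.result()
```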
FUNCTION get_post_details(post_ids):
    // Try to get from cache first (point 3 above: cache-aside for full post data)
    cached = fetch post:{id} entries from Redis for all post_ids
    missing = post_ids that were not found in the cache
    IF missing is not empty:
        rows = fetch the missing posts from the database in one query
        write each row back into the cache with a TTL
    RETURN the posts in the original post_ids order, skipping any marked deleted
Cache hit rate matters a lot
If 95% of requests hit cache, we only need to handle 5% from the database. With 50,000 feed requests per second, that is only 2,500 database requests per second - very manageable. If cache hit rate drops to 80%, database load jumps to 10,000 per second - could be a problem.
Feed Ranking
Users want to see interesting posts, not just the newest ones. Let me explain how we rank posts.
What to tell the interviewer
For a basic system, we can rank by a simple formula combining recency, engagement, and relationship. For advanced ranking, companies use machine learning to predict which posts each user will engage with.
Simple ranking formula (good for interviews):
Each post gets a score. Higher score = shown earlier in feed.
Score = Base Score + Engagement Boost + Relationship Boost + Recency Boost
- Base Score: all posts start at 100 points
- Engagement Boost: likes x 0.5 + comments x 2 + shares x 3 (comments and shares are more valuable)
- Relationship Boost: +50 if the poster is a close friend (someone you interact with often)
- Recency Boost: posts lose 10 points per hour (older posts rank lower)
FUNCTION calculate_score(post, viewer_user_id):
    // Base score
    score = 100
    // Engagement boost
    score = score + (post.likes * 0.5) + (post.comments * 2) + (post.shares * 3)
    // Relationship boost
    IF viewer_user_id interacts often with post.author_id: score = score + 50
    // Recency: lose 10 points per hour of age
    hours_old = hours since post.created_at
    score = score - (hours_old * 10)
    RETURN score

How real companies rank (for bonus points):
Facebook, Instagram, and TikTok use machine learning:
1. Collect signals - thousands of features: who posted, when, what type of content, the viewer's past behavior
2. Train a model - predict the probability that this viewer will like/comment/share this post
3. Score each post - the model outputs a score for each post for each viewer
4. A/B test - try different ranking approaches and measure which keeps users engaged longer
You do not need to explain ML in the interview, but knowing it exists shows depth.
Ranking at scale is expensive
Calculating scores for 500 posts for 50,000 users per second = 25 million calculations per second. Solutions: (1) Pre-calculate and cache scores, update every few minutes. (2) Only rank the top 200 posts, not all 500. (3) Use approximate scores for first page, more accurate scores for later pages.
Handling Edge Cases
Tell the interviewer about edge cases
Good engineers think about what can go wrong and unusual situations. Let me walk through the tricky cases and how we handle them.
Edge Case 1: User follows/unfollows someone
When you follow someone:
- Add them to your celebrity_follows table (if they are a celebrity) or the follows table
- Their future posts will appear in your feed
- Do we show their past posts too? Usually yes - backfill their last 10-20 posts into your feed
When you unfollow someone:
- Remove the row from the follows table
- Their posts stay in your feed cache until they naturally scroll off
- Or we can proactively remove them (more complex)
FUNCTION follow_user(follower_id, followee_id):
    STEP 1: Add a row to the follows table
    STEP 2: IF followee is a celebrity, also add a row to celebrity_follows
    STEP 3: Backfill the followee's last 10-20 post IDs into feed:{follower_id}

Edge Case 2: User becomes a celebrity (crosses 10K followers)
This is rare but needs handling:
1. Mark the user as is_celebrity = true
2. Stop fanning out their new posts
3. Their old posts are already in follower feeds - that is fine
4. Future posts will be pulled instead of pushed
Edge Case 3: Post is deleted
When someone deletes their post:
1. Mark the post as deleted in the database (soft delete - keep the record)
2. Do NOT try to remove it from all follower feeds (too expensive)
3. When rendering a feed, skip posts marked as deleted (see the sketch below)
4. Eventually the post ID scrolls out of feeds naturally
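The read-time filter can be as simple as the following; the post dictionaries and the deleted flag are assumptions about the cached post format.

```python
def visible_posts(posts: list[dict]) -> list[dict]:
    """Drop soft-deleted posts while rendering a feed page.

    Stale IDs left behind in Redis are harmless: they resolve to a
    deleted post and are filtered out here.
    """
    return [post for post in posts if not post.get("deleted", False)]
```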
Edge Case 4: Inactive users
Users who have not logged in for months:
- Their Redis feed cache wastes memory
- Solution: set an expiry on the Redis keys (expire after 30 days of inactivity)
- When they return, rebuild their feed from scratch (takes a few seconds, but saves memory for millions of inactive users)
Edge Case 5: Empty feed (new users)
New users follow nobody, so their feed is empty. Solutions:
1. Suggest popular accounts to follow
2. Show trending posts from public accounts
3. Show posts from suggested friends (based on phone contacts or mutual connections)
Interview tip
You do not need to solve all edge cases in detail. Mentioning them shows you think about real-world complexity. Say something like: There are edge cases like user becoming celebrity or deleting posts - I would handle deleted posts by soft-deleting and filtering at read time.
What Can Go Wrong and How We Handle It
Tell the interviewer about failures
Good engineers think about what can break. Let me walk through failures and our defenses.
| What breaks | What happens to users | How we fix it |
|---|---|---|
| Redis cache goes down | Feed loading becomes very slow | Use Redis cluster with replicas. If one node dies, others take over. Also have fallback: pull posts directly from database (slower but works) |
| Fan-out workers are overloaded | New posts take longer to appear in feeds | Add more workers. Use autoscaling - spin up more workers when queue grows. Celebrity posts are not affected since they use pull model. |
Preventing thundering herd (a common problem):
Imagine a celebrity with 100 million followers has their feed cache expire. Now 100 million users request their posts at the same time - the database is crushed.
Solution: Cache stampede prevention
1. Add jitter to cache expiry (a random offset so not all keys expire at once)
2. Lock pattern: when the cache misses, one request rebuilds the cache while the others wait
3. Warm caches proactively: a background job refreshes popular caches before they expire
FUNCTION get_celebrity_posts_with_lock(celebrity_id):
    cache_key = "celeb_posts:" + celebrity_id
    lock_key = "lock:" + cache_key
    posts = read cache_key from Redis
    IF posts exist: RETURN posts
    // Cache miss - only one request rebuilds the cache
    IF acquire lock_key (SET with NX and a short TTL):
        posts = fetch the celebrity's recent posts from the database
        store posts at cache_key with a TTL (plus jitter)
        release lock_key
        RETURN posts
    ELSE:
        wait briefly, then retry reading cache_key

Monitoring is crucial
Set up alerts for: (1) Cache hit rate drops below 90%, (2) Feed load time exceeds 500ms, (3) Fan-out queue grows beyond 1 million, (4) Database CPU exceeds 70%. Catching problems early prevents outages.
Growing the System Over Time
What to tell the interviewer
This design works for a few hundred million users. Let me explain how we would scale further and add more features as the product grows.
How we grow step by step:
Stage 1: Starting out (up to 100 million users)
- Single-region deployment
- One Redis cluster for feeds
- One database cluster with read replicas
- This handles most startups and mid-size companies

Stage 2: Scaling (100-500 million users)
- Shard Redis by user_id (split feeds across multiple clusters)
- Shard the database by user_id
- Add more fan-out workers
- Consider multi-region for latency

Stage 3: Global scale (1 billion+ users)
- Multiple data centers worldwide
- Eventually consistent feeds across regions
- Sophisticated caching at edge locations
- Machine learning for personalized ranking
Multi-region deployment
Cool features we can add later:
1. Stories / ephemeral content - Content that disappears after 24 hours. Similar to the feed but with a TTL (time to live) on all data.
2. Notifications - Tell users when their post gets lots of engagement, or when a close friend posts.
3. Search - Search posts by keyword. Needs a separate search index (Elasticsearch).
4. Trending topics - Track what is being talked about right now. Aggregate and count hashtags in real time.
5. Ads in feed - Insert sponsored posts between organic posts. Needs a separate ad-serving system.
// For users who want instant updates, use WebSocket
FUNCTION subscribe_to_feed_updates(user_id, websocket_connection):
    register websocket_connection under user_id
    WHEN a new post is fanned out to feed:{user_id}:
        push a "new posts available" event over websocket_connection

Interview tip: Do not over-design
Start simple. Explain the basic push-pull hybrid first. Only add complexity (multi-region, ML ranking, real-time) if the interviewer asks about scaling further or if you have time. A clear simple design beats a confusing complex one.