Design Twitter: System Design Interview Complete Guide
How to design Twitter's timeline, tweet posting, and follow system. Covers fan-out strategies, real-time delivery, and scaling to billions of users.
Ready to Master System Design Interviews?
Learn from 25+ real interview problems from Netflix, Uber, Google, and Stripe. Created by a senior engineer who's taken 200+ system design interviews at FAANG companies.
Complete Solutions
Architecture diagrams & trade-off analysis
Real Interview Problems
From actual FAANG interviews
7-day money-back guarantee • Lifetime access • New problems added quarterly
"Design Twitter" is one of the most asked system design interview questions. It appears at Google, Meta, Amazon, and virtually every tech company because it tests fundamental distributed systems concepts:
- Fan-out problem: How do you deliver a tweet to millions of followers?
- Read vs. write optimization: Timeline reads vastly outnumber tweet writes
- Real-time delivery: Users expect tweets to appear instantly
- Scaling social graphs: Handling follows between billions of users
This guide walks through a complete answer, from requirements clarification to deep dives on the trickiest components.
Step 1: Clarify Requirements
Never jump into design. Start by scoping the problem.
Functional Requirements
Ask: "What features should we support?"
Core features (must have):
- Post a tweet (280 characters, optionally with media)
- Follow/unfollow users
- View home timeline (tweets from people you follow)
- View user profile/timeline
Extended features (nice to have, ask if in scope):
- Likes and retweets
- Direct messages
- Hashtags and search
- Trending topics
- Notifications
For this design, focus on: Posting tweets, following, and home timeline. These cover the core algorithmic challenges.
Non-Functional Requirements
Ask: "What scale are we designing for?"
Typical answers:
- 500 million total users
- 200 million daily active users (DAU)
- 500 million tweets per day
- Average user follows 200 people
- Average user has 200 followers
- Celebrity accounts: some users have 50+ million followers
- Timeline reads: 10 billion per day
Derived metrics:
- Read:Write ratio: 20:1 (timelines read 20x more than tweets posted)
- Tweets per second: 500M / 86,400 ≈ 5,800 tweets/sec
- Timeline reads per second: 10B / 86,400 ≈ 116,000 reads/sec
Latency requirements:
- Tweet posting: < 500ms
- Timeline load: < 200ms
- Follow action: < 200ms
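The derived metrics above are worth being able to reproduce on the spot. A throwaway sketch, using the assumed figures from the requirements (the 1 KB-per-tweet storage estimate is an added assumption, not from the requirements):

```python
# Back-of-envelope arithmetic from the assumed scale figures above.
SECONDS_PER_DAY = 86_400

tweets_per_day = 500_000_000
timeline_reads_per_day = 10_000_000_000

tweets_per_sec = tweets_per_day / SECONDS_PER_DAY            # ~5,800
reads_per_sec = timeline_reads_per_day / SECONDS_PER_DAY     # ~116,000
read_write_ratio = timeline_reads_per_day / tweets_per_day   # 20:1

# Rough storage: 280 chars + metadata, assume ~1 KB per tweet
storage_per_day_gb = tweets_per_day * 1_000 / 1e9            # ~500 GB/day

print(round(tweets_per_sec), round(reads_per_sec), read_write_ratio)
```

In an interview, rounding 86,400 to ~100K seconds per day makes these divisions easy to do in your head.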
Step 2: High-Level Design
Let's sketch the major components:
┌─────────────────────────────────────────────────────────────────┐
│ Clients │
│ (Mobile Apps, Web, Third-party) │
└─────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ API Gateway │
│ (Authentication, Rate Limiting, Routing) │
└─────────────────────────────────────────────────────────────────┘
│
┌─────────────────┼─────────────────┐
▼ ▼ ▼
┌────────────┐ ┌────────────┐ ┌────────────┐
│ Tweet │ │ Timeline │ │ User │
│ Service │ │ Service │ │ Service │
└────────────┘ └────────────┘ └────────────┘
│ │ │
▼ ▼ ▼
┌────────────┐ ┌────────────┐ ┌────────────┐
│ Tweet Store│ │ Timeline │ │ User/Graph │
│(Cassandra) │ │Cache(Redis)│ │ Store │
└────────────┘ └────────────┘ └────────────┘
│
▼
┌────────────┐
│ Fan-out │
│ Service │
└────────────┘
Core Components
1. Tweet Service
- Handles tweet creation, storage, and retrieval
- Stores tweets in a distributed database
- Triggers fan-out to followers' timelines
2. Timeline Service
- Serves home timeline requests
- Reads from precomputed timeline cache
- Handles timeline generation for pull-based timelines
3. User Service
- Manages user profiles
- Handles follow/unfollow operations
- Maintains the social graph
4. Fan-out Service
- Distributes tweets to followers' timelines
- Key component that determines system architecture
Step 3: The Fan-Out Problem (Core Challenge)
This is the heart of the Twitter design. When User A posts a tweet, how does it appear in all their followers' timelines?
Three Approaches
1. Fan-out on Write (Push Model)
When a user posts a tweet:
- Write tweet to Tweet Store
- Look up all followers
- Write tweet ID to each follower's timeline cache
User A posts tweet
│
▼
┌─────────────────┐
│ Tweet Service │
│ (stores tweet) │
└────────┬────────┘
│
▼
┌─────────────────┐ ┌─────────────────────────────────┐
│ Fan-out Service │────▶│ Write to 1000 followers' │
│ │ │ timeline caches │
└─────────────────┘ └─────────────────────────────────┘
Pros:
- Timeline reads are instant (just read from cache)
- Scales well for reads
Cons:
- Celebrity problem: 50M followers = 50M writes per tweet
- Wasted work for inactive followers
- Slow tweet posting for celebrities
2. Fan-out on Read (Pull Model)
When a user loads their timeline:
- Look up all users they follow
- Fetch recent tweets from each
- Merge and sort
User B loads timeline
│
▼
┌─────────────────┐
│Timeline Service │
│ │
└────────┬────────┘
│
▼
┌─────────────────────────────────────────────────────┐
│ 1. Get 200 users that B follows │
│ 2. Fetch recent tweets from each (200 queries) │
│ 3. Merge, sort by time │
│ 4. Return top 100 │
└─────────────────────────────────────────────────────┘
Pros:
- No write amplification
- Tweet posting is fast for everyone
Cons:
- Timeline reads are slow (hundreds of queries)
- Doesn't scale for active users
3. Hybrid Approach (What Twitter Actually Does)
Combine both approaches based on follower count:
IF poster has < 10,000 followers:
    Fan-out on write (push to all followers' caches)
ELSE:
    Don't fan out (celebrity tweets pulled at read time)

ON timeline read:
    1. Read precomputed timeline from cache
    2. Fetch recent tweets from followed celebrities
    3. Merge and return
Why this works:
- 99% of users have < 10,000 followers → fast push
- 1% celebrities → avoid 50M writes per tweet
- Timeline read adds a few extra queries for celebrity tweets (acceptable)
Hybrid Architecture
┌─────────────────────────────────────────────────────────────────┐
│ Tweet Posted │
└─────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────┐
│ Follower count │
│ < 10,000? │
└────────┬────────┘
│
┌──────────────┴──────────────┐
│ YES │ NO
▼ ▼
┌─────────────────┐ ┌─────────────────┐
│ Fan-out to all │ │ Store in Tweet │
│ followers' │ │ Store only │
│ timeline caches │ │ (marked as │
└─────────────────┘ │ celebrity) │
└─────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ Timeline Request │
└─────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ 1. Read precomputed timeline from cache (pushed tweets) │
│ 2. Get list of followed celebrities │
│ 3. Fetch recent tweets from each celebrity (pull) │
│ 4. Merge pushed + pulled tweets by timestamp │
│ 5. Return top N tweets │
└─────────────────────────────────────────────────────────────────┘
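Step 4 of the read path above, merging pushed and pulled tweets by timestamp, is a k-way merge of already-sorted streams. A minimal sketch using Python's heapq, operating on hypothetical (timestamp, tweet_id) pairs:

```python
import heapq

def merge_timelines(pushed, pulled, limit=20):
    """Merge two lists of (timestamp_ms, tweet_id) pairs, newest first.

    Both inputs are assumed to already be sorted newest-first, as they
    would come out of the timeline cache and the celebrity tweet store.
    """
    # heapq.merge needs ascending inputs, so merge on negated timestamps
    merged = heapq.merge(
        ((-ts, tid) for ts, tid in pushed),
        ((-ts, tid) for ts, tid in pulled),
    )
    return [(-neg_ts, tid) for neg_ts, tid in list(merged)[:limit]]

# Two pushed tweets and one celebrity tweet interleave by timestamp
pushed = [(1_000_003, "t3"), (1_000_001, "t1")]
pulled = [(1_000_002, "c2")]
print(merge_timelines(pushed, pulled))
# [(1000003, 't3'), (1000002, 'c2'), (1000001, 't1')]
```

Because both inputs are pre-sorted, the merge is linear in the number of candidates rather than requiring a full re-sort.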
Step 4: Data Models
Tweet Schema
-- Tweet Store (Cassandra or DynamoDB)
CREATE TABLE tweets (
    tweet_id      BIGINT PRIMARY KEY,  -- Snowflake ID (time-sortable)
    user_id       BIGINT,
    content       TEXT,
    media_urls    LIST<TEXT>,
    created_at    TIMESTAMP,
    like_count    INT,
    retweet_count INT,
    reply_count   INT
);

-- Denormalized query table for a user's own timeline
CREATE TABLE user_tweets (
    user_id    BIGINT,
    tweet_id   BIGINT,
    created_at TIMESTAMP,
    PRIMARY KEY (user_id, tweet_id)
) WITH CLUSTERING ORDER BY (tweet_id DESC);
Why Cassandra?
- High write throughput (500M tweets/day)
- Horizontal scaling
- Time-series friendly (tweets sorted by time)
Timeline Cache
-- Redis structure for home timeline
Key:   timeline:{user_id}
Value: Sorted Set of tweet_ids, scored by timestamp

Example:
timeline:12345 = {
    tweet_98765: 1767614400000,  // Jan 5, 2026 12:00:00
    tweet_98764: 1767614399000,  // Jan 5, 2026 11:59:59
    ...
}
Why Redis?
- Sub-millisecond reads
- Sorted sets perfect for timeline ordering
- Can trim to last N tweets automatically
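The timeline cache relies on three sorted-set operations: ZADD to push a tweet, ZREVRANGE to read newest-first, and ZREMRANGEBYRANK to trim. To show the intended semantics without a live Redis, here is a pure-Python stand-in (not a Redis client; method names mirror the Redis commands):

```python
class TimelineZSet:
    """Minimal in-memory model of the Redis sorted-set operations
    the timeline cache uses. Illustrative only, not a real client."""

    def __init__(self):
        self.scores = {}  # tweet_id -> timestamp score

    def zadd(self, member, score):
        # ZADD timeline:{user} {score} {member}
        self.scores[member] = score

    def zrevrange(self, start, stop):
        # ZREVRANGE: highest score (newest) first, inclusive stop
        ordered = sorted(self.scores, key=self.scores.get, reverse=True)
        return ordered[start:stop + 1]

    def trim_to(self, n):
        # Equivalent of ZREMRANGEBYRANK key 0 -(n+1): keep the newest n
        for member in self.zrevrange(0, len(self.scores))[n:]:
            del self.scores[member]

timeline = TimelineZSet()
for i in range(5):
    timeline.zadd(f"tweet_{i}", 1_000 + i)
timeline.trim_to(3)                # oldest two entries are evicted
print(timeline.zrevrange(0, 1))    # ['tweet_4', 'tweet_3']
```

In production the same trim is issued against Redis after each fan-out write, which is what keeps per-user cache size bounded.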
Social Graph
-- Follow relationships
CREATE TABLE follows (
    follower_id BIGINT,
    followee_id BIGINT,
    created_at  TIMESTAMP,
    PRIMARY KEY (follower_id, followee_id)
);

-- Reverse index for "who follows me"
CREATE TABLE followers (
    followee_id BIGINT,
    follower_id BIGINT,
    created_at  TIMESTAMP,
    PRIMARY KEY (followee_id, follower_id)
);
Why two tables?
- follows answers "Who do I follow?" (used for timeline generation)
- followers answers "Who follows me?" (used for fan-out)
Step 5: API Design
Post Tweet
POST /api/v1/tweets
Authorization: Bearer {token}
Request:
{
  "content": "Hello, world!",
  "media_ids": ["abc123"],    // optional, pre-uploaded
  "reply_to": "tweet_98765"   // optional
}
Response (201 Created):
{
  "tweet_id": "tweet_98766",
  "content": "Hello, world!",
  "created_at": "2026-01-05T12:00:00Z",
  "user": {
    "id": "12345",
    "username": "johndoe"
  }
}
Get Home Timeline
GET /api/v1/timeline/home?cursor={cursor}&limit=20
Authorization: Bearer {token}
Response:
{
  "tweets": [
    {
      "tweet_id": "tweet_98766",
      "content": "Hello, world!",
      "created_at": "2026-01-05T12:00:00Z",
      "user": {...},
      "like_count": 42,
      "retweet_count": 5
    },
    ...
  ],
  "next_cursor": "tweet_98746"
}
Follow User
POST /api/v1/users/{user_id}/follow
Authorization: Bearer {token}
Response (200 OK):
{
  "following": true
}
Step 6: Tweet Posting Flow (Deep Dive)
Let's trace what happens when a user posts a tweet:
Client posts tweet
│
▼
┌───────────────────┐
│ API Gateway │
│ (auth, rate limit)│
└─────────┬─────────┘
│
▼
┌───────────────────┐
│ Tweet Service │
│ │
│ 1. Validate content│
│ 2. Generate ID │
│ 3. Store tweet │
│ 4. Send to Kafka │
└─────────┬─────────┘
│
▼
┌───────────────────┐
│ Kafka │
│ (tweet_posted │
│ topic) │
└─────────┬─────────┘
│
▼
┌───────────────────┐
│ Fan-out Worker │
│ │
│ 1. Get follower │
│ list │
│ 2. If < 10K, │
│ push to caches │
│ 3. Else, mark │
│ as celebrity │
└───────────────────┘
ID Generation: Snowflake IDs
Twitter invented Snowflake IDs for unique, time-sortable IDs:
┌─────────────────────────────────────────────────────────────┐
│ Snowflake ID (64 bits) │
├──────────────┬────────────┬────────────────┬────────────────┤
│ 1 bit │ 41 bits │ 10 bits │ 12 bits │
│ (unused) │(timestamp) │ (machine ID) │ (sequence) │
└──────────────┴────────────┴────────────────┴────────────────┘
Why Snowflake?
- Unique without coordination (machine ID + sequence)
- Time-sortable (can order by ID instead of timestamp)
- 64-bit fits in a long integer
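A minimal single-process sketch of the 64-bit layout shown above. The field widths come from the diagram; the epoch value and class name are illustrative, and a real implementation also handles clock skew and sequence overflow within a millisecond:

```python
import time

# Illustrative custom epoch in ms (Twitter's real generator uses its own)
EPOCH_MS = 1_288_834_974_657

class SnowflakeGenerator:
    def __init__(self, machine_id):
        assert 0 <= machine_id < 1024   # 10-bit machine ID
        self.machine_id = machine_id
        self.sequence = 0               # 12-bit per-millisecond counter
        self.last_ms = -1

    def next_id(self):
        now_ms = int(time.time() * 1000) - EPOCH_MS
        if now_ms == self.last_ms:
            # Same millisecond: bump the 12-bit sequence
            self.sequence = (self.sequence + 1) & 0xFFF
        else:
            self.sequence = 0
            self.last_ms = now_ms
        # 41 bits timestamp | 10 bits machine ID | 12 bits sequence
        return (now_ms << 22) | (self.machine_id << 12) | self.sequence

gen = SnowflakeGenerator(machine_id=7)
a, b = gen.next_id(), gen.next_id()
assert b > a  # IDs are time-sortable without any coordination
```

Because the timestamp occupies the high bits, sorting by ID is equivalent to sorting by creation time, which is exactly why the timeline tables can cluster on tweet_id.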
Fan-out Worker Detail
CELEBRITY_THRESHOLD = 10_000

def fan_out_tweet(tweet_id, user_id, created_at_ms):
    # Get follower list
    followers = get_followers(user_id)
    if len(followers) > CELEBRITY_THRESHOLD:
        # Mark as celebrity tweet, don't fan out
        mark_celebrity_tweet(tweet_id, user_id)
        return

    # Fan out to all followers, 1,000 at a time
    for batch in batched(followers, 1000):  # itertools.batched (Python 3.12+)
        pipeline = redis.pipeline()
        for follower_id in batch:
            # Add tweet to follower's timeline cache, scored by timestamp
            pipeline.zadd(f"timeline:{follower_id}", {tweet_id: created_at_ms})
            # Trim timeline to the most recent 800 tweets
            pipeline.zremrangebyrank(f"timeline:{follower_id}", 0, -801)
        pipeline.execute()
Step 7: Timeline Read Flow (Deep Dive)
Client requests timeline
│
▼
┌───────────────────┐
│ API Gateway │
└─────────┬─────────┘
│
▼
┌───────────────────┐
│ Timeline Service │
│ │
│ 1. Read from cache│
│ 2. Get celebrity │
│ tweets │
│ 3. Merge & sort │
│ 4. Hydrate tweets │
│ 5. Return │
└─────────┬─────────┘
│
▼
┌───────────────────────────────────────────────────────────────┐
│ Response │
└───────────────────────────────────────────────────────────────┘
Implementation
def get_home_timeline(user_id, cursor=None, limit=20):
    # Step 1: Get precomputed timeline from cache.
    # cursor is the score (timestamp) of the last tweet on the previous
    # page; the "(" prefix makes the bound exclusive so it isn't repeated.
    cached_tweet_ids = redis.zrevrangebyscore(
        f"timeline:{user_id}",
        max=f"({cursor}" if cursor else "+inf",
        min="-inf",
        start=0,
        num=limit,
    )

    # Step 2: Get celebrity tweets (the pull side of the hybrid)
    celebrity_tweets = []
    for celebrity_id in get_followed_celebrities(user_id):
        celebrity_tweets.extend(get_recent_tweets(celebrity_id, limit=5))

    # Step 3: Merge and sort by timestamp, newest first
    all_tweet_ids = cached_tweet_ids + [t.id for t in celebrity_tweets]
    all_tweet_ids.sort(key=get_timestamp, reverse=True)
    all_tweet_ids = all_tweet_ids[:limit]

    # Step 4: Hydrate (fetch full tweet data)
    tweets = batch_get_tweets(all_tweet_ids)

    # Step 5: Return with a cursor (last tweet's score) for pagination
    next_cursor = get_timestamp(tweets[-1].id) if len(tweets) == limit else None
    return {"tweets": tweets, "next_cursor": next_cursor}
Caching Strategy
┌─────────────────────────────────────────────────────────────────┐
│ Cache Layers │
├─────────────────────────────────────────────────────────────────┤
│ │
│ Layer 1: CDN (static assets, not timelines) │
│ │
│ Layer 2: Timeline Cache (Redis) │
│ - Precomputed timeline (pushed tweet IDs) │
│ - TTL: Forever (updated on tweet post) │
│ - Eviction: Keep last 800 tweets per user │
│ │
│ Layer 3: Tweet Cache (Redis) │
│ - Full tweet objects │
│ - TTL: 24 hours │
│ - Eviction: LRU │
│ │
│ Layer 4: User Cache (Redis) │
│ - User profile data │
│ - TTL: 1 hour │
│ - Eviction: LRU │
│ │
└─────────────────────────────────────────────────────────────────┘
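Layers 3 and 4 follow the classic cache-aside pattern: serve hits from cache, batch-fetch misses from the database, and populate the cache on the way out. A sketch with a dict standing in for Redis and a caller-supplied fetch function (both names are illustrative):

```python
tweet_cache = {}  # stands in for the Redis tweet cache (Layer 3)

def batch_get_tweets(tweet_ids, fetch_from_db):
    """Cache-aside hydration: serve hits from cache, batch-fetch misses."""
    hits = {tid: tweet_cache[tid] for tid in tweet_ids if tid in tweet_cache}
    misses = [tid for tid in tweet_ids if tid not in hits]
    if misses:
        for tid, tweet in fetch_from_db(misses).items():
            # Populate cache on miss (in Redis this write carries a 24h TTL)
            tweet_cache[tid] = tweet
            hits[tid] = tweet
    # Preserve the requested order in the result
    return [hits[tid] for tid in tweet_ids if tid in hits]

# Usage with a fake database
db = {"t1": {"content": "hello"}, "t2": {"content": "world"}}
tweets = batch_get_tweets(["t1", "t2"], lambda ids: {i: db[i] for i in ids})
```

The batched miss fetch matters at timeline scale: hydrating 20 tweets should be one round trip to the tweet store, not 20.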
Step 8: Scaling Considerations
Database Sharding
Tweet Store:
- Shard by tweet_id (even distribution)
- Alternatively, shard by user_id (keeps user's tweets together)
User Timeline Table:
- Shard by user_id (efficient for "get my tweets")
Follows Table:
- Shard by follower_id (efficient for "who do I follow")
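Whichever shard key each table uses, routing is a stable hash of that key modulo the shard count (or a consistent-hash ring if you want cheap resharding). A minimal sketch; the shard count is an assumption:

```python
import hashlib

NUM_SHARDS = 64  # assumed shard count

def shard_for(key):
    """Stable shard routing: hash the shard key, take modulo.

    Uses md5 rather than Python's hash() so the mapping is stable
    across processes and restarts.
    """
    digest = hashlib.md5(str(key).encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS

# follows table routes by follower_id; tweets table by tweet_id
shard = shard_for(1234567890)
assert 0 <= shard < NUM_SHARDS
```

Plain modulo remaps most keys when NUM_SHARDS changes, which is the usual argument for consistent hashing in the follow-up discussion.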
Timeline Cache Scaling
Problem: Hot users (celebrities) have many followers reading their tweets.
Solutions:
- Cache replication: Multiple Redis replicas, route reads randomly
- Local caching: API servers cache hot tweets in memory (30-second TTL)
Fan-out Worker Scaling
Problem: Fan-out write volume spikes when heavily-followed users post.
Even with the hybrid approach, accounts just below the celebrity threshold still trigger thousands of cache writes per tweet, and bursts of posting multiply that load.
Solutions:
- Async processing: Don't block tweet post on fan-out completion
- Batched writes: Write to Redis in batches of 1000
- Rate limiting on writes: Spread fan-out over time
- Priority queues: Process high-engagement users first
Step 9: Additional Features (If Asked)
Search
┌─────────────────────────────────────────────────────────────────┐
│ Search Architecture │
└─────────────────────────────────────────────────────────────────┘
Tweet posted
│
▼
┌────────────────────┐
│ Elasticsearch │◀── Index tweet: content, hashtags, user
│ Cluster │
└────────────────────┘
│
▼
┌────────────────────┐
│ Search Service │◀── Query: "system design"
│ │ → Returns ranked tweet IDs
└────────────────────┘
Trending Topics
def compute_trending():
    # Stream processing with Kafka + Flink:
    # 1. Extract hashtags from tweets
    # 2. Count per time window (5 minutes)
    # 3. Compare to baseline (normal volume)
    # 4. Rank by velocity (growth rate), not just count
    # 5. Filter bots and spam
    # 6. Segment by geography
    return trending_topics
Notifications
┌─────────────────────────────────────────────────────────────────┐
│ Notification Types │
├─────────────────────────────────────────────────────────────────┤
│ 1. Someone followed you │
│ 2. Someone liked your tweet │
│ 3. Someone replied to your tweet │
│ 4. Someone mentioned you │
└─────────────────────────────────────────────────────────────────┘
Event occurs
│
▼
┌────────────────────┐
│ Notification │
│ Service │
│ │
│ 1. Check user prefs│
│ 2. Rate limit │
│ 3. Write to inbox │
│ 4. Push via FCM │
└────────────────────┘
Step 10: Common Follow-Up Questions
"What happens when a user with 50M followers posts?"
With hybrid approach:
- Tweet stored in Tweet Store
- Marked as celebrity tweet (no fan-out)
- On timeline read, celebrity tweets are pulled and merged
- Celebrity tweet cache can be replicated for read scaling
"How do you handle the case where a user unfollows someone?"
Two options:
- Lazy removal: Tweet stays in cache, filter on read (simpler)
- Active removal: Send unfollow event to remove tweets from cache (more accurate)
Recommendation: Lazy removal. The cached tweet will eventually be pushed out by newer tweets. Cost of active removal doesn't justify perfect accuracy.
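Lazy removal means the read path simply filters stale entries. A sketch, assuming the cache stores (tweet_id, author_id) pairs and following_ids is the reader's current follow set (both are illustrative shapes, not the exact cache schema):

```python
def filter_unfollowed(timeline_entries, following_ids, self_id):
    """Drop cached tweets whose authors the reader no longer follows.

    timeline_entries: list of (tweet_id, author_id) from the cache.
    Lazy removal: stale entries stay in Redis and are skipped here
    until newer tweets push them out of the 800-tweet window.
    """
    return [
        (tweet_id, author_id)
        for tweet_id, author_id in timeline_entries
        if author_id in following_ids or author_id == self_id
    ]

entries = [("t1", 10), ("t2", 20), ("t3", 30)]
print(filter_unfollowed(entries, following_ids={10, 30}, self_id=99))
# [('t1', 10), ('t3', 30)]
```

The filter costs one set lookup per cached tweet, which is far cheaper than fanning an unfollow event out to the cache.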
"How do you ensure timeline consistency?"
Challenge: User posts tweet, immediately checks their own timeline, doesn't see it.
Solutions:
- Read-your-writes consistency: After posting, read from leader/cache-write, not replica
- Include recent self-tweets: Always merge last N self-tweets with timeline
- Client-side optimistic update: Show tweet immediately, sync in background
"How do you handle tweet deletion?"
- Mark tweet as deleted in database (soft delete)
- Remove from timeline caches of author's followers
- For celebrity tweets, removal happens on next cache refresh
- Keep deleted tweets for compliance/audit, just don't display
"What about ranked timelines vs. chronological?"
Ranked timeline architecture:
- Fetch candidate tweets (same as chronological)
- Score each tweet with ML model (engagement prediction)
- Factors: author affinity, content relevance, recency, engagement
- Return top N by score
Trade-off: Latency vs. ranking quality. Scores can be precomputed offline or calculated at request time.
Summary: The Complete Answer
In a 45-minute interview, hit these points:
| Time | What to Cover |
|---|---|
| 0-5 min | Requirements: features, scale, latency |
| 5-10 min | High-level design: services, data stores |
| 10-25 min | Fan-out problem: push vs. pull vs. hybrid |
| 25-35 min | Deep dives: posting flow, timeline read, IDs |
| 35-45 min | Scaling, follow-ups, trade-offs |
Key differentiators:
- Understand the fan-out problem deeply
- Know the hybrid approach and why it works
- Can explain Snowflake IDs
- Discuss trade-offs at each decision
- Handle celebrity edge cases