The Universal System Design Interview Framework: 8 Steps to Design ANY System
Master the complete 8-step framework used by Principal Architects to ace system design interviews at Google, Amazon, Meta, and Netflix. Learn to design Twitter, Uber, or any system you've never seen before.
Ready to Master System Design Interviews?
Learn from 25+ real interview problems from Netflix, Uber, Google, and Stripe. Created by a senior engineer who's taken 200+ system design interviews at FAANG companies.
Complete Solutions
Architecture diagrams & trade-off analysis
Real Interview Problems
From actual FAANG interviews
7-day money-back guarantee • Lifetime access • New problems added quarterly
What if I told you that Twitter, Uber, Netflix, and every other system you'll ever be asked to design in an interview are fundamentally the same?
Every system does exactly three things:
- Take data in: users send requests (post a tweet, upload a photo, request a ride)
- Store data: save it somewhere (database, cache, file storage)
- Give data back: users ask for data (show my feed, display the photo, find me a driver)
That's it.
The difference between designing a simple app and designing Twitter isn't the pattern; it's how much data, how fast, and what happens when things break.
This 8-step framework works for ANY system design question. Whether you're asked to design a URL shortener or Netflix, these steps apply. Master this framework, and you can design anything.
Why This Framework Works
System design interviews are terrifying because they feel unpredictable. "Design Uber" has a million possible directions. Where do you start?
The secret is that every interview follows the same structure:
- Understand what you're building
- Make key decisions about architecture
- Design the system
- Handle scale and reliability
This framework breaks those into 8 concrete steps with specific time allocations:
| Step | What You Do | Time |
|---|---|---|
| 1 | Clarify Requirements | 5 min |
| 2 | Design the API | 3 min |
| 3 | Quick Math (Estimation) | 3 min |
| 4 | Decide: Real-time vs Pre-compute | 3 min |
| 5 | Choose Database | 3 min |
| 6 | Draw Basic Architecture | 10 min |
| 7 | Handle Scale | 10 min |
| 8 | Add Reliability | 5 min |
| - | Wrap up and Questions | 3 min |
Let's break down each step.
Step 1: Clarify Requirements (5 minutes)
The number one reason people fail system design interviews: they solve the wrong problem.
The interviewer says "Design Twitter" and the candidate starts building everything: tweets, retweets, likes, DMs, notifications, trending topics, ads, analytics...
Stop. You have 45 minutes. You cannot build everything.
Think MVP at Scale
Ask yourself: If I had to launch this product tomorrow with millions of users, what are the 2-3 features I MUST have?
For Twitter MVP:
- Post a tweet
- See tweets from people you follow
- Follow/unfollow users
That's it. No likes, no retweets, no DMs. Those come in v2.
The Four Types of Questions to Ask
Question Type 1: Core Features
"What are the 2-3 things this system MUST do? What can we skip for now?"
Example: For Uber: request ride, match driver, track ride. Skip scheduled rides, ride sharing, and tipping.
Question Type 2: Users
"Is this for consumers (B2C) or businesses (B2B)? How many users? Global or regional?"
Example: For Slack: B2B, teams of 10-10,000 people, mostly in one timezone per team.
Question Type 3: Scale
"Startup scale (thousands of users) or big tech scale (hundreds of millions)?"
This changes everything. A startup chat app and WhatsApp need completely different designs.
Question Type 4: Special Requirements
"Does it need to be real-time? Super reliable? Work offline? Handle spikes?"
Example: For a payment system: it must never lose data and must process each payment exactly once.
Example: Clarifying Twitter Requirements
ME: Before I start, let me understand what we're building.
CORE FEATURES:
Me: What are the must-have features?
Interviewer: Posting tweets and seeing your feed.
Me: Should I include likes, retweets, DMs?
Interviewer: Focus on posting and feed for now.
USERS:
Me: How many users are we designing for?
Interviewer: Think Twitter scale, hundreds of millions.
Me: Global users or concentrated in one region?
Interviewer: Global.
SCALE:
Me: Any specific numbers I should target?
Interviewer: Let's say 500 million users, 200 million daily active.
SPECIAL REQUIREMENTS:
Me: How fast should the feed load?
Interviewer: Under 200 milliseconds.
Me: Is it okay if a tweet takes a few seconds to appear in feeds?
Interviewer: Yes, a small delay is fine.
SUMMARY:
- Core: Post tweets + View feed
- Scale: 500M users, 200M daily active
- Latency: Feed loads in under 200ms
- Eventual consistency is okay (small delays acceptable)
Pro tip: Always summarize what you heard before moving to the next step. Say: "So to confirm, I'm designing X with features Y, for Z users, with these constraints." This shows you listen and prevents mistakes.
Step 2: Design the API (3 minutes)
Before drawing any boxes, ask: What API endpoints will users call?
This is powerful because:
- It forces you to understand what the system actually does
- It tells you what data you need to store
- It shows the interviewer you think about interfaces
Keep It Simple
For each core feature, write one API endpoint. Include:
- The HTTP method (GET, POST, PUT, DELETE)
- The URL path
- What goes in (input)
- What comes out (output)
Example: Twitter API
CORE APIS:
1. POST A TWEET
POST /tweets
Input: { user_id, content, media_ids (optional) }
Output: { tweet_id, created_at }
2. GET HOME FEED
GET /feed?user_id=123&limit=20&cursor=xyz
Input: user_id, how many tweets, where to start
Output: { tweets: [...], next_cursor }
3. FOLLOW A USER
POST /follow
Input: { follower_id, followee_id }
Output: { success: true }
4. UNFOLLOW A USER
DELETE /follow
Input: { follower_id, followee_id }
Output: { success: true }
WHAT THIS TELLS US:
- We need to store: tweets, users, who follows whom
- GET /feed is called way more than POST /tweets (read-heavy)
- Feed needs to be fast (called on every app open)
- The hard question: How do we build /feed efficiently?
What the API Tells You
| Question | What It Means |
|---|---|
| Which API is called most? | This is your hot path, optimize it |
| More reads or writes? | Read-heavy = caching helps. Write-heavy = need fast database |
| Need real-time updates? | Might need WebSockets or long polling |
| What data is returned? | This shapes your data model |
Key Insight: In social apps, the feed API looks simple but is the hardest to build. It needs to be fast, personalized, and handle users who follow thousands of accounts. Keep this in mind, we'll solve it later.
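To make the GET /feed contract concrete, here is a minimal sketch of cursor-based pagination. The `TWEETS` list is an in-memory stand-in for the tweets store, and `get_feed` is a hypothetical helper, not a real Twitter API; the cursor here is simply the id of the last tweet on the previous page.

```python
# Hypothetical in-memory stand-in for the tweets table, newest first.
TWEETS = [{"tweet_id": i, "created_at": i} for i in range(100, 0, -1)]

def get_feed(limit=20, cursor=None):
    """Return one page of tweets plus a cursor for the next page.

    `cursor` is the tweet_id of the last item on the previous page;
    we return tweets strictly older than it. A real API would treat
    the cursor as an opaque, validated token.
    """
    start = 0
    if cursor is not None:
        ids = [t["tweet_id"] for t in TWEETS]
        # Skip everything up to and including the cursor tweet.
        start = ids.index(cursor) + 1
    page = TWEETS[start:start + limit]
    next_cursor = page[-1]["tweet_id"] if page else None
    return {"tweets": page, "next_cursor": next_cursor}
```

Cursor pagination beats `offset`-based pagination here because new tweets arriving between page loads don't shift the window and cause duplicates.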
Step 3: Quick Math, a.k.a. Back-of-Envelope Estimation (3 minutes)
Numbers tell you if your design will work. The design for 1,000 users is totally different from 100 million users.
Don't spend long on this. 3 minutes of quick math is enough.
The Numbers That Matter
Calculate these four things:
- Requests per second (RPS): how many API calls per second?
- Storage needed: how much disk space?
- Read vs write ratio: is it read-heavy or write-heavy?
- Peak load: how much higher is busy time vs normal?
Useful Numbers to Memorize
| Time Period | Seconds |
|---|---|
| 1 day | 86,400 (round to 100,000) |
| 1 month | 2.5 million |
| 1 year | 30 million |
| Requests per day | Requests per second |
|---|---|
| 1 million | ~10 RPS |
| 100 million | ~1,000 RPS |
| 1 billion | ~10,000 RPS |
Example: Quick Math for Twitter
GIVEN:
- 200 million daily active users
- Each user opens app 5 times per day
- Each user posts 0.5 tweets per day (on average)
- Each user follows 200 people
REQUESTS PER SECOND:
Feed loads (reads):
200M users × 5 opens = 1 billion feed loads per day
1 billion ÷ 100,000 seconds = 10,000 reads/second
Tweet posts (writes):
200M users × 0.5 tweets = 100 million tweets per day
100 million ÷ 100,000 seconds = 1,000 writes/second
Ratio: 10:1 read-heavy system
STORAGE:
Tweet size: 280 chars + metadata = ~500 bytes
Per day: 100M tweets × 500 bytes = 50 GB per day
Per year: 50 GB × 365 = 18 TB per year
PEAK:
Peak is usually 3-5x average
So plan for: 50,000 reads/second, 5,000 writes/second
WHAT THIS TELLS US:
✓ Read-heavy: caching will help a lot
✓ 10,000+ RPS: need multiple servers
✓ 18 TB/year: big but manageable, sharding may be needed later
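The arithmetic above is easy to sanity-check in a few lines of Python, using the constants from this section:

```python
# Back-of-envelope check of the Twitter numbers above.
DAU = 200_000_000          # daily active users
OPENS_PER_DAY = 5          # feed loads per user per day
TWEETS_PER_DAY = 0.5       # tweets per user per day
SECONDS_PER_DAY = 100_000  # 86,400 rounded up for easy math
TWEET_BYTES = 500          # 280 chars + metadata

reads_per_sec = DAU * OPENS_PER_DAY / SECONDS_PER_DAY    # 10,000
writes_per_sec = DAU * TWEETS_PER_DAY / SECONDS_PER_DAY  # 1,000

storage_per_day_gb = DAU * TWEETS_PER_DAY * TWEET_BYTES / 1e9  # 50 GB
storage_per_year_tb = storage_per_day_gb * 365 / 1000          # ~18 TB

print(reads_per_sec, writes_per_sec, storage_per_day_gb, storage_per_year_tb)
```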
Warning: Don't over-engineer based on numbers. The math shows what MIGHT matter. But don't build for 1 million RPS on day one. Start with a design that handles 10x current load. Know where you'll hit limits. That's what interviewers want to see.
Step 4: The Big Decision: Real-time vs Pre-compute (3 minutes)
This is the most important design decision in any system.
For every API endpoint, you have a choice:
Option A: Real-time (Compute on Read)
- When the user asks, calculate the answer right then
- Example: search (run the query when the user searches)
Option B: Pre-compute (Compute on Write)
- Calculate the answer ahead of time, store it
- When user asks, just look it up
- Example: feed (pre-build each user's feed, then just fetch it)
When to Use Which
| Use Real-time When | Use Pre-compute When |
|---|---|
| Results depend on fresh data | Same result requested many times |
| Too many possible queries to pre-compute | Result is expensive to calculate |
| Writes are rare, reads are rare too | Writes are rare, reads are very frequent |
| User expects small delay | User expects instant response |
| Data changes very frequently | Data changes less often than it's read |
The Twitter Feed Problem: Fan-out on Read vs Write
This is the classic example that comes up in almost every social media system design.
Option A: Real-time (Fan-out on Read)
When user opens feed:
- Look up everyone they follow (200 people)
- Get recent tweets from each person
- Merge and sort by time
- Return top 20 tweets
Pros:
- Always fresh data
- Simple to understand
- No extra storage
Cons:
- Slow! Must query 200 users every time
- Doesn't scale for users following 10,000 accounts
- Every feed open = heavy database load
Option B: Pre-compute (Fan-out on Write)
When someone posts a tweet:
- Find all their followers (could be millions)
- Add this tweet to each follower's feed (stored in cache)
When user opens feed:
- Just read their pre-built feed from cache
- Super fast, one read operation
Pros:
- Feed loads instantly
- Light database load on read
Cons:
- Celebrity problem: user with 50M followers = 50M writes
- Uses more storage (duplicate tweets in many feeds)
- Small delay before tweet appears in feeds
Option C: Hybrid (What Twitter Actually Does)
- For normal users (< 10K followers): pre-compute
- For celebrities (> 10K followers): real-time
- When building feed: get pre-computed tweets + fetch celebrity tweets
This is usually the right answer for social feeds.
Decision Factors
| Factor | Leans Toward Real-time | Leans Toward Pre-compute |
|---|---|---|
| Latency requirement | Can wait 100ms+ | Must be < 50ms |
| Read:Write ratio | Low (< 10:1) | High (> 100:1) |
| Data freshness | Must be real-time | Okay to be slightly stale |
| Query complexity | Simple queries | Complex aggregations |
| Storage cost | Expensive | Cheap |
Default: When unsure, start with real-time. It's simpler. Add pre-computation only when you need the speed. The interviewer wants to see you understand the tradeoff, not that you always pick the fancy option.
Step 5: Choose Your Database (3 minutes)
Simple Rule: Start with SQL.
Use PostgreSQL (SQL) unless you have a specific reason not to.
Why? SQL databases:
- Handle most use cases well
- Give you transactions (ACID)
- Let you change your queries later
- Are well understood and battle-tested
When to Use NoSQL
Only use NoSQL when SQL can't do the job:
| Problem | SQL Limitation | NoSQL Solution |
|---|---|---|
| Massive write throughput | Single leader bottleneck | Cassandra, DynamoDB |
| Flexible/changing schema | Schema changes are slow | MongoDB, DynamoDB |
| Simple key-value lookups | SQL overhead unnecessary | Redis, DynamoDB |
| Graph relationships | Joins get slow at depth | Neo4j |
| Full-text search | LIKE queries are slow | Elasticsearch |
| Time-series data | Not optimized for time queries | TimescaleDB, InfluxDB |
Database Selection Examples
Twitter:
- Users table: PostgreSQL (relational, need ACID)
- Tweets table: PostgreSQL (need to query by time, user)
- Follower graph: PostgreSQL (simple join) or Graph DB (if complex queries)
- Feed cache: Redis (fast reads, okay to lose on crash)
- Tweet search: Elasticsearch (full-text search)
Uber:
- Users, Drivers, Trips: PostgreSQL (transactional)
- Driver locations: Redis (ephemeral, changes every second)
- Location history: Cassandra (high write volume, time-series)
E-Commerce:
- Products, Orders, Users: PostgreSQL (ACID for payments)
- Shopping cart: Redis (ephemeral, fast)
- Product search: Elasticsearch (full-text, facets)
Chat Application:
- Users, Conversations: PostgreSQL
- Messages: Cassandra (high write volume, time-ordered)
- Online status: Redis (ephemeral, changes often)
- Message search: Elasticsearch
What to Tell the Interviewer
Say something like:
"For the main data (users, tweets, follows), I'll use PostgreSQL. It handles our query patterns well and gives us transactions.
For the feed cache, I'll use Redis. Feed reads need to be fast, and it's okay if we lose the cache on a crash, we can rebuild it.
If we need tweet search later, we'd add Elasticsearch."
This shows you make decisions based on needs, not hype.
Warning: Don't use multiple databases just to look smart. Every database you add is more complexity. In interviews, I see people add Kafka, Redis, Elasticsearch, Cassandra, and PostgreSQL to a simple app. Why? Start with one database. Add others only when you can explain exactly why.
Step 6: Draw the Basic Architecture (10 minutes)
Start Simple, Add Complexity Only When Needed.
Every system starts with the same basic building blocks:
- Load Balancer: distributes traffic across servers
- App Servers: run your code (stateless)
- Database: stores your data
- Cache: speeds up reads
That's it for the basic architecture. Let's build it step by step.
Layer 1: The Simplest System
Start here. This handles thousands of users.
[Users] → [Load Balancer] → [App Server 1] → [PostgreSQL]
                          ↘ [App Server 2] ↗
What to explain:
- Load balancer spreads traffic evenly
- App servers are stateless (no data stored on them)
- Any server can handle any request
- Database is the single source of truth
Layer 2: Add Caching
When database reads become slow, add a cache.
[Users] → [LB] → [App Servers] → [Redis Cache] → [PostgreSQL]
What to explain:
- Check cache first, then database
- Cache stores hot data (frequently accessed)
- Cache miss: data not in cache, get from DB, store in cache
- Cache hit: return data directly, very fast
Layer 3: Add Background Workers
When some work is slow or can be done later, use async processing.
[Users] → [LB] → [App Servers] → [Message Queue] → [Workers] → [DB]
                              ↘ [Redis] ↗
What to explain:
- Message queue holds work to be done later
- Workers process jobs from the queue
- Use for: sending emails, updating feeds, processing images
- User gets fast response, work happens in background
Full Twitter Architecture Example
COMPONENTS:
1. API GATEWAY
- Handles authentication (is this user logged in?)
- Rate limiting (is this user sending too many requests?)
- Routes requests to the right service
2. TWEET SERVICE
- POST /tweets endpoint
- Writes tweet to database
- Sends message to queue: "new tweet posted"
3. FEED SERVICE
- GET /feed endpoint
- Reads pre-built feed from Redis cache
- If cache miss, builds feed from database
4. FANOUT SERVICE (Background Worker)
- Listens for "new tweet posted" messages
- For each follower, adds tweet to their feed in cache
5. USER SERVICE
- Follow/unfollow endpoints
- Updates the social graph
6. DATA STORES:
- PostgreSQL: Users, Tweets, Followers (source of truth)
- Redis: Feed cache (user_id -> list of tweet_ids)
- S3: Images and videos (blob storage)
- CDN: Serves images and videos fast, globally
Warning: Keep it simple. Don't draw 15 boxes to look impressive. It backfires. It shows you can't simplify. Aim for 5-8 boxes maximum. Add more only if you can explain why each one is needed.
Step 7: Handle Scale (10 minutes)
Scale is about finding bottlenecks.
Adding more servers doesn't automatically mean better scale. You need to find what breaks first and fix that.
Typical bottleneck order:
- Database reads: too many queries hitting the database
- Database writes: too many writes for a single database
- Hot partitions: one piece of data gets too much traffic
- Network: too much data moving around
Solution 1: Caching (For Read Bottlenecks)
Caching is your first tool for read-heavy systems.
| Cache Strategy | How It Works | When to Use |
|---|---|---|
| Cache-Aside | Check cache, if miss read from DB, write to cache | Most common, works for most cases |
| Write-Through | Write to cache and DB at same time | When you need cache always in sync |
| Write-Behind | Write to cache, later write to DB | When DB writes are slow, okay to lose data |
Cache-Aside Pattern (Most Common):
def get_user(user_id):
    # `cache` and `database` are placeholder clients here
    # (e.g. a Redis client and a SQL connection).
    # Step 1: Check the cache first
    user = cache.get(f"user:{user_id}")
    if user is not None:
        return user  # Cache hit! Fast path
    # Step 2: Cache miss - get from the database
    user = database.query("SELECT * FROM users WHERE id = ?", user_id)
    # Step 3: Store in cache for next time (with expiry)
    cache.set(f"user:{user_id}", user, expire=3600)  # 1 hour
    return user
Solution 2: Database Read Replicas
When cache isn't enough, add read replicas.
[App Servers] --writes--> [Primary DB]
              --reads-->  [Replica 1]
              --reads-->  [Replica 2]

[Primary DB] --replication--> [Replicas]
What to explain:
- One primary database handles all writes
- Multiple replicas handle reads
- Replicas copy data from primary (slight delay)
- Good for read-heavy systems (which most are)
Solution 3: Sharding (For Write Bottlenecks)
When one database can't handle all writes, split the data.
[App Servers] --> [Shard Router]
                       |
           +-----------+-----------+
           |           |           |
       [Shard 0]   [Shard 1]   [Shard 2]
    (user_id%3=0) (user_id%3=1) (user_id%3=2)
Sharding Strategies:
STRATEGY 1: HASH-BASED SHARDING
- shard = hash(user_id) % number_of_shards
- Pros: Even distribution
- Cons: Hard to add shards later, cross-shard queries hard
STRATEGY 2: RANGE-BASED SHARDING
- Shard 1: user_id 1-1,000,000
- Shard 2: user_id 1,000,001-2,000,000
- Pros: Easy range queries, easy to add shards
- Cons: Can get uneven (what if shard 1 has all active users?)
STRATEGY 3: DIRECTORY-BASED SHARDING
- Keep a lookup table: user_id -> shard
- Pros: Complete control, can move users between shards
- Cons: Lookup table is single point of failure
WHICH TO CHOOSE:
- Start with hash-based (simplest)
- Use consistent hashing to make adding shards easier
- Only use range if you need range queries
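A minimal sketch of the recommended combination, hash-based sharding with consistent hashing. The `ConsistentHashRing` class, the MD5 choice, and the virtual-node count are illustrative assumptions, not a production router:

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Consistent-hash router: adding a shard moves only ~1/N of the keys,
    unlike plain `hash(key) % N`, which remaps almost everything."""

    def __init__(self, shards, vnodes=100):
        self._ring = []  # sorted list of (hash, shard) points
        for shard in shards:
            self.add_shard(shard, vnodes)

    @staticmethod
    def _hash(key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add_shard(self, shard: str, vnodes: int = 100) -> None:
        # Each shard owns many points on the ring for even distribution.
        for i in range(vnodes):
            self._ring.append((self._hash(f"{shard}#{i}"), shard))
        self._ring.sort()

    def shard_for(self, user_id: int) -> str:
        # Route to the first ring point clockwise from the key's hash.
        h = self._hash(str(user_id))
        idx = bisect.bisect(self._ring, (h, "")) % len(self._ring)
        return self._ring[idx][1]

ring = ConsistentHashRing(["shard0", "shard1", "shard2"])
print(ring.shard_for(42))  # deterministic: same user always hits the same shard
```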
Solution 4: Handling Hot Partitions (The Celebrity Problem)
Some data gets WAY more traffic than others. This is called a hot partition.
Examples:
- Tweet from celebrity with 50 million followers
- Product on sale that everyone wants
- Viral video that everyone watches
| Problem | Solution | How It Works |
|---|---|---|
| Celebrity posts tweet | Separate celebrity handling | Don't fan-out for celebrities, pull their tweets on read |
| Hot product | Cache replication | Replicate hot item across multiple cache servers |
| Viral content | CDN + edge caching | Push content to CDN edges worldwide |
| Hot database row | Read replicas + cache | Cache the hot row, spread reads across replicas |
| Write hot spot | Append-only + batch | Write to log first, batch update later |
Handling the Celebrity Problem in Twitter:
PROBLEM:
Celebrity with 50 million followers posts a tweet.
If we fan-out on write, that's 50 million cache updates.
This will take forever and overwhelm our workers.
SOLUTION: HYBRID APPROACH
1. Classify users:
- Normal user: < 10,000 followers
- Celebrity: >= 10,000 followers
2. For normal users (fan-out on write):
- When they post, add tweet to all followers' feeds
- Works fine for 10,000 updates
3. For celebrities (fan-out on read):
- When they post, store tweet but do NOT fan out
- Mark their account as "celebrity"
4. When user loads feed:
- Get pre-built feed from cache (tweets from normal users)
- Find celebrities this user follows
- Fetch recent tweets from those celebrities
- Merge and sort
- Return to user
5. Optimization:
- Cache celebrity tweets aggressively (everyone reads them)
- Only fetch celebrities when user scrolls past pre-built feed
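The hybrid approach above can be sketched as a toy, with plain dicts standing in for Redis (the feed cache) and PostgreSQL (the tweet store). All names here are hypothetical:

```python
CELEB_THRESHOLD = 10_000  # follower cutoff from the classification above

# In-memory stand-ins for Redis and PostgreSQL.
feed_cache = {}    # user_id -> list of (timestamp, text) pairs
celeb_tweets = {}  # celebrity author_id -> list of (timestamp, text)
followers = {}     # author_id -> set of follower ids
following = {}     # user_id -> set of author ids they follow

def post_tweet(author_id, timestamp, text):
    entry = (timestamp, text)
    fans = followers.get(author_id, set())
    if len(fans) >= CELEB_THRESHOLD:
        # Celebrity: store once, fetched at read time (fan-out on read).
        celeb_tweets.setdefault(author_id, []).append(entry)
    else:
        # Normal user: push into every follower's cached feed (fan-out on write).
        for fan in fans:
            feed_cache.setdefault(fan, []).append(entry)

def load_feed(user_id, limit=20):
    # Pre-built feed + live fetch of celebrity tweets, merged and sorted.
    merged = list(feed_cache.get(user_id, []))
    for author in following.get(user_id, set()):
        merged.extend(celeb_tweets.get(author, []))
    merged.sort(reverse=True)  # newest first
    return [text for _, text in merged[:limit]]
```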
The 80/20 Rule of Scale: 80% of traffic usually goes to 20% of data. Find that 20% and cache it aggressively. Don't try to optimize everything, optimize what matters most.
Step 8: Add Reliability (5 minutes)
Everything fails.
Servers crash. Networks break. Databases go down. Disks fail.
Your system must keep working when things break.
The Key Questions
- What happens when server X crashes?
- How do we avoid losing data?
- How do we recover from failures?
- How do we avoid double-processing?
Reliability Technique 1: Replication
Keep multiple copies of everything important.
[Write Request] --> [Primary] --> [Replica 1]
                              --> [Replica 2]

[Read Request] --> [Replica 1] or [Replica 2]
What to explain:
- Data is copied to multiple machines
- If one machine dies, others have the data
- Trade-off: More replicas = more durability but more cost
- Typically: 3 replicas for important data
Reliability Technique 2: Retries with Backoff
When something fails, try again, but wait longer each time.
import random
import time

def call_external_service(request):
    max_retries = 3
    base_delay = 1  # seconds
    for attempt in range(max_retries):
        try:
            return service.call(request)  # `service` is a placeholder client
        except TemporaryError:
            if attempt == max_retries - 1:
                raise  # Give up after max retries
            # Exponential backoff: wait 1s, then 2s, then 4s...
            delay = base_delay * (2 ** attempt)
            # Add jitter (randomness) to avoid a thundering herd
            delay += random.uniform(0, delay * 0.1)
            time.sleep(delay)
Reliability Technique 3: Idempotency (Handle Duplicates)
If a request happens twice, the result should be the same.
PROBLEM:
User clicks "Pay" button twice (network was slow).
Without idempotency: User is charged twice!
SOLUTION:
Each request has a unique idempotency key.
def process_payment(idempotency_key, amount):
# Check if we already processed this
existing = database.get("payment:" + idempotency_key)
if existing:
return existing.result # Return same result as before
# Process the payment
result = payment_provider.charge(amount)
# Store the result with the key
database.store("payment:" + idempotency_key, result)
return result
Now if user clicks twice with same idempotency_key:
- First click: Processes payment, stores result
- Second click: Returns stored result, no double charge
Reliability Technique 4: Circuit Breaker
If a service keeps failing, stop calling it for a while.
States:
[CLOSED (Normal)] --failures exceed threshold--> [OPEN (Failing)]
                                                       |
                                                timeout expires
                                                       ↓
                                              [HALF-OPEN (Testing)]
                                               /                \
                                  test succeeds → CLOSED    test fails → OPEN
What to explain:
- CLOSED: Normal operation, requests go through
- When too many failures: Switch to OPEN
- OPEN: Requests fail immediately (don't even try)
- After timeout: Switch to HALF-OPEN, try one request
- If it works: Back to CLOSED. If fails: Back to OPEN
This prevents cascading failures.
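A minimal circuit breaker following the state machine above might look like this. It's a sketch, not a production implementation (real ones also need thread safety and per-dependency state):

```python
import time

class CircuitBreaker:
    """Minimal circuit breaker: CLOSED -> OPEN -> HALF_OPEN -> CLOSED."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.state = "CLOSED"
        self.opened_at = 0.0

    def call(self, fn, *args, **kwargs):
        if self.state == "OPEN":
            if time.time() - self.opened_at >= self.reset_timeout:
                self.state = "HALF_OPEN"  # allow one trial request through
            else:
                raise RuntimeError("circuit open: failing fast")
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            # A failed trial in HALF_OPEN, or too many failures, re-opens it.
            if self.state == "HALF_OPEN" or self.failures >= self.failure_threshold:
                self.state = "OPEN"
                self.opened_at = time.time()
            raise
        else:
            # Success resets the breaker.
            self.failures = 0
            self.state = "CLOSED"
            return result
```

The key property: once OPEN, callers fail in microseconds instead of piling up waiting on a dead dependency.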
Reliability Technique 5: Graceful Degradation
When something breaks, keep working with reduced functionality.
NETFLIX EXAMPLE:
- Normal: Personalized recommendations for you
- Recommendation service down: Show popular movies instead
- Search service down: Show categories to browse
- Everything down: Show cached homepage
TWITTER EXAMPLE:
- Normal: Real-time feed with all features
- Feed service slow: Show cached version of feed
- Image service down: Show tweets without images
- Everything down: Show cached tweets, disable posting
PAYMENT EXAMPLE:
- Primary payment provider down: Try backup provider
- All providers down: Queue payment for later, tell user
- Never: Silently fail and lose the payment
Critical: Never lose money or important data. For payments, orders, and critical data: Always write to durable storage before telling user success. Use database transactions. Have backups. Test your recovery process.
Bonus: Operations (What Interviewers Sometimes Ask)
Monitoring: Know When Things Break
| What to Monitor | Why | Example Metrics |
|---|---|---|
| Request rate | Know your traffic | Requests per second |
| Error rate | Know when things break | 5xx errors per minute |
| Latency | Know when things are slow | p50, p95, p99 response time |
| Resource usage | Know when to scale | CPU, memory, disk usage |
| Business metrics | Know if product works | Sign-ups, purchases, active users |
Deployment: Ship Code Safely
| Strategy | How It Works | When to Use |
|---|---|---|
| Rolling deploy | Update servers one by one | Default choice, low risk |
| Blue-green | Run two environments, switch traffic | When you need instant rollback |
| Canary | Send 1% traffic to new version first | When change is risky |
| Feature flags | Deploy code but toggle feature on/off | When you want control over rollout |
Security Basics (Often Forgotten)
- Authentication: Who is this user? (passwords, OAuth, tokens)
- Authorization: Can this user do this action? (roles, permissions)
- Rate Limiting: Is this user sending too many requests?
- Input Validation: Is this input safe? (prevent SQL injection, XSS)
- Encryption: Data in transit (HTTPS), data at rest (encrypted disks)
- Secrets Management: Never put passwords in code (use vault)
Pro tip: Rate limiting is almost always asked. If you have time, mention it. Say: "We'd add rate limiting at the API gateway to prevent abuse. We could use a token bucket algorithm, each user gets N tokens per minute. This protects our system from bad actors and runaway scripts."
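The token bucket mentioned above is only a few lines. This sketch assumes a single process; in a real API gateway the token counts would live in shared storage such as Redis, keyed per user:

```python
import time

class TokenBucket:
    """Token-bucket rate limiter: refill `rate` tokens/sec up to `capacity`.

    Each allowed request spends one token; bursts up to `capacity` are fine,
    but the sustained rate is capped at `rate` requests per second.
    """

    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill based on elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```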
The Complete Interview Checklist
Print this. Practice until it becomes automatic.
□ STEP 1: CLARIFY REQUIREMENTS (5 min)
□ What are the 2-3 core features? (MVP mindset)
□ Who are the users? How many?
□ What scale are we designing for?
□ Any special requirements? (real-time, reliability)
□ Summarize before moving on
□ STEP 2: DESIGN THE API (3 min)
□ List core endpoints (3-5 max)
□ For each: method, path, input, output
□ Which endpoint is the hot path?
□ Is it read-heavy or write-heavy?
□ STEP 3: QUICK MATH (3 min)
□ Requests per second
□ Storage needed
□ Read:Write ratio
□ Peak vs average
□ What do these numbers tell us?
□ STEP 4: COMPUTE MODEL (3 min)
□ For each API: real-time or pre-compute?
□ Explain the tradeoff
□ Make a decision and justify
□ STEP 5: DATABASE CHOICE (3 min)
□ What databases do we need?
□ Start with SQL, justify any NoSQL
□ Explain what goes where
□ STEP 6: DRAW ARCHITECTURE (10 min)
□ Start simple: LB → App → DB
□ Add cache if read-heavy
□ Add queue if need async processing
□ Keep it to 5-8 boxes
□ Explain each component
□ STEP 7: HANDLE SCALE (10 min)
□ Identify the bottleneck
□ Caching strategy
□ Sharding if needed
□ Hot partition handling
□ STEP 8: ADD RELIABILITY (5 min)
□ What happens when X fails?
□ Replication for durability
□ Retries with backoff
□ Idempotency for duplicates
□ WRAP UP (3 min)
□ Summarize what you built
□ State the main tradeoffs
□ Suggest future improvements
□ Ask: Any questions or areas to explore?
Common Mistakes to Avoid
Learn from others' failures:
| Mistake | Why It's Bad | What to Do Instead |
|---|---|---|
| Jumping into drawing boxes | You might solve the wrong problem | Always start with requirements |
| Not doing any math | Your design might not work at scale | Do quick estimation, 3 minutes is enough |
| Using too many technologies | Shows you can't simplify | Start with few components, add only when needed |
| Saying "just use Kafka" for everything | Shows you don't understand tradeoffs | Explain WHY you need each component |
| Not talking about tradeoffs | Seems like you don't understand depth | Every decision has pros and cons, state them |
| Designing for Google scale when not needed | Over-engineering | Match complexity to stated requirements |
| Going silent while thinking | Interviewer can't help you | Think out loud, explain your reasoning |
| Not asking questions | Seems like you don't think critically | Ask clarifying questions, check in with interviewer |
The best tip: Treat the interview like a conversation, not a test. You and the interviewer are designing a system together. Ask for their input: "What do you think about this approach? Should I go deeper here?" This makes it collaborative and less stressful.
Practice Problems by Difficulty
Use this framework on these problems. Time yourself, 45 minutes each.
| Difficulty | Problems | Key Challenge |
|---|---|---|
| Beginner | URL Shortener, Pastebin | ID generation, caching |
| Beginner | Rate Limiter | Distributed counting |
| Medium | Twitter, Instagram | Feed generation, fan-out |
| Medium | Uber, Lyft | Real-time matching, location |
| Medium | Ticketmaster | Handling flash sales, inventory |
| Hard | Google Docs | Real-time collaboration, conflict resolution |
| Hard | WhatsApp, Messenger | Message delivery, ordering |
| Hard | YouTube, Netflix | Video processing, CDN |
Example: Apply Framework to Unknown Problem
Problem: Design a Food Delivery System (like DoorDash)
STEP 1: REQUIREMENTS
- Core: Browse restaurants, place order, track delivery
- Users: Customers, restaurants, drivers
- Scale: 1 million orders per day
STEP 2: API
- GET /restaurants?lat=X&lng=Y (browse nearby)
- POST /orders (place order)
- GET /orders/{id} (track status)
- PUT /drivers/{id}/location (driver updates position)
STEP 3: MATH
- 1M orders/day = ~10 orders/second
- Driver location updates: 50K drivers × every 5 seconds = 10K/second
- Read-heavy for browsing, write-heavy for locations
STEP 4: COMPUTE MODEL
- Restaurant list: Pre-compute (doesn't change often)
- Driver location: Real-time (changes constantly)
- Order matching: Real-time (need to match quickly)
STEP 5: DATABASE
- PostgreSQL: Users, Restaurants, Orders (need transactions)
- Redis: Driver locations (ephemeral, fast updates)
- Elasticsearch: Restaurant search (full-text, geo)
STEP 6: ARCHITECTURE
- API Gateway → Restaurant Service, Order Service, Driver Service
- Redis for real-time driver locations
- Message queue for order → driver matching
STEP 7: SCALE
- Cache restaurant data (doesn't change often)
- Shard orders by city/region
- Hot spot: Popular restaurants - cache their data
STEP 8: RELIABILITY
- Orders must not be lost (database + queue)
- Driver location can be approximate (no replication needed)
- Payment must be idempotent (no double charges)
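As one concrete piece of the design above, the "find nearby drivers" query can be sketched with a simple grid index. This is a stand-in for Redis GEO commands or a geohash index; the cell size, names, and 3x3 search radius are assumptions for illustration:

```python
# Grid-based index for "nearby drivers".
CELL = 0.01  # cell size in degrees (roughly 1 km, a rough assumption)

driver_pos = {}  # driver_id -> (lat, lng)
grid = {}        # (cell_x, cell_y) -> set of driver ids

def _cell(lat, lng):
    return (int(lat // CELL), int(lng // CELL))

def update_location(driver_id, lat, lng):
    # Move the driver out of their old cell, into the new one.
    old = driver_pos.get(driver_id)
    if old:
        grid[_cell(*old)].discard(driver_id)
    driver_pos[driver_id] = (lat, lng)
    grid.setdefault(_cell(lat, lng), set()).add(driver_id)

def nearby(lat, lng):
    # Check the rider's cell plus its 8 neighbors.
    cx, cy = _cell(lat, lng)
    found = set()
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            found |= grid.get((cx + dx, cy + dy), set())
    return found
```

Each location update touches at most two cells, which matches the write-heavy, ephemeral nature of driver positions noted in Step 3.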
Most important practice tip: Practice out loud. Explaining your thoughts verbally is a different skill than thinking them. Record yourself and listen back. You'll be surprised how often you ramble or skip important points. Practice until you can explain clearly in 45 minutes.
Key Tradeoffs to Remember
Every system design involves tradeoffs. Here are the big ones:
Real-time vs Pre-compute
- Real-time pros: Always fresh data, simpler storage
- Real-time cons: Slow for complex queries, heavy compute load
- Pre-compute pros: Fast reads, handles high read volume
- Pre-compute cons: Stale data, storage overhead, write amplification
- When to use: Real-time when data changes often or freshness is critical. Pre-compute when read:write ratio is high and slight staleness is okay.
SQL vs NoSQL
- SQL pros: Transactions, flexible queries, well understood
- SQL cons: Scaling writes is hard, schema changes are slow
- NoSQL pros: High write throughput, horizontal scale, flexible schema
- NoSQL cons: Limited or no multi-row transactions, limited query flexibility, eventual consistency
- When to use: Start with SQL for most cases. Use NoSQL only when you need massive write throughput, flexible schema, or simple key-value access patterns.
Sync API vs Async Processing
- Sync pros: Simple, immediate response
- Sync cons: Slow if processing takes time
- Async pros: Fast response, handles failures better
- Async cons: Complex, need to track status
- When to use: Sync for fast operations and when user needs immediate result. Async for slow operations, external calls, or when okay to process later.
Final Thoughts
System design interviews aren't about memorizing architectures. They're about demonstrating:
- You can break down ambiguous problems: the requirements phase shows this
- You understand tradeoffs: every decision has pros and cons
- You can communicate technical ideas clearly: think out loud, structure your explanation
- You can design for scale and reliability: know what breaks and how to fix it
This framework gives you structure. Practice gives you depth. Together, they'll help you pass system design interviews at any company.
Remember: Every system is just take data in, store it, give it back. The hard part is doing this fast, reliably, and at scale. Now you have the tools to handle it.
Frequently Asked Questions
Can I really use this framework for ANY system design question?
Yes. The 8 steps are intentionally universal. Whether you're designing a URL shortener, Netflix, or a distributed cache, you still need to: clarify requirements, design APIs, do math, make compute decisions, choose databases, draw architecture, handle scale, and add reliability. The specific answers change, but the process doesn't.
What if the interviewer wants to skip steps?
Adapt to their style. If they say "assume 100M users and focus on the architecture," skip steps 1-3 quickly and dive in. The framework is a guide, not a rigid script. The interviewer drives the conversation, you provide structure.
How do I know if I'm going too deep or staying too shallow?
Check in with the interviewer. Ask: "Should I go deeper on the caching strategy, or move on to reliability?" They'll guide you. If you're spending more than 5 minutes on any single component without interviewer prompting, you're probably going too deep.
What if I don't know a technology they mention?
Be honest: "I'm not deeply familiar with Cassandra's internals, but based on what I know about LSM trees and distributed databases, I'd expect..." Reason from first principles. Never pretend to know something you don't, interviewers can tell.
How long should I practice before I'm ready?
Most engineers need 2-4 weeks of focused practice. Week 1: Learn the framework structure. Week 2: Build depth on components (databases, caching, queues). Week 3: Practice communication through mock interviews. Week 4: Polish weak areas. The more you practice out loud, the faster you'll improve.