The Only System Design Framework You Need
A proven framework for approaching any system design interview question. Step-by-step process used by engineers who passed Google, Meta, and Amazon.
Ready to Master System Design Interviews?
Learn from 25+ real interview problems from Netflix, Uber, Google, and Stripe. Created by a senior engineer who's taken 200+ system design interviews at FAANG companies.
Complete Solutions
Architecture diagrams & trade-off analysis
Real Interview Problems
From actual FAANG interviews
7-day money-back guarantee • Lifetime access • New problems added quarterly
You walk into a system design interview. The interviewer says: "Design Uber."
Your mind races. Where do you start? Matching algorithm? Maps? Payment system? There's so much to cover.
This is where most engineers fail. Not because they lack knowledge, but because they lack structure.
After conducting 200+ system design interviews and seeing what separates passing candidates from failing ones, I've distilled the process into a four-step framework that works for any question.
Why You Need a Framework
System design interviews are unlike coding interviews. There's no single "correct" answer. The interviewer is evaluating:
- Can you break down ambiguous problems?
- Do you understand trade-offs?
- Can you communicate technical ideas clearly?
- Do you think about scale, reliability, and edge cases?
A framework gives you:
- Structure: You know what to do at each stage
- Time management: You allocate time appropriately
- Confidence: You don't panic when given an unfamiliar question
- Completeness: You don't forget critical parts
The 4-Step Framework
| Step | Time | What You Do |
|---|---|---|
| 1. Clarify Requirements | 5 min | Ask questions, define scope |
| 2. Estimate Scale | 5 min | Back-of-envelope math |
| 3. Design Architecture | 15-20 min | Draw and explain components |
| 4. Deep Dive | 15-20 min | Go deep on 2-3 areas |
Let's break down each step.
Step 1: Clarify Requirements (5 minutes)
Never start designing immediately.
The biggest mistake candidates make is jumping into solutions. "We'll use Kafka and Redis and..." Stop. You don't even know what you're building yet.
Ask Functional Requirements Questions
Understand what the system needs to do:
"What are the core features we need to support?"
"Who are the users? Are they consumers, businesses, or both?"
"What actions can users take?"
Example for "Design Twitter":
"I want to clarify the functional requirements. I'm assuming we need to support:
- Posting tweets
- Following other users
- Viewing a home timeline
Should I also include search, trending topics, or direct messages?"
Ask Non-Functional Requirements Questions
Understand the constraints:
"What's the expected scale? How many users?"
"What are the latency requirements? Is real-time important?"
"What's the read-to-write ratio?"
"Are there geographic requirements? Global users?"
"What's more important: consistency or availability?"
Example:
"For scale, should I assume we're designing for Twitter's actual scale, hundreds of millions of users? Or a smaller startup version?"
"For the timeline, is it acceptable if a new tweet takes a few seconds to appear, or do we need real-time delivery?"
Why This Matters
Asking questions shows you:
- Think before acting
- Understand that requirements drive design
- Can identify ambiguity
Pro tip: Write down the requirements as you discuss them. This becomes your reference throughout the interview.
Sample Requirements Summary
After 5 minutes, you should have something like:
Functional Requirements:
- Users can post tweets (280 chars, optional images)
- Users can follow other users
- Users see home timeline (tweets from followed users)
- Basic search for users and tweets
Non-Functional Requirements:
- 300 million DAU
- 500 million tweets per day
- Timeline latency < 500ms
- Eventually consistent (tweets can take a few seconds to appear)
- Global users across multiple regions
Step 2: Estimate Scale (5 minutes)
Do quick math to inform your design. This isn't about exact numbers; it's about understanding the order of magnitude.
Key Numbers to Estimate
- Read/Write QPS: Requests per second
- Storage: How much data over time
- Bandwidth: Data transfer requirements
- Memory: Cache size if needed
Example: Twitter Scale Estimation
Writes (Tweets):
500 million tweets per day
= 500M / 86,400 seconds
≈ 5,800 tweets per second
Peak (2x average): ~12,000 tweets/second
Reads (Timeline views):
300 million DAU
Average 10 timeline views per day
= 3 billion timeline views per day
= 3B / 86,400
≈ 35,000 timeline views per second
Peak: ~70,000/second
Read:Write ratio: 35,000 / 5,800 ≈ 6:1 (read-heavy)
Storage:
Average tweet: 280 chars text + metadata = ~500 bytes
500M tweets/day × 500 bytes = 250 GB/day
Per year: 250 GB × 365 = 91 TB/year
5 years: ~450 TB
What this tells us:
- System is read-heavy → optimize for reads, cache heavily
- ~70K QPS for reads → need distributed caching
- ~450 TB storage over 5 years → need sharding
- Global users → consider multi-region
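If you want to double-check the arithmetic while practicing, the estimate above can be reproduced in a few lines. This is only a sketch: the inputs come from the requirements summary, and the variable names and the 2x peak factor are the same working assumptions used in the example, not measured values.

```python
SECONDS_PER_DAY = 86_400


def per_second(events_per_day: float) -> float:
    """Average events per second for a given daily volume."""
    return events_per_day / SECONDS_PER_DAY


# Inputs from the sample requirements summary.
tweets_per_day = 500_000_000
daily_active_users = 300_000_000
timeline_views_per_user = 10
bytes_per_tweet = 500
peak_factor = 2  # assumption: peak traffic is about 2x the average

write_qps = per_second(tweets_per_day)
read_qps = per_second(daily_active_users * timeline_views_per_user)
read_write_ratio = read_qps / write_qps

storage_gb_per_day = tweets_per_day * bytes_per_tweet / 1e9
storage_tb_per_year = storage_gb_per_day * 365 / 1000
storage_tb_5_years = storage_tb_per_year * 5

print(f"writes: {write_qps:,.0f}/s average, {write_qps * peak_factor:,.0f}/s peak")
print(f"reads:  {read_qps:,.0f}/s average, {read_qps * peak_factor:,.0f}/s peak")
print(f"read:write ratio is about {read_write_ratio:.0f}:1")
print(f"storage: {storage_gb_per_day:,.0f} GB/day, {storage_tb_5_years:,.0f} TB over 5 years")
```

In the interview you would round these aggressively. The point of the script is only to confirm that your mental math lands in the right order of magnitude.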
Numbers to Memorize
Keep these in your head for quick math:
| Unit | Approximate Value |
|---|---|
| Seconds per day | 86,400 (~100K) |
| Seconds per month | ~2.6 million (30 days) |
| 1 million / day | ~12/second |
| 1 billion / day | ~12,000/second |
Why This Matters
Interviewers want to see that you:
- Think about scale before designing
- Can do quick mental math
- Understand how scale affects architecture
A design for 1,000 users is very different from 100 million users.
Step 3: Design Architecture (15-20 minutes)
Now you design. Start high-level, then add detail.
Start with the Happy Path
Draw the simplest flow that handles the core use case:
[Client] → [Load Balancer] → [API Server] → [Database]
Then ask yourself: "What breaks at scale?"
Add Components Systematically
For read-heavy systems, add caching:
[Client] → [LB] → [API] → [Cache] → [DB]
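One common way to wire the cache into the read path is the cache-aside pattern: check the cache first, and on a miss load from the database and populate the cache. Here is a minimal sketch, assuming a plain dictionary as a stand-in for Redis or Memcached. The class name `CacheAsideReader` and the `fake_db` data are hypothetical.

```python
from typing import Callable, Dict, Optional


class CacheAsideReader:
    """Read path for [API] -> [Cache] -> [DB]: try the cache, fall back to the DB."""

    def __init__(self, load_from_db: Callable[[str], Optional[str]]) -> None:
        self._cache: Dict[str, str] = {}  # stand-in for Redis or Memcached
        self._load_from_db = load_from_db
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> Optional[str]:
        if key in self._cache:
            self.hits += 1
            return self._cache[key]
        self.misses += 1
        value = self._load_from_db(key)
        if value is not None:
            self._cache[key] = value  # populate so the next read is a hit
        return value


# Hypothetical in-memory "database" for the demo.
fake_db = {"tweet:1": "hello world"}
reader = CacheAsideReader(fake_db.get)
reader.get("tweet:1")  # miss: loaded from the DB, then cached
reader.get("tweet:1")  # hit: served from the cache
```

The sketch leaves out expiry, eviction, and invalidation, which are exactly the topics an interviewer is likely to probe in the deep dive.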
For high write throughput, add message queues:
[Client] → [LB] → [API] → [Queue] → [Workers] → [DB]
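To make the queue-and-workers flow concrete, here is a small single-process sketch using Python's standard `queue` and `threading` modules. It only illustrates the shape of the flow: the API handler enqueues and returns, and workers persist in the background. In a real system the queue would be a separate service such as Kafka or RabbitMQ, and `handle_request`, `database`, and the worker count are hypothetical names and values.

```python
import queue
import threading
from typing import List, Optional

database: List[dict] = []  # stand-in for the real database
db_lock = threading.Lock()
write_queue: queue.Queue = queue.Queue()


def handle_request(payload: dict) -> str:
    """API handler: enqueue the write and return immediately."""
    write_queue.put(payload)
    return "accepted"


def worker() -> None:
    """Drain the queue and persist each item."""
    while True:
        item: Optional[dict] = write_queue.get()
        try:
            if item is None:  # sentinel: shut this worker down
                return
            with db_lock:
                database.append(item)
        finally:
            write_queue.task_done()


workers = [threading.Thread(target=worker) for _ in range(3)]
for t in workers:
    t.start()

for i in range(100):
    handle_request({"tweet_id": i})

write_queue.join()  # wait until every queued write has been persisted
for _ in workers:
    write_queue.put(None)  # one sentinel per worker
for t in workers:
    t.join()
```

The trade-off to mention out loud: the client gets a fast response, but the write is not durable until a worker has processed it.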
For global users, consider CDN and multi-region:
[Client] → [CDN] → [LB] → [API] → [Cache] → [DB]
                                             ↑
                                [Replication from Primary]
Example: Twitter Architecture
                        ┌───────────────┐
                        │      CDN      │
                        │(static assets)│
                        └───────┬───────┘
                                │
┌──────────┐    ┌───────────────▼───────────────┐
│  Client  │───▶│         Load Balancer         │
└──────────┘    └───────────────┬───────────────┘
                                │
               ┌────────────────┼────────────────┐
               │                │                │
        ┌──────▼──────┐  ┌──────▼──────┐  ┌──────▼──────┐
        │    Tweet    │  │  Timeline   │  │    User     │
        │   Service   │  │   Service   │  │   Service   │
        └──────┬──────┘  └──────┬──────┘  └──────┬──────┘
               │                │                │
               │       ┌────────▼────────┐       │
               │       │ Timeline Cache  │       │
               │       │     (Redis)     │       │
               │       └────────┬────────┘       │
               │                │                │
        ┌──────▼────────────────▼────────────────▼──────┐
        │                   Databases                   │
        │ (Tweet DB)      (Timeline DB)      (User DB)  │
        └───────────────────────────────────────────────┘
Explain As You Draw
Don't draw silently. Explain your thinking:
"I'm starting with a load balancer to distribute traffic across multiple API servers. This gives us horizontal scalability and fault tolerance."
"I'm separating into three services, Tweet, Timeline, and User, because they have different access patterns and can scale independently."
"For the timeline, I'm adding a Redis cache because we calculated 70K reads per second. That's too much for direct database queries."
Component Checklist
Make sure you cover:
| Component | Purpose | When to Include |
|---|---|---|
| Load Balancer | Distribute traffic | Always |
| CDN | Static content, reduce latency | Global users, static assets |
| Cache | Reduce database load | Read-heavy systems |
| Message Queue | Async processing | Write-heavy, decoupling needed |
| Database | Persistent storage | Always |
| Search | Full-text search | Search feature required |
| Object Storage | Large files (images, videos) | Media storage needed |
Step 4: Deep Dive (15-20 minutes)
The interviewer will push you to go deeper. Be ready for questions like:
- "Tell me more about the database schema."
- "How does the timeline service work?"
- "What happens if the cache goes down?"
- "How would you handle a celebrity with 50 million followers?"
Pick 2-3 Areas to Go Deep
You can't cover everything deeply. Choose the most interesting or challenging areas:
- Database design: Schema, indexing, sharding
- Caching strategy: What to cache, invalidation
- Critical algorithms: Feed ranking, matching
- Failure handling: What happens when X fails?
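If you pick database design, sharding is usually the first thing you'll be asked to explain. Below is a minimal sketch of hash-based sharding by user ID. The function name `shard_for` and the shard count of 16 are assumptions for illustration. It uses `hashlib` rather than Python's built-in `hash()` because string hashes from `hash()` are randomized per process by default, so they are not stable across servers.

```python
import hashlib
from collections import Counter


def shard_for(user_id: str, num_shards: int) -> int:
    """Map a user ID to a shard index using a stable hash."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Rough check that keys spread across all shards (hypothetical user IDs).
shard_counts = Counter(shard_for(f"user-{i}", 16) for i in range(10_000))
print(sorted(shard_counts.items()))
```

The trade-off worth naming: with plain modulo sharding, changing the number of shards remaps most keys. Consistent hashing is the usual alternative when you expect to add shards over time.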
Example Deep Dive: Timeline Service
Interviewer: "Tell me more about how the timeline works."
You: "There are two main approaches for generating timelines: fan-out on write and fan-out on read.
Fan-out on write: When someone tweets, we immediately push that tweet to all their followers' timelines in cache. This is fast for reads because the timeline is pre-computed. But it's expensive for writes if the user has millions of followers.
Fan-out on read: We don't pre-compute. When a user loads their timeline, we fetch recent tweets from everyone they follow and merge them. This is expensive for reads but cheap for writes.
For Twitter, I'd use a hybrid approach. For regular users, I'd fan out on write, since most people have a few hundred followers. For celebrities with millions of followers, I'd fan out on read: we fetch their tweets at read time and merge them with the pre-computed timeline.
This way we get the best of both: fast reads for most users, without overwhelming the system when a celebrity tweets."
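The hybrid approach described in that answer can be sketched in a few dozen lines. This is a toy in-memory model, not how Twitter actually implements it. The class name `HybridTimeline`, the `celebrity_threshold` parameter, and the tuple layout for tweets are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Tweet = Tuple[int, str, str]  # (timestamp, author, text)


class HybridTimeline:
    """Toy hybrid fan-out: push for regular authors, pull for celebrities."""

    def __init__(self, celebrity_threshold: int) -> None:
        self.celebrity_threshold = celebrity_threshold  # hypothetical cutoff
        self.followers: Dict[str, Set[str]] = defaultdict(set)  # author -> followers
        self.following: Dict[str, Set[str]] = defaultdict(set)  # user -> authors
        self.tweets_by_author: Dict[str, List[Tweet]] = defaultdict(list)
        self.precomputed: Dict[str, List[Tweet]] = defaultdict(list)  # timeline cache

    def follow(self, user: str, author: str) -> None:
        self.followers[author].add(user)
        self.following[user].add(author)

    def is_celebrity(self, author: str) -> bool:
        return len(self.followers[author]) >= self.celebrity_threshold

    def post(self, author: str, timestamp: int, text: str) -> None:
        tweet = (timestamp, author, text)
        self.tweets_by_author[author].append(tweet)
        if not self.is_celebrity(author):
            # Fan-out on write: push to every follower's cached timeline.
            for follower in self.followers[author]:
                self.precomputed[follower].append(tweet)

    def timeline(self, user: str, limit: int = 20) -> List[Tweet]:
        # Start from the pre-computed timeline, then pull celebrity tweets
        # at read time. A set removes duplicates if an author crossed the
        # threshold after some tweets were already pushed.
        merged = set(self.precomputed[user])
        for author in self.following[user]:
            if self.is_celebrity(author):
                merged.update(self.tweets_by_author[author])
        return sorted(merged, reverse=True)[:limit]


svc = HybridTimeline(celebrity_threshold=2)
svc.follow("bob", "alice")   # alice has 1 follower: regular user
svc.follow("bob", "star")
svc.follow("carol", "star")  # star has 2 followers: celebrity in this demo
svc.post("alice", 1, "hi")
svc.post("star", 2, "big news")
bob_timeline = svc.timeline("bob")
```

Note what the sketch skips: tweets posted before a user follows an author are never backfilled, and a real read path would merge only the most recent celebrity tweets instead of all of them.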
Show Trade-off Awareness
Always acknowledge trade-offs:
"We could use Cassandra instead of PostgreSQL here. Cassandra would give us better write throughput and easier horizontal scaling. But we'd lose ACID transactions and complex queries. Since this service mostly does simple key-value lookups, I think Cassandra is the better choice."
Handle "What If" Questions
Interviewer: "What if Redis goes down?"
You: "Good question. A few things:
First, I'd run Redis in a cluster with replication. If one node fails, others continue serving.
Second, if the entire cache layer fails, we fall back to the database. We'd see higher latency and might need to rate-limit, but the system stays up.
Third, I'd set up monitoring to alert on cache hit rate drops. If it suddenly drops, we know something's wrong before users notice.
Fourth, for the most critical data, I might use a write-through cache so even if Redis fails, we don't lose recent data because it's already in the database."
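The fallback and write-through points in that answer can be sketched as follows. This is a simplified model under stated assumptions: `FlakyCache` is an in-memory stand-in for Redis that can be switched off to simulate an outage, and `CacheUnavailable` and `TimelineStore` are hypothetical names, not part of any Redis client.

```python
from typing import Dict, Optional


class CacheUnavailable(Exception):
    """Raised by the stand-in cache when it cannot be reached."""


class FlakyCache:
    """In-memory stand-in for Redis that can simulate an outage."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}
        self.available = True

    def get(self, key: str) -> Optional[str]:
        if not self.available:
            raise CacheUnavailable(key)
        return self._data.get(key)

    def set(self, key: str, value: str) -> None:
        if not self.available:
            raise CacheUnavailable(key)
        self._data[key] = value


class TimelineStore:
    def __init__(self, cache: FlakyCache, db: Dict[str, str]) -> None:
        self.cache = cache
        self.db = db  # source of truth
        self.fallback_reads = 0  # a metric you could alert on

    def write(self, key: str, value: str) -> None:
        # Write-through: persist to the database first, then update the cache.
        self.db[key] = value
        try:
            self.cache.set(key, value)
        except CacheUnavailable:
            pass  # the data is already safe in the database

    def read(self, key: str) -> Optional[str]:
        try:
            cached = self.cache.get(key)
            if cached is not None:
                return cached
            value = self.db.get(key)
            if value is not None:
                self.cache.set(key, value)  # repopulate after a miss
            return value
        except CacheUnavailable:
            # Degraded mode: slower, but the system stays up.
            self.fallback_reads += 1
            return self.db.get(key)


cache = FlakyCache()
store = TimelineStore(cache, db={})
store.write("timeline:bob", "tweet-1")
cache.available = False  # simulate the cache layer failing
value_during_outage = store.read("timeline:bob")
```

One gap to call out yourself: keys that are updated during the outage can be stale in the cache once it comes back, so a real design needs invalidation or a cache flush on recovery.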
Time Management
One of the biggest failure modes is running out of time. Here's how to manage it:
| Phase | Target Time | If You're Running Over |
|---|---|---|
| Requirements | 5 min | Wrap up, state assumptions |
| Estimation | 5 min | Skip detailed calculations, estimate orders of magnitude |
| Architecture | 15-20 min | Focus on core path, skip edge cases |
| Deep Dive | 15-20 min | Pick fewer areas, go deeper on each |
Check the time periodically. If you've spent 15 minutes on requirements, you're going too slow.
Signals You're On Track
Good pace:
- Requirements done in 5 minutes
- High-level architecture sketched by minute 15
- Deep diving by minute 20
Too slow:
- Still clarifying requirements at minute 10
- No diagram at minute 20
- Haven't discussed any trade-offs by minute 30
Recovering From Time Pressure
If you're running out of time:
"I notice we have about 10 minutes left. Let me focus on the most critical part, the timeline service, and we can discuss other areas if time permits."
Interviewers appreciate self-awareness and prioritization.
Communication Throughout
System design is as much about communication as technical knowledge.
Think Out Loud
Don't go silent. Share your reasoning:
"I'm thinking about whether to use SQL or NoSQL here. The access pattern is mostly key-value lookups by user ID, which suggests NoSQL. But we also need to query by timestamp for the timeline, so..."
Check In With the Interviewer
"Does this level of detail make sense, or would you like me to go deeper on any component?"
"I was planning to discuss the caching strategy next. Is that the right direction, or is there something else you'd like me to focus on?"
Use Clear Structure
When explaining, use structure:
"For the database, I'm going to cover three things: the schema, the indexing strategy, and how we'd handle sharding."
This helps the interviewer follow along and shows organized thinking.
Common Pitfalls to Avoid
1. Not Clarifying Requirements
Bad: "So, Twitter. Let me start with the architecture..."
Good: "Before I start, I want to clarify the requirements. What features should we focus on?"
2. Skipping Scale Estimation
Bad: "We'll use a database."
Good: "We calculated 35,000 reads per second. That's too much for a single database, so we'll need caching and possibly read replicas."
3. Jumping to Technologies
Bad: "We'll use Kafka, Redis, Cassandra, and Kubernetes."
Good: "We need a message queue for async processing because..." (explain the why)
4. Not Discussing Trade-offs
Bad: "We'll use PostgreSQL."
Good: "I'm choosing PostgreSQL over Cassandra because we need ACID transactions for payment processing. The trade-off is that horizontal scaling is harder, but at our expected scale, we can handle it with read replicas."
5. Going Too Deep Too Early
Bad: Spending 10 minutes on database indexing before drawing any architecture.
Good: Sketch the full system first, then go deep where it matters.
6. Ignoring the Interviewer's Hints
If the interviewer says "Interesting, what about handling failures?", they're telling you what they want to hear about. Follow their lead.
Framework Checklist
Use this checklist during practice:
Requirements Phase
- Asked about functional requirements
- Asked about scale (users, QPS)
- Asked about latency requirements
- Asked about consistency vs. availability
- Wrote down key requirements
Estimation Phase
- Calculated read/write QPS
- Estimated storage requirements
- Identified read-heavy vs. write-heavy
- Numbers informed design decisions
Design Phase
- Drew high-level architecture
- Included all necessary components
- Explained component choices
- Addressed data flow
- Considered single points of failure
Deep Dive Phase
- Went deep on 2-3 components
- Discussed trade-offs
- Addressed failure scenarios
- Answered follow-up questions thoroughly
Applying the Framework: Quick Examples
Design a URL Shortener
Requirements:
- Create short URLs, redirect to long URLs
- 100M URLs/day, 100:1 read/write ratio
- < 100ms redirect latency
Scale:
- Writes: 1,200/sec, Reads: 120,000/sec
- Storage: 30TB over 5 years
Architecture:
- Load balancer → API servers → Cache (Redis) → Database
- Key insight: Read-heavy, cache aggressively
Deep dive:
- Short code generation algorithms (hash vs. counter)
- Caching strategy for popular URLs
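For the short code deep dive, the counter approach usually means encoding a unique integer ID in base 62. Here is a minimal sketch. The alphabet ordering and function names are assumptions, and a real system would still need a way to hand out unique counter values across servers.

```python
import string

# 10 digits + 26 lowercase + 26 uppercase = 62 symbols (ordering is arbitrary).
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase
BASE = len(ALPHABET)


def encode_base62(n: int) -> str:
    """Turn a non-negative counter value into a short code."""
    if n < 0:
        raise ValueError("counter must be non-negative")
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n:
        n, remainder = divmod(n, BASE)
        chars.append(ALPHABET[remainder])
    return "".join(reversed(chars))


def decode_base62(code: str) -> int:
    """Recover the counter value from a short code."""
    n = 0
    for ch in code:
        n = n * BASE + ALPHABET.index(ch)
    return n


# 100M URLs/day for 5 years is about 182.5 billion codes.
urls_in_5_years = 100_000_000 * 365 * 5
code_length_needed = len(encode_base62(urls_in_5_years))
```

Six characters give 62^6, about 56.8 billion codes, which is not enough for five years at this rate, while seven characters give about 3.5 trillion. The trade-off against hashing: counter-based codes never collide but are sequential and guessable, while truncated hashes look random but need collision handling.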
Design a Chat Application
Requirements:
- 1:1 and group messaging
- Real-time delivery
- 100M users, 1B messages/day
Scale:
- 12,000 messages/sec
- Need persistent connections (WebSockets)
Architecture:
- WebSocket servers → Message queue → Delivery service → Database
- Key insight: Connection state management is critical
Deep dive:
- How to route messages to correct WebSocket server
- Handling offline users (message queuing)
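Both deep-dive topics for the chat example come down to a lookup from user to connection. The sketch below assumes a shared registry that maps each connected user to the WebSocket server holding their connection, plus a per-user queue for messages sent while they are offline. `MessageRouter` and the server IDs are hypothetical, and in production the registry would live in a shared store rather than in process memory.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


class MessageRouter:
    def __init__(self) -> None:
        self.connections: Dict[str, str] = {}  # user_id -> WebSocket server ID
        # server ID -> (recipient, text) pairs that server should push out
        self.outbox: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
        self.pending: Dict[str, List[str]] = defaultdict(list)  # offline queue

    def connect(self, user_id: str, server_id: str) -> List[str]:
        """Register the connection and return messages queued while offline."""
        self.connections[user_id] = server_id
        return self.pending.pop(user_id, [])

    def disconnect(self, user_id: str) -> None:
        self.connections.pop(user_id, None)

    def send(self, recipient: str, text: str) -> Optional[str]:
        """Route to the recipient's server, or queue if they are offline."""
        server_id = self.connections.get(recipient)
        if server_id is None:
            self.pending[recipient].append(text)
            return None
        self.outbox[server_id].append((recipient, text))
        return server_id


router = MessageRouter()
router.connect("alice", "ws-1")
routed_to = router.send("alice", "hi")          # alice is online on ws-1
queued = router.send("bob", "are you there?")   # bob is offline: queued
delivered_on_connect = router.connect("bob", "ws-2")
```

Follow-up questions to expect: what happens when a WebSocket server crashes and its registry entries go stale, and how long offline messages are retained.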
Design a News Feed
Requirements:
- Posts from friends
- Ranked by relevance
- 300M users, high read volume
Scale:
- Reads: 100K/sec, Writes: 10K/sec
- Read-heavy, pre-computation valuable
Architecture:
- API → Fan-out service → Timeline cache → Database
- Key insight: Fan-out on write vs. read trade-off
Deep dive:
- Ranking algorithm
- Handling celebrities (hybrid fan-out)
Practice This Framework
The framework only works if you practice it. Here's how:
Week 1: Learn the Steps
Practice the framework structure without worrying about correctness:
- Pick 3 questions
- Time yourself
- Focus on hitting all 4 steps
Week 2: Build Depth
Go deeper on components:
- Databases (SQL, NoSQL, when to use each)
- Caching (patterns, invalidation)
- Message queues (Kafka, RabbitMQ)
- Load balancing strategies
Week 3: Practice Communication
Do mock interviews:
- Practice explaining while drawing
- Get feedback on communication
- Work on time management
Week 4: Polish
- Redo questions you struggled with
- Focus on trade-off discussions
- Practice handling curveball questions
Final Thoughts
System design interviews aren't about knowing everything. They're about:
- Showing a structured approach: The framework demonstrates this
- Communicating clearly: Think out loud, check in
- Making reasonable trade-offs: There's no perfect design
- Going deep when asked: Prove you have real knowledge
The framework gives you structure. Practice gives you depth. Together, they'll help you pass system design interviews at any company.
Frequently Asked Questions
Can I use this framework for any system design question?
Yes. The framework is intentionally general. Whether you're designing Twitter, a parking lot system, or a distributed cache, the four steps apply: clarify requirements, estimate scale, design architecture, deep dive.
What if the interviewer interrupts my framework?
Adapt. The framework is a guide, not a rigid script. If the interviewer wants to skip to deep dive, do it. If they want to spend more time on requirements, follow their lead. The framework ensures you don't forget important parts, but the interviewer drives the conversation.
How do I know when to move to the next step?
Check in with the interviewer: "I think I have a good understanding of the requirements. Should I move on to estimating scale?" They'll tell you if they want more discussion or if you should proceed.
What if I don't know a specific technology they ask about?
Be honest: "I'm not deeply familiar with Cassandra's compaction strategies, but based on what I know about LSM trees, I'd expect..." Show you can reason from first principles. Never pretend to know something you don't.