
The Only System Design Framework You Need

A proven framework for approaching any system design interview question. Step-by-step process used by engineers who passed Google, Meta, and Amazon.

15 min read • By SystemExperts
From the Interviewer's Side

Ready to Master System Design Interviews?

Learn from 25+ real interview problems from Netflix, Uber, Google, and Stripe. Created by a senior engineer who's taken 200+ system design interviews at FAANG companies.

Complete Solutions

Architecture diagrams & trade-off analysis

Real Interview Problems

From actual FAANG interviews

7-day money-back guarantee • Lifetime access • New problems added quarterly

You walk into a system design interview. The interviewer says: "Design Uber."

Your mind races. Where do you start? Matching algorithm? Maps? Payment system? There's so much to cover.

This is where most engineers fail. Not because they lack knowledge, but because they lack structure.

After conducting 200+ system design interviews and seeing what separates passing candidates from failing ones, I've distilled the process into a four-step framework that works for any question.


Why You Need a Framework

System design interviews are unlike coding interviews. There's no single "correct" answer. The interviewer is evaluating:

  • Can you break down ambiguous problems?
  • Do you understand trade-offs?
  • Can you communicate technical ideas clearly?
  • Do you think about scale, reliability, and edge cases?

A framework gives you:

  1. Structure: You know what to do at each stage
  2. Time management: You allocate time appropriately
  3. Confidence: You don't panic when given an unfamiliar question
  4. Completeness: You don't forget critical parts

The 4-Step Framework

Step                     | Time      | What You Do
1. Clarify Requirements  | 5 min     | Ask questions, define scope
2. Estimate Scale        | 5 min     | Back-of-envelope math
3. Design Architecture   | 15-20 min | Draw and explain components
4. Deep Dive             | 15-20 min | Go deep on 2-3 areas

Let's break down each step.


Step 1: Clarify Requirements (5 minutes)

Never start designing immediately.

The biggest mistake candidates make is jumping into solutions. "We'll use Kafka and Redis and..." Stop. You don't even know what you're building yet.

Ask Functional Requirements Questions

Understand what the system needs to do:

"What are the core features we need to support?"

"Who are the users? Are they consumers, businesses, or both?"

"What actions can users take?"

Example for "Design Twitter":

"I want to clarify the functional requirements. I'm assuming we need to support:

  • Posting tweets
  • Following other users
  • Viewing a home timeline

Should I also include search, trending topics, or direct messages?"

Ask Non-Functional Requirements Questions

Understand the constraints:

"What's the expected scale? How many users?"

"What are the latency requirements? Is real-time important?"

"What's the read-to-write ratio?"

"Are there geographic requirements? Global users?"

"What's more important: consistency or availability?"

Example:

"For scale, should I assume we're designing for Twitter's actual scale, hundreds of millions of users? Or a smaller startup version?"

"For the timeline, is it acceptable if a new tweet takes a few seconds to appear, or do we need real-time delivery?"

Why This Matters

Asking questions shows you:

  • Think before acting
  • Understand that requirements drive design
  • Can identify ambiguity

Pro tip: Write down the requirements as you discuss them. This becomes your reference throughout the interview.

Sample Requirements Summary

After 5 minutes, you should have something like:

Functional Requirements:
- Users can post tweets (280 chars, optional images)
- Users can follow other users
- Users see home timeline (tweets from followed users)
- Basic search for users and tweets

Non-Functional Requirements:
- 300 million DAU
- 500 million tweets per day
- Timeline latency < 500ms
- Eventually consistent (tweets can take a few seconds to appear)
- Global users across multiple regions

Step 2: Estimate Scale (5 minutes)

Do quick math to inform your design. This isn't about exact numbers; it's about understanding the order of magnitude.

Key Numbers to Estimate

  1. Read/Write QPS: Requests per second
  2. Storage: How much data over time
  3. Bandwidth: Data transfer requirements
  4. Memory: Cache size if needed

Example: Twitter Scale Estimation

Writes (Tweets):

500 million tweets per day
= 500M / 86,400 seconds
≈ 5,800 tweets per second

Peak (2x average): ~12,000 tweets/second

Reads (Timeline views):

300 million DAU
Average 10 timeline views per day
= 3 billion timeline views per day
= 3B / 86,400
≈ 35,000 timeline views per second

Peak: ~70,000/second

Read:Write ratio: 35,000 / 5,800 ≈ 6:1 (read-heavy)

Storage:

Average tweet: 280 chars text + metadata = ~500 bytes
500M tweets/day × 500 bytes = 250 GB/day
Per year: 250 GB × 365 = 91 TB/year
5 years: ~450 TB

What this tells us:

  • System is read-heavy → optimize for reads, cache heavily
  • ~70K QPS for reads → need distributed caching
  • ~450 TB of tweet text over 5 years (more with media) → need sharding
  • Global users → consider multi-region
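The arithmetic above can be reproduced in a few lines, which is a handy way to check your mental math while practicing. This is a minimal sketch: the input figures come from the requirements summary, while the helper name `per_second` and the 2x peak factor are assumptions made for illustration.

```python
SECONDS_PER_DAY = 86_400


def per_second(events_per_day: float) -> float:
    """Convert a daily event count into an average per-second rate."""
    return events_per_day / SECONDS_PER_DAY


# Inputs taken from the requirements summary in this article.
tweets_per_day = 500_000_000
daily_active_users = 300_000_000
timeline_views_per_user = 10   # assumed average views per user per day
bytes_per_tweet = 500          # text plus metadata
peak_factor = 2                # assumed peak-to-average multiplier

write_qps = per_second(tweets_per_day)
read_qps = per_second(daily_active_users * timeline_views_per_user)
read_write_ratio = read_qps / write_qps

storage_per_day_gb = tweets_per_day * bytes_per_tweet / 1e9
storage_5y_tb = storage_per_day_gb * 365 * 5 / 1e3

print(f"writes: ~{write_qps:,.0f}/s, peak ~{write_qps * peak_factor:,.0f}/s")
print(f"reads:  ~{read_qps:,.0f}/s, peak ~{read_qps * peak_factor:,.0f}/s")
print(f"read:write ratio: ~{read_write_ratio:.0f}:1")
print(f"storage: {storage_per_day_gb:,.0f} GB/day, ~{storage_5y_tb:,.0f} TB over 5 years")
```

In an interview you would round these aggressively. The point of the script is only to confirm that the rounded figures land in the right order of magnitude.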

Numbers to Memorize

Keep these in your head for quick math:

Unit               | Approximate Value
Seconds per day    | 86,400 (~100K)
Seconds per month  | ~2.5 million
1 million / day    | ~12/second
1 billion / day    | ~12,000/second

Why This Matters

Interviewers want to see that you:

  • Think about scale before designing
  • Can do quick mental math
  • Understand how scale affects architecture

A design for 1,000 users is very different from 100 million users.


Step 3: Design Architecture (15-20 minutes)

Now you design. Start high-level, then add detail.

Start with the Happy Path

Draw the simplest flow that handles the core use case:

[Client] → [Load Balancer] → [API Server] → [Database]

Then ask yourself: "What breaks at scale?"

Add Components Systematically

For read-heavy systems, add caching:

[Client] → [LB] → [API] → [Cache] → [DB]

For high write throughput, add message queues:

[Client] → [LB] → [API] → [Queue] → [Workers] → [DB]

For global users, consider CDN and multi-region:

[Client] → [CDN] → [LB] → [API] → [Cache] → [DB]
                                     ↑
                            [Replication from Primary]
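To make the queue-and-workers path above more concrete, here is a minimal single-process sketch. It uses Python's standard `queue` and `threading` modules as stand-ins for a real message broker and worker fleet. The names `handle_post_request`, `worker`, and `stored_tweets` are hypothetical and exist only for this example.

```python
import queue
import threading

write_queue = queue.Queue()   # stand-in for a message queue such as Kafka
stored_tweets = []            # stand-in for the database
db_lock = threading.Lock()


def handle_post_request(tweet: dict) -> str:
    """API handler: enqueue the write and return without waiting for the DB."""
    write_queue.put(tweet)
    return "accepted"


def worker() -> None:
    """Drain the queue and persist each item until a None sentinel arrives."""
    while True:
        item = write_queue.get()
        if item is None:
            write_queue.task_done()
            break
        with db_lock:
            stored_tweets.append(item)
        write_queue.task_done()


workers = [threading.Thread(target=worker) for _ in range(3)]
for thread in workers:
    thread.start()

for i in range(10):
    handle_post_request({"id": i, "text": f"tweet {i}"})

for _ in workers:
    write_queue.put(None)     # one shutdown sentinel per worker
write_queue.join()
for thread in workers:
    thread.join()

print(len(stored_tweets), "tweets persisted")
```

The design point is that the API server's latency no longer depends on database write speed. The cost is that a write is acknowledged before it is durable, which is a trade-off worth stating out loud.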

Example: Twitter Architecture

                         ┌─────────────┐
                         │     CDN     │
                         │static assets│
                         └──────┬──────┘
                                │
┌──────────┐    ┌───────────────▼───────────────┐
│  Client  │───▶│       Load Balancer           │
└──────────┘    └───────────────┬───────────────┘
                                │
               ┌────────────────┼────────────────┐
               │                │                │
        ┌──────▼──────┐  ┌──────▼──────┐  ┌──────▼──────┐
        │   Tweet     │  │  Timeline   │  │    User     │
        │   Service   │  │   Service   │  │   Service   │
        └──────┬──────┘  └──────┬──────┘  └──────┬──────┘
               │                │                │
               │         ┌──────▼──────┐        │
               │         │Timeline Cache│        │
               │         │   (Redis)   │        │
               │         └──────┬──────┘        │
               │                │                │
        ┌──────▼────────────────▼────────────────▼──────┐
        │                    Databases                   │
        │  (Tweet DB)    (Timeline DB)    (User DB)     │
        └───────────────────────────────────────────────┘

Explain As You Draw

Don't draw silently. Explain your thinking:

"I'm starting with a load balancer to distribute traffic across multiple API servers. This gives us horizontal scalability and fault tolerance."

"I'm separating into three services, Tweet, Timeline, and User, because they have different access patterns and can scale independently."

"For the timeline, I'm adding a Redis cache because we calculated 70K reads per second. That's too much for direct database queries."

Component Checklist

Make sure you cover:

Component       | Purpose                        | When to Include
Load Balancer   | Distribute traffic             | Always
CDN             | Static content, reduce latency | Global users, static assets
Cache           | Reduce database load           | Read-heavy systems
Message Queue   | Async processing               | Write-heavy, decoupling needed
Database        | Persistent storage             | Always
Search          | Full-text search               | Search feature required
Object Storage  | Large files (images, videos)   | Media storage needed

Step 4: Deep Dive (15-20 minutes)

The interviewer will push you to go deeper. Be ready for questions like:

  • "Tell me more about the database schema."
  • "How does the timeline service work?"
  • "What happens if the cache goes down?"
  • "How would you handle a celebrity with 50 million followers?"

Pick 2-3 Areas to Go Deep

You can't cover everything deeply. Choose the most interesting or challenging areas:

  • Database design: Schema, indexing, sharding
  • Caching strategy: What to cache, invalidation
  • Critical algorithms: Feed ranking, matching
  • Failure handling: What happens when X fails?
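As one example of what "going deep" on sharding from the list above might look like, here is a minimal sketch of hash-based shard selection. The shard count of 16 and the function name `shard_for` are assumptions for illustration, not figures from this article.

```python
import hashlib

NUM_SHARDS = 16  # assumed shard count for the example


def shard_for(user_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a user ID to a shard using a hash that is stable across processes."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:8], "big") % num_shards


print(shard_for("alice"), shard_for("bob"))
```

A natural follow-up to raise yourself: with plain modulo hashing, changing the shard count remaps most keys, which is why consistent hashing is often discussed as an alternative.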

Example Deep Dive: Timeline Service

Interviewer: "Tell me more about how the timeline works."

You: "There are two main approaches for generating timelines: fan-out on write and fan-out on read.

Fan-out on write: When someone tweets, we immediately push that tweet to all their followers' timelines in cache. This is fast for reads because the timeline is pre-computed. But it's expensive for writes if the user has millions of followers.

Fan-out on read: We don't pre-compute. When a user loads their timeline, we fetch recent tweets from everyone they follow and merge them. This is expensive for reads but cheap for writes.

For Twitter, I'd use a hybrid approach. For regular users, I'd fan out on write, since most people have only a few hundred followers. For celebrities with millions of followers, I'd fan out on read: we fetch their tweets at read time and merge them with the pre-computed timeline.

This way we get the best of both: fast reads for most users, without overwhelming the system when a celebrity tweets."
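The hybrid approach described in this answer can be sketched with in-memory dictionaries standing in for the cache and the tweet store. This is an illustrative sketch under stated assumptions, not a description of how Twitter does it. All names are hypothetical, and `CELEBRITY_THRESHOLD` is set deliberately low so the demo is easy to follow; a real cutoff would be far higher.

```python
from collections import defaultdict

CELEBRITY_THRESHOLD = 3  # assumed cutoff, tiny on purpose for the demo

followers = defaultdict(set)          # author -> users who follow them
following = defaultdict(set)          # user -> authors they follow
tweets_by_author = defaultdict(list)  # author -> [(timestamp, text)]
timeline_cache = defaultdict(list)    # user -> [(timestamp, author, text)]


def follow(user: str, author: str) -> None:
    followers[author].add(user)
    following[user].add(author)


def is_celebrity(author: str) -> bool:
    return len(followers[author]) >= CELEBRITY_THRESHOLD


def post_tweet(author: str, timestamp: int, text: str) -> None:
    tweets_by_author[author].append((timestamp, text))
    if not is_celebrity(author):
        # Fan-out on write: push into every follower's cached timeline.
        for user in followers[author]:
            timeline_cache[user].append((timestamp, author, text))


def read_timeline(user: str, limit: int = 20) -> list:
    entries = list(timeline_cache[user])
    # Fan-out on read: pull celebrity tweets at request time and merge.
    for author in following[user]:
        if is_celebrity(author):
            entries.extend((ts, author, text) for ts, text in tweets_by_author[author])
    entries.sort(reverse=True)  # newest first
    return entries[:limit]


for fan in ("ann", "bob", "cat"):
    follow(fan, "star")   # star reaches the threshold and counts as a celebrity
follow("ann", "dave")     # dave has one follower, so he is a regular user

post_tweet("dave", 1, "hello from dave")
post_tweet("star", 2, "hello from star")

ann_timeline = read_timeline("ann")
print(ann_timeline)
```

The sketch ignores an edge case worth mentioning if asked: an author who crosses the threshold after tweeting would have older tweets in both paths, so a real design needs deduplication or a migration step.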

Show Trade-off Awareness

Always acknowledge trade-offs:

"We could use Cassandra instead of PostgreSQL here. Cassandra would give us better write throughput and easier horizontal scaling. But we'd lose ACID transactions and complex queries. Since this service mostly does simple key-value lookups, I think Cassandra is the better choice."

Handle "What If" Questions

Interviewer: "What if Redis goes down?"

You: "Good question. A few things:

First, I'd run Redis in a cluster with replication. If one node fails, others continue serving.

Second, if the entire cache layer fails, we fall back to the database. We'd see higher latency and might need to rate-limit, but the system stays up.

Third, I'd set up monitoring to alert on cache hit rate drops. If it suddenly drops, we know something's wrong before users notice.

Fourth, for the most critical data, I might use a write-through cache. That way, even if Redis fails, we don't lose recent data because it's already in the database."
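The "fall back to the database" behavior from this answer can be sketched as a cache-aside read path that tolerates cache failures. The class `FlakyCache` below is a hypothetical in-memory stand-in for Redis, not a real client API, and the other names are likewise invented for this example.

```python
class CacheUnavailable(Exception):
    """Raised by the stand-in cache when it is switched off."""


class FlakyCache:
    """In-memory stand-in for a cache that can be toggled down for testing."""

    def __init__(self) -> None:
        self.data = {}
        self.up = True

    def get(self, key):
        if not self.up:
            raise CacheUnavailable(key)
        return self.data.get(key)

    def set(self, key, value) -> None:
        if not self.up:
            raise CacheUnavailable(key)
        self.data[key] = value


timeline_db = {"timeline:ann": ["tweet 2", "tweet 1"]}  # stand-in database
cache = FlakyCache()
stats = {"db_reads": 0}


def load_from_db(key):
    stats["db_reads"] += 1
    return timeline_db.get(key)


def get_timeline(key):
    """Try the cache first; degrade to the database if the cache is down."""
    try:
        cached = cache.get(key)
        if cached is not None:
            return cached
    except CacheUnavailable:
        return load_from_db(key)   # slower, but the request still succeeds
    value = load_from_db(key)      # cache miss: read through to the database
    try:
        cache.set(key, value)
    except CacheUnavailable:
        pass                       # failing to populate the cache is not fatal
    return value
```

Calling `get_timeline("timeline:ann")` twice hits the database once. Setting `cache.up = False` and calling again still returns the timeline, at the cost of another database read. That extra load is why the answer also mentions rate limiting.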


Time Management

One of the biggest failure modes is running out of time. Here's how to manage it:

Phase         | Target Time | If You're Running Over
Requirements  | 5 min       | Wrap up, state assumptions
Estimation    | 5 min       | Skip detailed calculations, estimate orders of magnitude
Architecture  | 15-20 min   | Focus on core path, skip edge cases
Deep Dive     | 15-20 min   | Pick fewer areas, go deeper on each

Check the time periodically. If you've spent 15 minutes on requirements, you're going too slow.

Signals You're On Track

Good pace:

  • Requirements done in 5 minutes
  • High-level architecture sketched by minute 15
  • Deep diving by minute 20

Too slow:

  • Still clarifying requirements at minute 10
  • No diagram at minute 20
  • Haven't discussed any trade-offs by minute 30

Recovering From Time Pressure

If you're running out of time:

"I notice we have about 10 minutes left. Let me focus on the most critical part, the timeline service, and we can discuss other areas if time permits."

Interviewers appreciate self-awareness and prioritization.


Communication Throughout

System design is as much about communication as technical knowledge.

Think Out Loud

Don't go silent. Share your reasoning:

"I'm thinking about whether to use SQL or NoSQL here. The access pattern is mostly key-value lookups by user ID, which suggests NoSQL. But we also need to query by timestamp for the timeline, so..."

Check In With the Interviewer

"Does this level of detail make sense, or would you like me to go deeper on any component?"

"I was planning to discuss the caching strategy next. Is that the right direction, or is there something else you'd like me to focus on?"

Use Clear Structure

When explaining, use structure:

"For the database, I'm going to cover three things: the schema, the indexing strategy, and how we'd handle sharding."

This helps the interviewer follow along and shows organized thinking.


Common Pitfalls to Avoid

1. Not Clarifying Requirements

Bad: "So, Twitter. Let me start with the architecture..."

Good: "Before I start, I want to clarify the requirements. What features should we focus on?"

2. Skipping Scale Estimation

Bad: "We'll use a database."

Good: "We calculated 35,000 reads per second. That's too much for a single database, so we'll need caching and possibly read replicas."

3. Jumping to Technologies

Bad: "We'll use Kafka, Redis, Cassandra, and Kubernetes."

Good: "We need a message queue for async processing because..." (explain the why)

4. Not Discussing Trade-offs

Bad: "We'll use PostgreSQL."

Good: "I'm choosing PostgreSQL over Cassandra because we need ACID transactions for payment processing. The trade-off is that horizontal scaling is harder, but at our expected scale, we can handle it with read replicas."

5. Going Too Deep Too Early

Bad: Spending 10 minutes on database indexing before drawing any architecture.

Good: Sketch the full system first, then go deep where it matters.

6. Ignoring the Interviewer's Hints

If the interviewer says "Interesting, what about handling failures?", they're telling you what they want to hear about. Follow their lead.


Framework Checklist

Use this checklist during practice:

Requirements Phase

  • Asked about functional requirements
  • Asked about scale (users, QPS)
  • Asked about latency requirements
  • Asked about consistency vs. availability
  • Wrote down key requirements

Estimation Phase

  • Calculated read/write QPS
  • Estimated storage requirements
  • Identified read-heavy vs. write-heavy
  • Numbers informed design decisions

Design Phase

  • Drew high-level architecture
  • Included all necessary components
  • Explained component choices
  • Addressed data flow
  • Considered single points of failure

Deep Dive Phase

  • Went deep on 2-3 components
  • Discussed trade-offs
  • Addressed failure scenarios
  • Answered follow-up questions thoroughly

Applying the Framework: Quick Examples

Design a URL Shortener

Requirements:

  • Create short URLs, redirect to long URLs
  • 100M URLs/day, 100:1 read/write ratio
  • < 100ms redirect latency

Scale:

  • Writes: 1,200/sec, Reads: 120,000/sec
  • Storage: 30TB over 5 years

Architecture:

  • Load balancer → API servers → Cache (Redis) → Database
  • Key insight: Read-heavy, cache aggressively

Deep dive:

  • Short code generation algorithms (hash vs. counter)
  • Caching strategy for popular URLs
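For the counter option in the deep-dive list above, one common approach is to base62-encode a monotonically increasing ID. The sketch below assumes that approach. The function names and the alphabet ordering are choices made for this example.

```python
import string

# 62 symbols: 0-9, a-z, A-Z (the ordering is an arbitrary choice here)
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase
BASE = len(ALPHABET)


def encode_base62(n: int) -> str:
    """Turn a non-negative counter value into a short code."""
    if n < 0:
        raise ValueError("counter must be non-negative")
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n:
        n, remainder = divmod(n, BASE)
        chars.append(ALPHABET[remainder])
    return "".join(reversed(chars))


def decode_base62(code: str) -> int:
    """Recover the counter value from a short code."""
    n = 0
    for ch in code:
        n = n * BASE + ALPHABET.index(ch)
    return n


print(encode_base62(123456789), decode_base62(encode_base62(123456789)))
```

Seven base62 characters cover 62^7, roughly 3.5 trillion codes, which is comfortably above the roughly 180 billion URLs implied by 100M per day over 5 years. The trade-off versus hashing is that sequential codes are guessable and the counter itself must be generated without collisions across servers.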

Design a Chat Application

Requirements:

  • 1:1 and group messaging
  • Real-time delivery
  • 100M users, 1B messages/day

Scale:

  • 12,000 messages/sec
  • Need persistent connections (WebSockets)

Architecture:

  • WebSocket servers → Message queue → Delivery service → Database
  • Key insight: Connection state management is critical

Deep dive:

  • How to route messages to correct WebSocket server
  • Handling offline users (message queuing)
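Both deep-dive items above can be sketched with a connection registry that maps each user to the server holding their socket. This is a single-process illustration with dictionaries standing in for a shared registry and per-server delivery channels. All names are hypothetical.

```python
from collections import defaultdict

connections = {}                     # user_id -> server_id holding their WebSocket
server_outboxes = defaultdict(list)  # server_id -> messages handed to that server
offline_inbox = defaultdict(list)    # user_id -> messages waiting for reconnect


def connect(user_id: str, server_id: str) -> list:
    """Register the connection and flush anything queued while offline."""
    connections[user_id] = server_id
    pending = offline_inbox.pop(user_id, [])
    server_outboxes[server_id].extend(pending)
    return pending


def disconnect(user_id: str) -> None:
    connections.pop(user_id, None)


def send_message(sender: str, recipient: str, text: str) -> str:
    """Route to the recipient's server, or queue if they are offline."""
    message = {"from": sender, "to": recipient, "text": text}
    server_id = connections.get(recipient)
    if server_id is None:
        offline_inbox[recipient].append(message)
        return "queued"
    server_outboxes[server_id].append(message)
    return "delivered"
```

In a multi-server deployment the registry would have to live in shared storage, and entries would need expiry so that a crashed server does not leave stale routes. Both are good points to raise when the interviewer probes connection state management.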

Design a News Feed

Requirements:

  • Posts from friends
  • Ranked by relevance
  • 300M users, high read volume

Scale:

  • Reads: 100K/sec, Writes: 10K/sec
  • Read-heavy, pre-computation valuable

Architecture:

  • API → Fan-out service → Timeline cache → Database
  • Key insight: Fan-out on write vs. read trade-off

Deep dive:

  • Ranking algorithm
  • Handling celebrities (hybrid fan-out)

Practice This Framework

The framework only works if you practice it. Here's how:

Week 1: Learn the Steps

Practice the framework structure without worrying about correctness:

  • Pick 3 questions
  • Time yourself
  • Focus on hitting all 4 steps

Week 2: Build Depth

Go deeper on components:

  • Databases (SQL, NoSQL, when to use each)
  • Caching (patterns, invalidation)
  • Message queues (Kafka, RabbitMQ)
  • Load balancing strategies

Week 3: Practice Communication

Do mock interviews:

  • Practice explaining while drawing
  • Get feedback on communication
  • Work on time management

Week 4: Polish

  • Redo questions you struggled with
  • Focus on trade-off discussions
  • Practice handling curveball questions

Final Thoughts

System design interviews aren't about knowing everything. They're about:

  1. Showing a structured approach: The framework demonstrates this
  2. Communicating clearly: Think out loud, check in
  3. Making reasonable trade-offs: There's no perfect design
  4. Going deep when asked: Prove you have real knowledge

The framework gives you structure. Practice gives you depth. Together, they'll help you pass system design interviews at any company.

Frequently Asked Questions

Can I use this framework for any system design question?

Yes. The framework is intentionally general. Whether you're designing Twitter, a parking lot system, or a distributed cache, the four steps apply: clarify requirements, estimate scale, design architecture, deep dive.

What if the interviewer interrupts my framework?

Adapt. The framework is a guide, not a rigid script. If the interviewer wants to skip to deep dive, do it. If they want to spend more time on requirements, follow their lead. The framework ensures you don't forget important parts, but the interviewer drives the conversation.

How do I know when to move to the next step?

Check in with the interviewer: "I think I have a good understanding of the requirements. Should I move on to estimating scale?" They'll tell you if they want more discussion or if you should proceed.

What if I don't know a specific technology they ask about?

Be honest: "I'm not deeply familiar with Cassandra's compaction strategies, but based on what I know about LSM trees, I'd expect..." Show you can reason from first principles. Never pretend to know something you don't.

