Tags: system-design, interview, framework, methodology, architecture, distributed-systems, scalability, reliability, tradeoffs

The Universal System Design Framework

A step-by-step guide to designing ANY system - even ones you have never seen before

75 min read

Summary

This framework works for ANY system design question. The secret? Every system is just: take data in, store it, give it back. The hard part is doing this fast, reliably, and at scale. Follow these 8 steps in order: (1) Clarify what to build (MVP features only), (2) Design the API (what endpoints do users call?), (3) Quick math (how big is this system?), (4) Decide compute model (real-time or pre-compute?), (5) Pick your database (SQL unless you have a good reason not to), (6) Draw the basic architecture (load balancer, servers, database), (7) Handle scale (caching, sharding, hot spots), (8) Add reliability (what happens when things break?). Master this framework and you can design anything.

Key Takeaways

Every System is the Same

Every system does three things: take data in, store it, give it back. Twitter, Uber, Netflix - all the same pattern. The difference is HOW MUCH data, HOW FAST, and WHAT HAPPENS when things break.

Think API First

Before drawing boxes, ask: What API endpoints do users call? This tells you what the system actually needs to do. The API is your contract with the outside world.

Real-time vs Pre-compute

The biggest decision: Do you compute results when the user asks (real-time)? Or compute ahead of time and just look up the answer (pre-compute)? This choice shapes your entire design.

SQL by Default

Use SQL (PostgreSQL) unless you have a specific reason not to. It handles most use cases. Only reach for NoSQL when you hit SQL's limits - like massive write throughput or flexible schemas.

Scale is About Bottlenecks

Scaling is not about adding more servers. It is about finding what breaks first (the bottleneck) and fixing it. Usually: database reads, then database writes, then hot partitions.

Plan for Failure

Everything breaks. Servers crash. Networks fail. Databases go down. Your design must answer: What happens when X fails? How do we recover? How do we avoid losing data?

Deep Dive

The Big Picture: What is System Design?

System design is simple at its core.

Every system does three things:

  1. Take data in - Users send requests (post a tweet, upload a photo, request a ride)
  2. Store data - Save it somewhere (database, cache, file storage)
  3. Give data back - Users ask for data (show my feed, display the photo, find me a driver)

That is it. Twitter, Instagram, Uber, Netflix - they all follow this pattern.

What makes it hard?

Three things make system design challenging:

  • Scale - Doing this for 1 user is easy. Doing it for 100 million users is hard.
  • Speed - Users want answers in milliseconds, not seconds.
  • Reliability - The system must work even when servers crash or networks fail.

The 8-Step Framework

This guide gives you 8 steps that work for ANY system:

Step | What You Do | Time in Interview
1 | Clarify Requirements | 5 minutes
2 | Design the API | 3 minutes
3 | Quick Math (Estimation) | 3 minutes
4 | Decide: Real-time vs Pre-compute | 3 minutes
5 | Choose Database | 3 minutes
6 | Draw Basic Architecture | 10 minutes
7 | Handle Scale | 10 minutes
8 | Add Reliability | 5 minutes
- | Wrap up and Questions | 3 minutes

Let us go through each step.

The 8-Step Framework Flow

Step 1: Clarify Requirements (5 minutes)

Why this matters

The number one reason people fail system design interviews: they solve the wrong problem.

The interviewer says "Design Twitter" and the candidate starts building everything - tweets, retweets, likes, DMs, notifications, trending topics, ads, analytics...

Stop. You have 45 minutes. You cannot build everything.

Think MVP at Scale

Ask yourself: If I had to launch this product tomorrow with millions of users, what are the 2-3 features I MUST have?

For Twitter MVP:

  • Post a tweet
  • See tweets from people you follow
  • Follow/unfollow users

That is it. No likes, no retweets, no DMs. Those come in v2.

The Questions to Ask

Ask these four types of questions:

Question Type 1: What are the core features?

What are the 2-3 things this system MUST do? What can we skip for now? Example: For Uber - request ride, match driver, track ride. Skip: scheduled rides, ride sharing, tipping.

Question Type 2: Who are the users?

Is this for regular people (B2C) or businesses (B2B)? How many users? Are they global or in one region? Example: For Slack - B2B, teams of 10-10,000 people, mostly in one timezone per team.

Question Type 3: What scale are we designing for?

Startup scale (thousands of users) or big tech scale (hundreds of millions)? This changes everything. Example: For a startup chat app vs WhatsApp - completely different designs.

Question Type 4: Any special requirements?

Does it need to be real-time? Super reliable? Work offline? Handle spikes? Example: For a payment system - must never lose data, must process each payment exactly once.

Example: Clarifying Requirements for Twitter
CANDIDATE: Before I start, let me understand what we are building.

CORE FEATURES:
Me: What are the must-have features?
Interviewer: Posting tweets and seeing your feed.
Me: Should I include likes, retweets, DMs?
Interviewer: Focus on posting and feed for now.

USERS:
Me: How many users are we designing for?
Interviewer: Think Twitter scale - hundreds of millions.
Me: Global users or concentrated in one region?
Interviewer: Global.

SCALE:
Me: Any specific numbers I should target?
Interviewer: Let us say 500 million users, 200 million daily active.

SPECIAL REQUIREMENTS:
Me: How fast should the feed load?
Interviewer: Under 200 milliseconds.
Me: Is it okay if a tweet takes a few seconds to appear in feeds?
Interviewer: Yes, a small delay is fine.

SUMMARY:
- Core: Post tweets + View feed
- Scale: 500M users, 200M daily active
- Latency: Feed loads in under 200ms
- Eventual consistency is okay (small delays acceptable)

Golden Rule: Summarize Before Moving On

Always summarize what you heard before moving to the next step. Say: So to confirm, I am designing X with features Y, for Z users, with these constraints. This shows you listen and prevents mistakes.

Step 2: Design the API (3 minutes)

Think API First

Before drawing any boxes, ask: What API endpoints will users call?

This is powerful because:

  1. It forces you to understand what the system actually does
  2. It tells you what data you need to store
  3. It shows the interviewer you think about interfaces

Keep It Simple

For each core feature, write one API endpoint. Include:

  • The HTTP method (GET, POST, PUT, DELETE)
  • The URL path
  • What goes in (input)
  • What comes out (output)
Example: Twitter API
CORE APIS:

1. POST A TWEET
   POST /tweets
   Input:  { user_id, content, media_ids (optional) }
   Output: { tweet_id, created_at }
   
2. GET HOME FEED
   GET /feed?user_id=123&limit=20&cursor=xyz
   Input:  user_id, how many tweets, where to start
   Output: { tweets: [...], next_cursor }
   
3. FOLLOW A USER
   POST /follow
   Input:  { follower_id, followee_id }
   Output: { success: true }

4. UNFOLLOW A USER
   DELETE /follow
   Input:  { follower_id, followee_id }
   Output: { success: true }

WHAT THIS TELLS US:
- We need to store: tweets, users, who follows whom
- GET /feed is called way more than POST /tweets (read-heavy)
- Feed needs to be fast (called on every app open)
- The hard question: How do we build /feed efficiently?
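
To make this contract concrete, here is a minimal sketch of these endpoints, assuming FastAPI and an in-memory store purely for illustration (a real system would use a real database and an auth layer):

from datetime import datetime, timezone
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
tweets: list[dict] = []             # stand-in for the tweets table
follows: dict[int, set[int]] = {}   # follower_id -> set of followee_ids

class NewTweet(BaseModel):
    user_id: int
    content: str

@app.post("/tweets")
def post_tweet(body: NewTweet):
    tweet = {
        "tweet_id": len(tweets) + 1,
        "user_id": body.user_id,
        "content": body.content,
        "created_at": datetime.now(timezone.utc).isoformat(),
    }
    tweets.append(tweet)
    return {"tweet_id": tweet["tweet_id"], "created_at": tweet["created_at"]}

@app.post("/follow")
def follow(follower_id: int, followee_id: int):
    follows.setdefault(follower_id, set()).add(followee_id)
    return {"success": True}

@app.get("/feed")
def get_feed(user_id: int, limit: int = 20):
    # The read-heavy hot path: newest tweets from accounts this user follows
    followed = follows.get(user_id, set())
    feed = [t for t in reversed(tweets) if t["user_id"] in followed]
    return {"tweets": feed[:limit]}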

What the API Tells You

Look at your APIs and ask:

Question | What It Means
Which API is called most? | This is your hot path - optimize it
More reads or writes? | Read-heavy = caching helps. Write-heavy = need fast database
Need real-time updates? | Might need WebSockets or long polling
What data is returned? | This shapes your data model

The Feed API is Almost Always the Hard Part

In social apps, the feed API looks simple but is the hardest to build. It needs to be fast, personalized, and handle users who follow thousands of accounts. Keep this in mind - we will solve it later.

Step 3: Quick Math - Back of Envelope (3 minutes)

Why Do Math?

Numbers tell you if your design will work. The design for 1,000 users is totally different from 100 million users.

Do not spend long on this. 3 minutes of quick math is enough.

The Numbers That Matter

Calculate these four things:

  1. Requests per second (RPS) - How many API calls per second?
  2. Storage needed - How much disk space?
  3. Read vs Write ratio - Is it read-heavy or write-heavy?
  4. Peak load - How much higher is busy time vs normal?

Useful Numbers to Remember

Time Period | Seconds
1 day | 86,400 (round to 100,000)
1 month | 2.5 million
1 year | 30 million

Requests per day | Requests per second
1 million | ~10 RPS
100 million | ~1,000 RPS
1 billion | ~10,000 RPS
Example: Quick Math for Twitter
GIVEN:
- 200 million daily active users
- Each user opens app 5 times per day
- Each user posts 0.5 tweets per day (on average)
- Each user follows 200 people

REQUESTS PER SECOND:

Feed loads (reads):
  200M users × 5 opens = 1 billion feed loads per day
  1 billion ÷ 100,000 seconds = 10,000 reads/second

Tweet posts (writes):
  200M users × 0.5 tweets = 100 million tweets per day
  100 million ÷ 100,000 seconds = 1,000 writes/second

Ratio: 10:1 read-heavy system

STORAGE:

Tweet size: 280 chars + metadata = ~500 bytes
Per day: 100M tweets × 500 bytes = 50 GB per day
Per year: 50 GB × 365 = 18 TB per year

PEAK:
Peak is usually 3-5x average
So plan for: 50,000 reads/second, 5,000 writes/second

WHAT THIS TELLS US:
✓ Read-heavy: caching will help a lot
✓ 10,000+ RPS: need multiple servers
✓ 18 TB/year: big but manageable, sharding may be needed later
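
The same arithmetic, written as a throwaway script so the rounding conventions (100,000 seconds per day, roughly 5x peak) are explicit:

DAU = 200_000_000            # daily active users
OPENS_PER_USER = 5           # feed loads per user per day
TWEETS_PER_USER = 0.5        # tweets posted per user per day
SECONDS_PER_DAY = 100_000    # 86,400 rounded up for easy math
TWEET_BYTES = 500            # 280 chars + metadata

reads_per_sec = DAU * OPENS_PER_USER / SECONDS_PER_DAY          # 10,000
writes_per_sec = DAU * TWEETS_PER_USER / SECONDS_PER_DAY        # 1,000
storage_per_day_gb = DAU * TWEETS_PER_USER * TWEET_BYTES / 1e9  # ~50 GB
storage_per_year_tb = storage_per_day_gb * 365 / 1000           # ~18 TB
peak_reads_per_sec = 5 * reads_per_sec                          # ~50,000

print(reads_per_sec, writes_per_sec, storage_per_year_tb, peak_reads_per_sec)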

Do Not Over-Engineer Based on Numbers

The math shows what MIGHT matter. But do not build for 1 million RPS on day one. Start with a design that handles 10x current load. Know where you will hit limits. That is what interviewers want to see.

Step 4: The Big Decision - Real-time vs Pre-compute (3 minutes)

This is the Most Important Design Decision

For every API endpoint, you have a choice:

Option A: Real-time (Compute on Read)

  • When the user asks, calculate the answer right then
  • Example: Search - run the query when the user searches

Option B: Pre-compute (Compute on Write)

  • Calculate the answer ahead of time and store it
  • When the user asks, just look it up
  • Example: Feed - pre-build each user's feed, just fetch it

When to Use Which?

Use Real-time When | Use Pre-compute When
Results depend on fresh data | Same result requested many times
Too many possible queries to pre-compute | Result is expensive to calculate
Writes are rare, reads are rare too | Writes are rare, reads are very frequent
User expects small delay | User expects instant response
Data changes very frequently | Data changes less often than it is read
Example: Twitter Feed - Real-time vs Pre-compute
THE QUESTION:
When a user opens their feed, how do we show tweets from people they follow?

OPTION A: REAL-TIME (Fan-out on Read)

When user opens feed:
1. Look up everyone they follow (200 people)
2. Get recent tweets from each person
3. Merge and sort by time
4. Return top 20 tweets

Pros:
- Always fresh data
- Simple to understand
- No extra storage

Cons:
- Slow! Must query 200 users every time
- Does not scale for users following 10,000 accounts
- Every feed open = heavy database load

OPTION B: PRE-COMPUTE (Fan-out on Write)

When someone posts a tweet:
1. Find all their followers (could be millions)
2. Add this tweet to each follower feed (stored in cache)

When user opens feed:
1. Just read their pre-built feed from cache
2. Super fast - one read operation

Pros:
- Feed loads instantly
- Light database load on read

Cons:
- Celebrity problem: user with 50M followers = 50M writes
- Uses more storage (duplicate tweets in many feeds)
- Small delay before tweet appears in feeds

OPTION C: HYBRID (What Twitter Actually Does)

- For normal users (< 10K followers): pre-compute
- For celebrities (> 10K followers): real-time
- When building feed: get pre-computed tweets + fetch celebrity tweets

This is usually the right answer for social feeds.

Fan-out on Write vs Fan-out on Read

Other Factors for This Decision

Factor | Leans Toward Real-time | Leans Toward Pre-compute
Latency requirement | Can wait 100ms+ | Must be < 50ms
Read:Write ratio | Low (< 10:1) | High (> 100:1)
Data freshness | Must be real-time | Okay to be slightly stale
Query complexity | Simple queries | Complex aggregations
Storage cost | Expensive | Cheap

Default Starting Point

When unsure, start with real-time. It is simpler. Add pre-computation only when you need the speed. The interviewer wants to see you understand the tradeoff, not that you always pick the fancy option.

Step 5: Choose Your Database (3 minutes)

Simple Rule: Start with SQL

Use PostgreSQL (SQL) unless you have a specific reason not to.

Why? SQL databases:

  • Handle most use cases well
  • Give you transactions (ACID)
  • Let you change your queries later
  • Are well understood and battle-tested

When to Use NoSQL

Only use NoSQL when SQL cannot do the job:

Problem | SQL Limitation | NoSQL Solution
Massive write throughput | Single leader bottleneck | Cassandra, DynamoDB
Flexible/changing schema | Schema changes are slow | MongoDB, DynamoDB
Simple key-value lookups | SQL overhead unnecessary | Redis, DynamoDB
Graph relationships | Joins get slow at depth | Neo4j
Full-text search | LIKE queries are slow | Elasticsearch
Time-series data | Not optimized for time queries | TimescaleDB, InfluxDB
Database Selection Examples
TWITTER:
- Users table: PostgreSQL (relational, need ACID)
- Tweets table: PostgreSQL (need to query by time, user)
- Follower graph: PostgreSQL (simple join) or Graph DB (if complex queries)
- Feed cache: Redis (fast reads, okay to lose on crash)
- Tweet search: Elasticsearch (full-text search)

UBER:
- Users, Drivers, Trips: PostgreSQL (transactional)
- Driver locations: Redis (ephemeral, changes every second)
- Location history: Cassandra (high write volume, time-series)

E-COMMERCE:
- Products, Orders, Users: PostgreSQL (ACID for payments)
- Shopping cart: Redis (ephemeral, fast)
- Product search: Elasticsearch (full-text, facets)
- Product catalog: PostgreSQL + CDN cache

CHAT APPLICATION:
- Users, Conversations: PostgreSQL
- Messages: Cassandra (high write volume, time-ordered)
- Online status: Redis (ephemeral, changes often)
- Message search: Elasticsearch

Database Decision Tree

Do Not Use Multiple Databases Just to Look Smart

Every database you add is more complexity. In interviews, I see people add Kafka, Redis, Elasticsearch, Cassandra, and PostgreSQL to a simple app. Why? Start with one database. Add others only when you can explain exactly why.

What to Tell the Interviewer

Say something like:

For the main data (users, tweets, follows), I will use PostgreSQL. It handles our query patterns well and gives us transactions.

For the feed cache, I will use Redis. Feed reads need to be fast, and it is okay if we lose the cache on a crash - we can rebuild it.

If we need tweet search later, we would add Elasticsearch.

This shows you make decisions based on needs, not hype.

Step 6: Draw the Basic Architecture (10 minutes)

Start Simple, Add Complexity Only When Needed

Every system starts with the same basic building blocks:

  1. Load Balancer - Distributes traffic across servers
  2. App Servers - Run your code (stateless)
  3. Database - Stores your data
  4. Cache - Speeds up reads

That is it for the basic architecture. Let us build it step by step.

Layer 1: The Simplest System

Start here. This handles thousands of users.

Layer 1: Basic Setup

What to explain:

  • Load balancer spreads traffic evenly
  • App servers are stateless (no data stored on them)
  • Any server can handle any request
  • Database is the single source of truth

Layer 2: Add Caching

When database reads become slow, add a cache.

Layer 2: Add Cache

What to explain:

  • Check cache first, then database
  • Cache stores hot data (frequently accessed)
  • Cache miss: data not in cache, get from DB, store in cache
  • Cache hit: return data directly, very fast

Layer 3: Add Background Workers

When some work is slow or can be done later, use async processing.

Layer 3: Add Async Processing

What to explain:

  • Message queue holds work to be done later
  • Workers process jobs from the queue
  • Use for: sending emails, updating feeds, processing images
  • User gets fast response, work happens in background

Example: Full Twitter Architecture
COMPONENTS:

1. API GATEWAY
   - Handles authentication (is this user logged in?)
   - Rate limiting (is this user sending too many requests?)
   - Routes requests to the right service

2. TWEET SERVICE
   - POST /tweets endpoint
   - Writes tweet to database
   - Sends message to queue: "new tweet posted"

3. FEED SERVICE  
   - GET /feed endpoint
   - Reads pre-built feed from Redis cache
   - If cache miss, builds feed from database

4. FANOUT SERVICE (Background Worker)
   - Listens for "new tweet posted" messages
   - For each follower, adds tweet to their feed in cache

5. USER SERVICE
   - Follow/unfollow endpoints
   - Updates the social graph

6. DATA STORES:
   - PostgreSQL: Users, Tweets, Followers (source of truth)
   - Redis: Feed cache (user_id -> list of tweet_ids)
   - S3: Images and videos (blob storage)
   - CDN: Serves images and videos fast, globally

Twitter Full Architecture
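
To show how these pieces connect, here is a sketch of the fanout service's core loop, assuming Redis lists hold each user's pre-built feed; the queue consumer and the get_follower_ids helper are placeholders, not a specific library's API:

import json
import redis

r = redis.Redis()
FEED_LENGTH = 800  # keep only the most recent tweet ids per user

def handle_new_tweet(message: bytes) -> None:
    # Called by the queue consumer for every "new tweet posted" message
    event = json.loads(message)  # e.g. {"tweet_id": 123, "author_id": 42}
    for follower_id in get_follower_ids(event["author_id"]):  # placeholder
        key = f"feed:{follower_id}"
        r.lpush(key, event["tweet_id"])   # prepend newest tweet id
        r.ltrim(key, 0, FEED_LENGTH - 1)  # cap the feed length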

Keep It Simple

Do not draw 15 boxes to look impressive. It backfires. It shows you cannot simplify. Aim for 5-8 boxes maximum. Add more only if you can explain why each one is needed.

Step 7: Handle Scale (10 minutes)

Scale is About Finding Bottlenecks

Adding more servers does not automatically mean better scale. You need to find what breaks first and fix that.

Typical bottleneck order:

  1. Database reads - Too many queries hitting the database
  2. Database writes - Too many writes for single database
  3. Hot partitions - One piece of data gets too much traffic
  4. Network - Too much data moving around

Let us solve each one.

Solution 1: Caching (For Read Bottlenecks)

Caching is your first tool for read-heavy systems.

Cache Strategy | How It Works | When to Use
Cache-Aside | Check cache, if miss read from DB, write to cache | Most common, works for most cases
Write-Through | Write to cache and DB at same time | When you need cache always in sync
Write-Behind | Write to cache, later write to DB | When DB writes are slow, okay to lose data
Cache-Aside Pattern (Most Common)
def get_user(user_id):
    # Step 1: Check cache first
    user = cache.get(f"user:{user_id}")
    
    if user is not None:
        return user  # Cache hit! Fast path
    
    # Step 2: Cache miss - get from database
    user = database.query("SELECT * FROM users WHERE id = ?", user_id)
    
    # Step 3: Store in cache for next time (with expiry)
    cache.set(f"user:{user_id}", user, expire=3600)  # 1 hour
    
    return user

Solution 2: Database Read Replicas

When cache is not enough, add read replicas.

Read Replicas

What to explain:

  • One primary database handles all writes
  • Multiple replicas handle reads
  • Replicas copy data from primary (slight delay)
  • Good for read-heavy systems (which most are)
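
As a sketch of how the application splits traffic between primary and replicas (the connection strings and the connect() helper are placeholders for whatever database driver you use):

import random

PRIMARY = "postgres://primary.internal:5432/app"
REPLICAS = [
    "postgres://replica-1.internal:5432/app",
    "postgres://replica-2.internal:5432/app",
]

def run_write(sql: str, params: tuple):
    # All writes go to the primary, the single source of truth
    return connect(PRIMARY).execute(sql, params)   # connect() is a placeholder

def run_read(sql: str, params: tuple):
    # Reads spread across replicas; results may lag the primary slightly
    return connect(random.choice(REPLICAS)).execute(sql, params)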

Solution 3: Sharding (For Write Bottlenecks)

When one database cannot handle all writes, split the data.

Database Sharding

Sharding Strategies
STRATEGY 1: HASH-BASED SHARDING
- shard = hash(user_id) % number_of_shards
- Pros: Even distribution
- Cons: Hard to add shards later, cross-shard queries hard

STRATEGY 2: RANGE-BASED SHARDING
- Shard 1: user_id 1-1,000,000
- Shard 2: user_id 1,000,001-2,000,000
- Pros: Easy range queries, easy to add shards
- Cons: Can get uneven (what if shard 1 has all active users?)

STRATEGY 3: DIRECTORY-BASED SHARDING
- Keep a lookup table: user_id -> shard
- Pros: Complete control, can move users between shards
- Cons: Lookup table is single point of failure

WHICH TO CHOOSE:
- Start with hash-based (simplest)
- Use consistent hashing to make adding shards easier
- Only use range if you need range queries
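
A sketch of strategy 1, using a stable hash rather than Python's built-in hash() (which differs between processes); the shard names are hypothetical:

import hashlib

NUM_SHARDS = 8
SHARDS = [f"tweets-shard-{i}" for i in range(NUM_SHARDS)]

def shard_for(user_id: int) -> str:
    digest = hashlib.md5(str(user_id).encode()).hexdigest()
    return SHARDS[int(digest, 16) % NUM_SHARDS]

# All of one user's data lands on the same shard, so single-user queries stay
# on one database; queries that span users must fan out to every shard.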

Solution 4: Handling Hot Partitions (The Celebrity Problem)

Some data gets WAY more traffic than others. This is called a hot partition.

Examples:

  • Tweet from celebrity with 50 million followers
  • Product on sale that everyone wants
  • Viral video that everyone watches

Solutions:

Problem | Solution | How It Works
Celebrity posts tweet | Separate celebrity handling | Do not fan-out for celebrities, pull their tweets on read
Hot product | Cache replication | Replicate hot item across multiple cache servers
Viral content | CDN + edge caching | Push content to CDN edges worldwide
Hot database row | Read replicas + cache | Cache the hot row, spread reads across replicas
Write hot spot | Append-only + batch | Write to log first, batch update later
Handling the Celebrity Problem in Twitter
PROBLEM:
Celebrity with 50 million followers posts a tweet.
If we fan-out on write, that is 50 million cache updates.
This will take forever and overwhelm our workers.

SOLUTION: HYBRID APPROACH

1. Classify users:
   - Normal user: < 10,000 followers
   - Celebrity: >= 10,000 followers

2. For normal users (fan-out on write):
   - When they post, add tweet to all followers feeds
   - Works fine for 10,000 updates

3. For celebrities (fan-out on read):
   - When they post, store tweet but do NOT fan out
   - Mark their account as "celebrity"

4. When user loads feed:
   - Get pre-built feed from cache (tweets from normal users)
   - Find celebrities this user follows
   - Fetch recent tweets from those celebrities
   - Merge and sort
   - Return to user

5. Optimization:
   - Cache celebrity tweets aggressively (everyone reads them)
   - Only fetch celebrities when user scrolls past pre-built feed
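
A sketch of the read path for this hybrid feed; the Redis list layout follows the fanout sketch above, and the three database helpers are placeholders:

import redis

r = redis.Redis()

def load_feed(user_id: int, limit: int = 20) -> list[dict]:
    # 1. Pre-computed part: tweet ids fanned out from normal users
    precomputed_ids = [int(x) for x in r.lrange(f"feed:{user_id}", 0, limit - 1)]

    # 2. Real-time part: recent tweets from celebrities this user follows
    celebrity_ids = get_followed_celebrities(user_id)      # placeholder
    celebrity_tweets = get_recent_tweets(celebrity_ids)    # placeholder

    # 3. Merge, sort newest first, return the top of the combined list
    merged = get_tweets_by_ids(precomputed_ids) + celebrity_tweets  # placeholder
    merged.sort(key=lambda t: t["created_at"], reverse=True)
    return merged[:limit]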

The 80/20 Rule of Scale

80% of traffic usually goes to 20% of data. Find that 20% and cache it aggressively. Do not try to optimize everything - optimize what matters most.

Step 8: Add Reliability (5 minutes)

Everything Fails

Servers crash. Networks break. Databases go down. Disks fail.

Your system must keep working when things break.

The Key Questions:

  1. What happens when server X crashes?
  2. How do we avoid losing data?
  3. How do we recover from failures?
  4. How do we avoid double-processing?

Reliability Technique 1: Replication

Keep multiple copies of everything important.

Data Replication

What to explain:

  • Data is copied to multiple machines
  • If one machine dies, others have the data
  • Trade-off: More replicas = more durability but more cost
  • Typically: 3 replicas for important data

Reliability Technique 2: Retries with Backoff

When something fails, try again, but wait longer each time.

Exponential Backoff Pattern
import random
import time

def call_external_service(request):
    max_retries = 3
    base_delay = 1  # second

    for attempt in range(max_retries):
        try:
            return service.call(request)
        except TemporaryError:  # the service's transient error type
            if attempt == max_retries - 1:
                raise  # Give up after max retries

            # Wait longer each time: 1s, 2s, 4s
            delay = base_delay * (2 ** attempt)
            # Add some randomness (jitter) to avoid thundering herd
            delay += random.uniform(0, delay * 0.1)
            time.sleep(delay)

Reliability Technique 3: Idempotency (Handle Duplicates)

If a request happens twice, the result should be the same.

Idempotency Pattern
PROBLEM:
User clicks "Pay" button twice (network was slow).
Without idempotency: User is charged twice!

SOLUTION:
Each request has a unique idempotency key.

def process_payment(idempotency_key, amount):
    # Check if we already processed this
    existing = database.get("payment:" + idempotency_key)
    
    if existing:
        return existing.result  # Return same result as before
    
    # Process the payment
    result = payment_provider.charge(amount)
    
    # Store the result with the key
    database.store("payment:" + idempotency_key, result)
    
    return result

Now if user clicks twice with same idempotency_key:
- First click: Processes payment, stores result
- Second click: Returns stored result, no double charge

Reliability Technique 4: Circuit Breaker

If a service keeps failing, stop calling it for a while.

Circuit Breaker Pattern

What to explain:

  • CLOSED: Normal operation, requests go through
  • When too many failures: Switch to OPEN
  • OPEN: Requests fail immediately (do not even try)
  • After timeout: Switch to HALF-OPEN, try one request
  • If it works: Back to CLOSED. If it fails: Back to OPEN

This prevents cascading failures.
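
A minimal sketch of that state machine in code; the thresholds are illustrative:

import time

class CircuitBreaker:
    def __init__(self, failure_threshold=5, reset_timeout=30):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout   # seconds to stay OPEN
        self.failures = 0
        self.opened_at = None                # None means CLOSED

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if time.time() - self.opened_at < self.reset_timeout:
                raise RuntimeError("circuit open: failing fast")
            # Timeout elapsed: HALF-OPEN, let one request through as a probe
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.time()  # trip to OPEN
            raise
        # Success: reset back to CLOSED
        self.failures = 0
        self.opened_at = None
        return result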

Reliability Technique 5: Graceful Degradation

When something breaks, keep working with reduced functionality.

Graceful Degradation Examples
NETFLIX EXAMPLE:
- Normal: Personalized recommendations for you
- Recommendation service down: Show popular movies instead
- Search service down: Show categories to browse
- Everything down: Show cached homepage

TWITTER EXAMPLE:
- Normal: Real-time feed with all features
- Feed service slow: Show cached version of feed
- Image service down: Show tweets without images
- Everything down: Show cached tweets, disable posting

PAYMENT EXAMPLE:
- Primary payment provider down: Try backup provider
- All providers down: Queue payment for later, tell user
- Never: Silently fail and lose the payment
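
A sketch of the Netflix-style fallback chain; the three fetch functions stand in for real service calls:

def get_homepage_rows(user_id: int) -> list:
    try:
        return fetch_personalized_recommendations(user_id)  # placeholder
    except Exception:
        pass  # recommendation service is down or slow
    try:
        return fetch_popular_titles()  # placeholder fallback
    except Exception:
        pass  # fallback service is also down
    return load_cached_homepage()  # last resort: static cached content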

Never Lose Money or Important Data

For payments, orders, and critical data: Always write to durable storage before telling user success. Use database transactions. Have backups. Test your recovery process. Losing user data or money is unacceptable.

Bonus: Operations (Quick Overview)

You Might Get Asked About These

Some interviewers ask about how you would operate this system. Here is a quick overview of key topics.

Monitoring: Know When Things Break

What to Monitor | Why | Example Metrics
Request rate | Know your traffic | Requests per second
Error rate | Know when things break | 5xx errors per minute
Latency | Know when things are slow | p50, p95, p99 response time
Resource usage | Know when to scale | CPU, memory, disk usage
Business metrics | Know if product works | Sign-ups, purchases, active users

Alerting: Get Notified When Things Break

  • Alert on symptoms (error rate high) not causes (CPU high)
  • Have clear runbooks: When alert X fires, do Y
  • Avoid alert fatigue: Too many alerts = people ignore them

Deployment: How to Ship Code Safely

Strategy | How It Works | When to Use
Rolling deploy | Update servers one by one | Default choice, low risk
Blue-green | Run two environments, switch traffic | When you need instant rollback
Canary | Send 1% traffic to new version first | When change is risky
Feature flags | Deploy code but toggle feature on/off | When you want control over rollout

Security Basics (Often Forgotten)

  1. Authentication: Who is this user? (passwords, OAuth, tokens)
  2. Authorization: Can this user do this action? (roles, permissions)
  3. Rate Limiting: Is this user sending too many requests?
  4. Input Validation: Is this input safe? (prevent SQL injection, XSS)
  5. Encryption: Data in transit (HTTPS), data at rest (encrypted disks)
  6. Secrets Management: Never put passwords in code (use a vault)

Rate Limiting is Almost Always Asked

If you have time, mention rate limiting. Say: We would add rate limiting at the API gateway to prevent abuse. We could use a token bucket algorithm - each user gets N tokens per minute. This protects our system from bad actors and runaway scripts.
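
A minimal in-process token bucket sketch (a real deployment would keep the counters in shared storage such as Redis so every gateway node enforces the same limits):

import time

class TokenBucket:
    def __init__(self, capacity: int, refill_rate: float):
        self.capacity = capacity          # maximum burst size
        self.refill_rate = refill_rate    # tokens added per second
        self.tokens = float(capacity)
        self.last_refill = time.time()

    def allow(self) -> bool:
        now = time.time()
        # Add tokens for the time elapsed since the last request
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last_refill) * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# One bucket per user, e.g. 60 requests per minute:
buckets: dict[int, TokenBucket] = {}

def is_allowed(user_id: int) -> bool:
    bucket = buckets.setdefault(user_id, TokenBucket(capacity=60, refill_rate=1.0))
    return bucket.allow()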

The Complete Checklist

Use This in Every Interview

Print this checklist. Practice until it becomes automatic.

The 8-Step Interview Checklist
□ STEP 1: CLARIFY REQUIREMENTS (5 min)
  □ What are the 2-3 core features? (MVP mindset)
  □ Who are the users? How many?
  □ What scale are we designing for?
  □ Any special requirements? (real-time, reliability)
  □ Summarize before moving on

□ STEP 2: DESIGN THE API (3 min)
  □ List core endpoints (3-5 max)
  □ For each: method, path, input, output
  □ Which endpoint is the hot path?
  □ Is it read-heavy or write-heavy?

□ STEP 3: QUICK MATH (3 min)
  □ Requests per second
  □ Storage needed
  □ Read:Write ratio
  □ Peak vs average
  □ What do these numbers tell us?

□ STEP 4: COMPUTE MODEL (3 min)
  □ For each API: real-time or pre-compute?
  □ Explain the tradeoff
  □ Make a decision and justify

□ STEP 5: DATABASE CHOICE (3 min)
  □ What databases do we need?
  □ Start with SQL, justify any NoSQL
  □ Explain what goes where

□ STEP 6: DRAW ARCHITECTURE (10 min)
  □ Start simple: LB → App → DB
  □ Add cache if read-heavy
  □ Add queue if need async processing
  □ Keep it to 5-8 boxes
  □ Explain each component

□ STEP 7: HANDLE SCALE (10 min)
  □ Identify the bottleneck
  □ Caching strategy
  □ Sharding if needed
  □ Hot partition handling

□ STEP 8: ADD RELIABILITY (5 min)
  □ What happens when X fails?
  □ Replication for durability
  □ Retries with backoff
  □ Idempotency for duplicates

□ WRAP UP (3 min)
  □ Summarize what you built
  □ State the main tradeoffs
  □ Suggest future improvements
  □ Ask: Any questions or areas to explore?

Framework Visual Summary

Common Mistakes to Avoid

Learn From Others' Mistakes

These are the most common ways people fail system design interviews.

Mistake | Why It Is Bad | What to Do Instead
Jumping into drawing boxes | You might solve the wrong problem | Always start with requirements
Not doing any math | Your design might not work at scale | Do quick estimation, 3 minutes is enough
Using too many technologies | Shows you cannot simplify | Start with few components, add only when needed
Saying just use Kafka for everything | Shows you do not understand tradeoffs | Explain WHY you need each component
Not talking about tradeoffs | Suggests you lack depth of understanding | Every decision has pros and cons, state them
Designing for Google scale when not needed | Over-engineering | Match complexity to stated requirements
Going silent while thinking | Interviewer cannot help you | Think out loud, explain your reasoning
Not asking questions | Seems like you do not think critically | Ask clarifying questions, check in with interviewer

The Best Tip

Treat the interview like a conversation, not a test. You and the interviewer are designing a system together. Ask for their input: What do you think about this approach? Should I go deeper here? This makes it collaborative and less stressful.

Practice Problems

Practice Makes Perfect

Use this framework on these problems. Time yourself - 45 minutes each.

Difficulty | Problems | Key Challenge
Beginner | URL Shortener, Pastebin | ID generation, caching
Beginner | Rate Limiter | Distributed counting
Medium | Twitter, Instagram | Feed generation, fan-out
Medium | Uber, Lyft | Real-time matching, location
Medium | Ticketmaster | Handling flash sales, inventory
Hard | Google Docs | Real-time collaboration, conflict resolution
Hard | WhatsApp, Messenger | Message delivery, ordering
Hard | YouTube, Netflix | Video processing, CDN
Example: Apply Framework to Unknown Problem
PROBLEM: Design a Food Delivery System (like DoorDash)

STEP 1: REQUIREMENTS
- Core: Browse restaurants, place order, track delivery
- Users: Customers, restaurants, drivers
- Scale: 1 million orders per day

STEP 2: API
- GET /restaurants?lat=X&lng=Y (browse nearby)
- POST /orders (place order)
- GET /orders/{id} (track status)
- PUT /drivers/{id}/location (driver updates position)

STEP 3: MATH
- 1M orders/day = ~10 orders/second
- Driver location updates: 50K drivers × every 5 seconds = 10K/second
- Read-heavy for browsing, write-heavy for locations

STEP 4: COMPUTE MODEL
- Restaurant list: Pre-compute (does not change often)
- Driver location: Real-time (changes constantly)
- Order matching: Real-time (need to match quickly)

STEP 5: DATABASE
- PostgreSQL: Users, Restaurants, Orders (need transactions)
- Redis: Driver locations (ephemeral, fast updates)
- Elasticsearch: Restaurant search (full-text, geo)

STEP 6: ARCHITECTURE
- API Gateway → Restaurant Service, Order Service, Driver Service
- Redis for real-time driver locations
- Message queue for order → driver matching

STEP 7: SCALE
- Cache restaurant data (does not change often)
- Shard orders by city/region
- Hot spot: Popular restaurants - cache their data

STEP 8: RELIABILITY
- Orders must not be lost (database + queue)
- Driver location can be approximate (no replication needed)
- Payment must be idempotent (no double charges)

Most Important Practice Tip

Practice out loud. Explaining your thoughts verbally is a different skill than thinking them. Record yourself and listen back. You will be surprised how often you ramble or skip important points. Practice until you can explain clearly in 45 minutes.

Trade-offs

Aspect | Advantage | Disadvantage