System Design Masterclass
Infrastructure · search-engine · inverted-index · pagerank · distributed-systems · caching · advanced

Design Web Search Engine

Design a search engine that can index billions of pages and return results in milliseconds

Billions of pages, millions of searches per second | Similar to Google, Microsoft, Amazon, Apple, DuckDuckGo, Elasticsearch, Algolia | 45 min read

Summary

A search engine helps you find information on the internet. When you type "best pizza near me", it searches through billions of web pages and shows you the most useful results in less than half a second. The tricky parts are: (1) building an index so we can find pages with certain words very fast, (2) deciding which pages are the best to show first (ranking), (3) understanding what you really mean when you search (query understanding), and (4) doing all this for millions of people searching at the same time. Companies like Google, Microsoft (Bing), and Amazon ask this question in interviews.

Key Takeaways

Core Problem

The main job is to take a few words from the user and find the 10 best pages out of billions. You need to do this in less than half a second while millions of other people are also searching.

The Hard Part

Searching through billions of pages one by one would take years. Instead, we build an inverted index - a giant list that says for every word, which pages contain that word. Finding pages is now instant because we just look up the words.

Scaling Axis

We split the index across thousands of machines. Each machine holds part of the index. When you search, we ask all machines at once and combine their answers. More machines = faster searches and more pages we can index.

Critical Invariant

Search results must always be available. If users cannot search, the product is useless. We keep multiple copies of everything and can survive entire data centers going down.

Performance Requirement

Users expect results in under half a second. Google aims for under 200 milliseconds. This means every piece of the system must be incredibly fast, and we cache everything we can.

Key Tradeoff

Freshness vs Quality: Newer pages might be more relevant (news!), but we have not had time to learn how good they are. Older pages have more trust signals. We balance both.

Design Walkthrough

Problem Statement

The Question: Design a web search engine like Google that can search through billions of web pages and return the best results in less than half a second.

What the search engine needs to do (most important first):

  1. Index web pages - Take billions of web pages and organize them so we can search fast. This is like making a book index.
  2. Return relevant results - When someone searches for "how to make pasta", show pages that actually teach how to make pasta, not random pages that mention the word.
  3. Be fast - Return results in under 500 milliseconds (half a second). Users leave if search is slow.
  4. Handle lots of users - Support millions of people searching at the same time.
  5. Understand the question - When someone types "apple", figure out if they mean the fruit or the company.
  6. Fix mistakes - If someone types "recipies" instead of "recipes", show results for the correct spelling.
  7. Autocomplete - As someone types, suggest what they might be looking for.

What to say first

Let me first understand what we are building. Are we building a general web search engine like Google, or a specialized search for one website? How many pages do we need to index? How many searches per second should we handle? Once I know this, I can design the right system.

What the interviewer really wants to see:

  • Do you know how inverted indexes work? (This is the key to fast search)
  • Can you rank billions of pages to find the best ones?
  • How do you make searches fast enough? (Under 500ms for billions of pages)
  • How do you handle millions of people searching at once?

Clarifying Questions

Before you start designing, ask questions to understand what you are building. Good questions show the interviewer you think before you code.

Question 1: How big is this?

How many web pages do we need to index? How many searches per second should we support?

Why ask this: A search engine for 1 million pages is very different from one for 100 billion pages.

What interviewers usually say: Assume we need to index 50 billion web pages and handle 100,000 searches per second.

How this changes your design: At this scale, we need thousands of machines. The index will not fit on one computer. We need to split it across many machines.

Question 2: What type of search?

Are we doing text search only? Do we need image search, video search, or local business search?

Why ask this: Image and video search need completely different systems.

What interviewers usually say: Focus on text search for web pages. Image search can be a follow-up.

How this changes your design: We focus on text indexing and ranking. Images and videos would need separate pipelines.

Question 3: How fresh should results be?

If a breaking news article is published, how quickly should it appear in search results?

Why ask this: Real-time indexing is much harder than batch indexing.

What interviewers usually say: Important news should appear within minutes. Regular pages can take hours or days.

How this changes your design: We need a fast path for urgent content and a slow path for regular content. Two different indexing pipelines.

Question 4: Do we need personalization?

Should different users see different results for the same search? Like showing local restaurants to someone in New York vs London?

Why ask this: Personalization adds complexity and raises privacy concerns.

What interviewers usually say: Yes, basic personalization like location and language. Skip complex personalization based on search history for now.

How this changes your design: We need to know user location and language. Results are adjusted based on these factors.

Summarize your assumptions

Let me summarize: We are building a web search engine for 50 billion pages, handling 100,000 searches per second. Text search only for now. Breaking news should appear within minutes. We will do basic personalization by location and language. Our goal is under 500ms response time.

The Hard Part

Say this to the interviewer

The hardest part of search is not finding pages that match your words - it is finding the BEST pages out of millions that match. If I search for "pizza", millions of pages mention pizza. How do I know which 10 to show first? This is the ranking problem.

Three hard problems we need to solve:

Problem 1: Searching Billions of Pages Fast (Inverted Index)

Imagine you have a library with 50 billion books. Someone asks: "Find all books that mention the word elephant." You cannot check every book - that would take years!

The trick is to build an index BEFORE anyone searches. We make a list that says:

  • Word "elephant" appears in: Book 5, Book 89, Book 1234...
  • Word "pizza" appears in: Book 12, Book 456, Book 7890...

Now when someone searches, we just look up the word and instantly know which books have it. This list is called an inverted index (inverted because we flip from "book has words" to "word has books").
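To make this concrete, here is a minimal Python sketch of an inverted index. The corpus is a toy and everything lives in one in-memory dict - a real index is compressed and spread across thousands of machines:

from collections import defaultdict

# Toy corpus: document ID -> text.
documents = {
    5: "the elephant walked to the river",
    89: "elephant herds migrate in the dry season",
    12: "best pizza recipe with fresh basil",
}

# Build: flip "document has words" into "word has documents".
index = defaultdict(set)
for doc_id, text in documents.items():
    for word in text.split():
        index[word].add(doc_id)

# Search: one dictionary lookup instead of scanning every document.
print(sorted(index["elephant"]))  # [5, 89]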

How an Inverted Index works

Problem 2: Ranking - Which Pages Are Best?

When you search "how to make pasta", maybe 10 million pages match. Which 10 do we show first?

Google solved this with PageRank. The idea is simple: If many good websites link to a page, that page is probably good too. It is like voting - each link is a vote. A vote from CNN is worth more than a vote from some random blog.

But PageRank is just one signal. Modern search uses hundreds of signals:

  • How many times does the search word appear on the page?
  • Is the word in the title? (more important)
  • How old is the page? (newer might be better for news)
  • Do people click on this result and stay on it? (good sign)
  • Is the page fast to load?
  • Is the page safe? (no viruses)

Common mistake candidates make

Many people focus only on finding matching pages and forget about ranking. Finding pages is easy - any database can do that. The real challenge is deciding which pages are BEST. This is where Google beats other search engines.

Problem 3: Speed - 500ms for Everything

500 milliseconds sounds like a lot, but here is what we need to do in that time:

  1. User types query and sends to server (50ms for network)
  2. Parse the query and understand it (10ms)
  3. Search the index - but it is on 1000+ machines! (100ms)
  4. Get results from all machines and combine them (50ms)
  5. Rank the combined results (50ms)
  6. Get snippets to show under each result (50ms)
  7. Send results back to user (50ms)

We only have about 100ms left for any problems or slow machines. Everything must be incredibly optimized.

Scale and Access Patterns

Before designing, let me figure out how big this system needs to be. This helps us choose the right tools.

| What we are measuring | Number | What this means for our design |
|---|---|---|
| Web pages to index | 50 billion | Index is too big for one machine - need to split across thousands |
| Average page size | 50 KB text | Raw text = 2.5 PB (petabytes). Compressed index about 500 TB |
+ 6 more rows...
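A quick sanity check on those numbers in Python. The 20% compression ratio is an assumption that roughly matches the table's 500 TB figure:

# Back-of-envelope sizing using the table's numbers.
pages = 50_000_000_000          # 50 billion pages
page_size_bytes = 50 * 1024     # 50 KB of text per page

raw_text = pages * page_size_bytes
print(f"{raw_text / 1e15:.1f} PB raw text")          # ~2.6 PB

# Assume a compressed index is roughly 20% of the raw text size.
print(f"{raw_text * 0.2 / 1e12:.0f} TB compressed")  # ~512 TB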

What to tell the interviewer

At 50 billion pages and 100,000 searches per second, we need a distributed system. The index is split across thousands of machines. Each search goes to all machines in parallel. We use heavy caching because many people search for the same things.

How people use search (access patterns):

1. Short, common queries (80% of searches)
  • "weather", "facebook", "amazon"
  • Same query searched thousands of times per second
  • Perfect for caching - store the answer and reuse it

2. Long, unique queries (15% of searches)
  • "why does my cat stare at me at 3am"
  • Each one is different, hard to cache
  • Must actually search the index

3. Trending queries (5% of searches)
  • When something big happens, millions search for it
  • "super bowl score", "election results"
  • Starts rare, suddenly becomes very common

How big is the index?
- 50 billion pages
- Average 500 unique words per page
+ 21 more lines...

High-Level Architecture

Now let me draw the big picture of how all the pieces fit together. I will keep it simple and explain what each part does.

What to tell the interviewer

I will break this into two main parts: the Indexing Pipeline that builds the index from web pages, and the Query Pipeline that handles user searches. They work separately - indexing runs all the time in the background, while query handling must be super fast.

Search Engine - The Big Picture

What each part does and WHY it is separate:

| Part | What it does | Why it is separate (what to tell interviewer) |
|---|---|---|
| Load Balancer | Sends user requests to available query servers | Spreads the load so no single server gets overwhelmed. If one server dies, others keep working. |
| Query Servers | Handle the search request from start to finish | Stateless - any server can handle any request. Easy to add more servers when we get more users. |
+ 6 more rows...

Common interview question: Why split the index by document?

Two ways to split: by document (pages 1-1M on shard 1, pages 1M-2M on shard 2) or by word (words A-M on shard 1, N-Z on shard 2). We split by document because: (1) Each shard can score documents locally, (2) If one shard dies, we lose some pages but search still works, (3) Adding new pages is easy - just add to any shard.

Technology Choices - Why we picked these tools:

Index Storage: Custom or Lucene
  • Lucene is the most popular search library. Elasticsearch and Solr use it.
  • Google and Bing use custom systems because Lucene is not fast enough at their scale.
  • For most companies, Elasticsearch (built on Lucene) works great.

Cache: Redis or Memcached
  • Super fast - stores results in memory
  • Redis is more feature-rich, Memcached is simpler
  • Cache hit = skip the entire index search = instant response

Document Store: Bigtable, HBase, or S3
  • Stores the actual page content for generating snippets
  • Needs to be fast to read but can be slow to write
  • Often we store just the important parts, not full HTML

Message Queue: Kafka
  • Connects crawler to indexer
  • Handles spikes in crawling without overwhelming the indexer

Important interview tip

Pick technologies YOU know! If you have used Elasticsearch at your job, use that. Interviewers care more about your reasoning than the specific tool. Say: I would use Elasticsearch because I have experience with it, but a custom solution might be needed at Google scale.

The Inverted Index Deep Dive

The inverted index is the heart of any search engine. Let me explain step by step how we build it and use it.

What to tell the interviewer

An inverted index is like the index at the back of a textbook. The textbook index says: elephant → pages 23, 45, 89. Our index says: pizza → document IDs 12, 456, 7890. For each word, we know exactly which pages contain it.

Building the inverted index - Step by Step:

Step 1: Get the page from crawler

URL: https://example.com/recipe
Title: Best Pizza Recipe
Content: This is the best pizza recipe. Make pizza at home.

Step 2: Clean and tokenize (break into words)

Remove HTML tags, scripts, styles
Lowercase everything
Split into words: [best, pizza, recipe, this, is, the, make, at, home]

Step 3: Remove stop words (common words that do not help)

Remove: this, is, the, at
Keep: [best, pizza, recipe, make, home]

Step 4: Stem words (reduce to root form)

recipes → recipe
making → make
Result: [best, pizza, recipe, make, home]

Step 5: Record position and importance

For each word, record:
- Which document has it (doc ID 12)
- Where in the document (position 1, 2, 3...)
- Is it in title? (extra importance)
- How many times does it appear? (frequency)

Structure of an Inverted Index Entry
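A small Python sketch of steps 2 through 5 above, using a toy stemmer and a toy stop-word list (real systems use proper stemmers such as Porter's):

import re

STOP_WORDS = {"this", "is", "the", "at", "a", "an"}

def stem(word):
    # Toy stemmer: strip a trailing "s" ("recipes" -> "recipe").
    return word[:-1] if word.endswith("s") else word

def index_entries(doc_id, title, content):
    # word -> {doc, positions, in_title}; frequency is len(positions).
    entries = {}
    title_words = {stem(w) for w in re.findall(r"[a-z]+", title.lower())}
    position = 0
    for token in re.findall(r"[a-z]+", content.lower()):
        if token in STOP_WORDS:
            continue  # step 3: drop stop words
        word = stem(token)  # step 4: normalize to root form
        entry = entries.setdefault(
            word, {"doc": doc_id, "positions": [], "in_title": word in title_words}
        )
        entry["positions"].append(position)  # step 5: record position
        position += 1
    return entries

entries = index_entries(12, "Best Pizza Recipe",
                        "This is the best pizza recipe. Make pizza at home.")
print(entries["pizza"])  # {'doc': 12, 'positions': [1, 4], 'in_title': True}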

How we search the index:

When someone searches for "best pizza recipe":

  1. Look up "best" in the index → get list of documents
  2. Look up "pizza" in the index → get list of documents
  3. Look up "recipe" in the index → get list of documents
  4. Find documents that appear in ALL three lists (intersection)
  5. Score each document based on where and how often words appear
  6. Return the top 10 by score
FUNCTION search(query):
    // Example: query = "best pizza recipe"
    
+ 51 more lines...

What is TF-IDF?

TF-IDF stands for Term Frequency - Inverse Document Frequency. It says: A word is important to a document if it appears OFTEN in that document (TF) but RARELY in other documents (IDF). The word THE appears everywhere, so it has low IDF. The word KUBERNETES appears rarely, so it has high IDF. A page that says KUBERNETES 10 times is probably about Kubernetes!
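In code, the idea fits in a few lines. This is a minimal sketch with raw term frequency and a log IDF; real engines use tuned variants such as BM25:

import math

def tf_idf(count_in_doc, doc_length, docs_with_term, total_docs):
    tf = count_in_doc / doc_length               # frequent in THIS document
    idf = math.log(total_docs / docs_with_term)  # rare across ALL documents
    return tf * idf

# "the" appears in nearly every document, so IDF is ~0 and the score vanishes.
print(tf_idf(20, 500, 49_000_000_000, 50_000_000_000))  # ~0.0008
# "kubernetes" is rare, so ten mentions produce a high score.
print(tf_idf(10, 500, 5_000_000, 50_000_000_000))       # ~0.18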

Distributing the index across machines:

The index is too big for one machine. We split it by document:

  • Shard 1: Documents 1 to 50 million
  • Shard 2: Documents 50 million to 100 million
  • Shard 3: Documents 100 million to 150 million
  • ... and so on for 1000 shards

Each shard has its OWN inverted index for just its documents.

When searching:

  1. Query goes to ALL shards at the same time
  2. Each shard returns its top 10 results
  3. We merge all results and pick the overall top 10

Important: Why not shard by word?

We could split by word: Shard 1 has words A-M, Shard 2 has words N-Z. But this is bad because: (1) A search for pizza recipe must hit two shards and wait for both, (2) We cannot score documents on one shard because we need ALL words to score, (3) Some words like THE would overload one shard. Sharding by document is better!
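Here is a tiny Python sketch of document-sharded search: every shard scores its own documents, then the coordinator merges the per-shard top lists. The scoring function and the two in-memory shards are deliberate toys:

import heapq
from concurrent.futures import ThreadPoolExecutor

def score(text, query):
    # Toy relevance: count query-word occurrences in the document.
    words = text.lower().split()
    return sum(words.count(w) for w in query.lower().split())

def search_shard(shard, query, k=10):
    # A shard scores only its own documents - no cross-shard calls needed.
    hits = [(score(text, query), doc_id) for doc_id, text in shard.items()]
    return heapq.nlargest(k, hits)

def search(shards, query, k=10):
    # Fan out to every shard in parallel, then merge the per-shard top-k.
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        per_shard = pool.map(lambda s: search_shard(s, query, k), shards)
        return heapq.nlargest(k, (hit for hits in per_shard for hit in hits))

shards = [
    {1: "best pizza recipe", 2: "pizza dough tips"},
    {50_000_001: "the history of pizza", 50_000_002: "fresh pasta recipe"},
]
print(search(shards, "pizza recipe", k=3))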

Ranking - Finding the Best Results

Finding pages that match is easy. Finding the BEST pages is the hard part. This is what makes Google better than other search engines.

What to tell the interviewer

We use hundreds of signals to rank pages. The most important ones are: relevance (does the page match the query?), authority (is this a trusted page?), freshness (is the page recent?), and user signals (do people click and stay on this page?). I will explain each one.

The ranking signals:

1. Relevance Signals - Does this page match what the user wants?
  • Does the query word appear in the title?
  • Does the query word appear in headings (H1, H2)?
  • How many times does the query word appear?
  • Do query words appear close together ("pizza recipe" vs "pizza... recipe" pages apart)?
  • Does the URL contain the query word?

2. Authority Signals - Is this a good, trustworthy page?
  • PageRank: How many important pages link to this page?
  • Domain authority: Is this site generally trusted (nytimes.com vs randomsite123.com)?
  • Age: Has this site been around a long time?
  • Spam signals: Does this page look like it is trying to trick search engines?

3. Freshness Signals - Is this page up to date?
  • When was the page last updated?
  • Is the content time-sensitive? (news vs history article)
  • How quickly is the page changing?

4. User Signals - Do users like this result?
  • Click-through rate: When we show this result, do people click it?
  • Dwell time: After clicking, do people stay on the page or come back immediately? (pogo-sticking is bad)
  • Do people search again after visiting? (they did not find what they wanted)

How PageRank works (simplified)

PageRank explained simply:

Imagine each web page starts with 1 point. Then:

  1. Each page gives away its points equally to all pages it links to
  2. If CNN (100 points) links to 10 pages, each gets 10 points
  3. If a random blog (1 point) links to 10 pages, each gets 0.1 points
  4. We repeat this process many times until scores stop changing

The result: Pages linked by many important pages get high scores. This is brilliant because:

  • You cannot fake it by creating millions of pages linking to yourself (those pages have no incoming links, so they have no points to give)
  • Important pages naturally rise to the top

FUNCTION calculate_final_score(document, query):
    
    // Start with text relevance (TF-IDF from the index)
+ 37 more lines...
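The scoring pseudocode above treats PageRank as an input. Here is a minimal Python sketch of how PageRank itself can be computed by power iteration, using the standard damping factor of 0.85 and a toy link graph:

def pagerank(links, iterations=20, d=0.85):
    # links: page -> list of pages it links to.
    pages = list(links)
    rank = {p: 1.0 for p in pages}  # every page starts with 1 point
    for _ in range(iterations):
        new_rank = {p: 1 - d for p in pages}
        for page, outlinks in links.items():
            if not outlinks:
                continue
            share = d * rank[page] / len(outlinks)  # split points over outlinks
            for target in outlinks:
                new_rank[target] += share
        rank = new_rank
    return rank

links = {
    "cnn.com": ["blog.example", "recipes.example"],
    "blog.example": ["recipes.example"],
    "recipes.example": ["cnn.com"],
}
print(pagerank(links))  # pages with important in-links score highest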

The spam battle

Website owners want to rank #1. Some try to cheat: hide keywords in white text on white background, create fake links, copy content from other sites. We use machine learning to detect these tricks. When we catch someone cheating, we punish them severely - sometimes removing them from search entirely.

Query Processing and Understanding

Users type messy queries. They make typos. They use different words for the same thing. Before we search, we need to understand what they really want.

What to tell the interviewer

Query processing has three parts: query correction (fix typos), query expansion (add synonyms), and query understanding (figure out intent). A user typing appple recipies wants results for apple recipes, and they probably want cooking recipes, not chemistry formulas.

Step 1: Spell Correction

When someone types "recipies" instead of "recipes", we need to fix it.

How it works:

  1. Check if the word exists in our dictionary (all words we have seen in web pages)
  2. If not, find words that are "close" (just 1-2 letters different)
  3. Pick the most likely correction based on what people usually mean

Tricks:

  • "recipies" is close to "recipes" (swap i and e)
  • But it is also close to "receipts" - how do we pick?
  • Look at which word is most common overall
  • Look at what other words are in the query ("apple recipies" → probably recipes, not receipts)

Spell correction pipeline
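A minimal sketch of the candidate-ranking idea in Python, using difflib as a stand-in for a proper edit-distance search; the dictionary and its frequency counts are made up:

from difflib import SequenceMatcher

# Assumed dictionary: word -> how often it appears in the corpus.
DICTIONARY = {"recipe": 12_000_000, "recipes": 9_000_000, "receipts": 2_000_000}

def correct(word):
    if word in DICTIONARY:
        return word  # already a known word
    # Rank candidates by similarity first, then by how common they are.
    return max(
        DICTIONARY,
        key=lambda w: (SequenceMatcher(None, word, w).ratio(), DICTIONARY[w]),
    )

print(correct("recipies"))  # "recipes"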

Step 2: Query Expansion (Add synonyms and related words)

If someone searches "car", we should also find pages about "automobile", "vehicle", "auto".

How it works:

  • Build a list of synonyms from a thesaurus or by analyzing which words appear in similar contexts
  • When the user searches "car", internally we also search for automobile and vehicle
  • Weight the synonyms lower than the original word (the user said "car", not "automobile")

Step 3: Query Understanding (What do they REALLY want?)

When someone searches "apple":

  • Do they want Apple the company?
  • Or apple the fruit?
  • Or how to grow apples?

We use:

  • Context from other words ("apple stock" = company, "apple pie" = fruit)
  • User location (someone in Seattle might mean the company)
  • Trending topics (if Apple just released a new iPhone, more people mean the company)
  • Personalization (if this user always searches tech stuff, probably means the company)

FUNCTION process_query(raw_query, user):
    
    STEP 1: Basic cleanup
+ 46 more lines...

Autocomplete - Suggesting as users type:

As users type "how to make", we suggest:

  • "how to make money"
  • "how to make pancakes"
  • "how to make slime"

How it works:

  1. Store all previous queries in a special index (a trie or prefix tree)
  2. When the user types "how to m", find all queries starting with those letters
  3. Rank by popularity (how many people searched this before)
  4. Return the top 10 suggestions

Speed is critical - suggestions must appear in under 100ms as user types each letter.
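A small trie sketch in Python that stores the top completions at every node, so each keystroke is answered by a single walk down the prefix. The query popularity counts are made up:

class TrieNode:
    def __init__(self):
        self.children = {}
        self.queries = []  # (popularity, full query) for this prefix

class Autocomplete:
    def __init__(self, query_counts, k=10):
        self.root = TrieNode()
        for query, count in query_counts.items():
            node = self.root
            for ch in query:
                node = node.children.setdefault(ch, TrieNode())
                # Keep only the top-k completions AT each node,
                # so lookup cost depends only on prefix length.
                node.queries.append((count, query))
                node.queries = sorted(node.queries, reverse=True)[:k]

    def suggest(self, prefix):
        node = self.root
        for ch in prefix:
            if ch not in node.children:
                return []
            node = node.children[ch]
        return [q for _, q in node.queries]

ac = Autocomplete({"how to make money": 9000,
                   "how to make pancakes": 7000,
                   "how to make slime": 5000,
                   "house prices": 3000})
print(ac.suggest("how to m"))  # ranked by popularity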

Fun fact: Autocomplete impact

Good autocomplete can reduce the amount users need to type by 25%. It also guides users toward queries that will give good results. If we suggest how to make pancakes, we know we have good results for that query!

Making Search Fast

The speed requirement

Google aims for under 200ms. At 500ms, users feel the delay. At 1 second, users start to leave. Every 100ms of delay costs 1% in revenue. Speed is not optional - it is critical!

Where does time go in a search?

Let me break down what happens in a typical search and how long each part takes:

  1. Network: User to server - 50-100ms (depends on distance)
  2. Query parsing and correction - 5-10ms
  3. Cache lookup - 1ms (if hit, we are done!)
  4. Send query to index shards - 5ms
  5. Search each shard - 50-100ms (this is the slow part)
  6. Merge results from all shards - 10ms
  7. Fetch snippets - 20-50ms
  8. Network: Server to user - 50-100ms

Total: 200-400ms if everything goes well.

Trick 1: Caching (the biggest win!)

Many people search for the same things:

  • "weather"
  • "facebook"
  • "youtube"
  • "amazon"

Why compute the same answer over and over? Store it in cache!

What we cache:

  • Full search results for popular queries
  • Posting lists for common words
  • Snippets for frequently shown pages
  • Autocomplete suggestions

Cache hit rate:

  • 80% of queries hit the cache
  • That means only 20% actually need to search the index
  • Cache response time: 5-10ms vs 200ms for a full search

FUNCTION search_with_cache(query, user):
    
    // Generate a cache key
+ 29 more lines...

Trick 2: Search shards in parallel

We have 1000 index shards. If we searched them one by one: 1000 shards × 50ms each = 50 seconds. Way too slow!

Instead, we search ALL shards at the SAME time:

  • Send the query to all 1000 shards
  • Each shard searches its piece (50ms)
  • All searches happen at once (parallel)
  • Total time: 50ms, not 50 seconds!

This is why we need to split the index. Parallelism is the key to speed.

Parallel search across shards

Trick 3: Early termination

We do not need to find ALL matching documents - we only show 10!

If we know a document cannot make it into top 10, stop looking at it.

How it works:

  • Sort posting lists by expected score (highest first)
  • Keep track of the score of our 10th best result so far
  • If a document cannot possibly beat that score, skip it
  • Stop early once we have a confident top 10

This can skip 90% of the work for common queries!
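A sketch of the skipping logic in Python, assuming each posting list carries a precomputed upper bound on its scores (this is the idea behind WAND-style evaluation; the data here is a toy):

import heapq

def top_k(postings, k=10):
    # postings: (upper_bound, [(score, doc_id), ...]) per posting list.
    heap = []  # min-heap holding the current top-k (score, doc_id) pairs
    for upper_bound, docs in sorted(postings, reverse=True):
        # If even this list's best possible score cannot enter the
        # top k, every remaining (lower-bounded) list can be skipped.
        if len(heap) == k and upper_bound <= heap[0][0]:
            break
        for score_doc in docs:
            if len(heap) < k:
                heapq.heappush(heap, score_doc)
            elif score_doc[0] > heap[0][0]:
                heapq.heapreplace(heap, score_doc)
    return sorted(heap, reverse=True)

postings = [
    (9.0, [(9.0, 1), (7.5, 2)]),
    (3.0, [(3.0, 3), (2.0, 4)]),   # skipped: 3.0 cannot beat the kept 7.5
    (0.5, [(0.5, 5)]),             # never examined at all
]
print(top_k(postings, k=2))  # [(9.0, 1), (7.5, 2)]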

Trick 4: Keep hot data in memory

Reading from disk: 10ms. Reading from memory: 0.1ms (100x faster!)

We keep the most important data in RAM:

  • The entire inverted index (compressed)
  • Posting lists for common words
  • Document scores and metadata

Disk is only for rarely accessed data.

The tail latency problem

If we search 1000 shards, we must wait for the SLOWEST one. Even if 999 shards respond in 50ms, one slow shard taking 500ms makes the whole search slow. Solutions: (1) Send query to backup shards too, use whoever responds first, (2) Set strict timeouts - if a shard does not respond in 100ms, skip it, (3) Monitor slow shards and fix or replace them.
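Here is a sketch of solution (2) - strict timeouts with partial results - using Python threads. A real system would use async RPCs; FakeShard and its 100ms budget are stand-ins:

import random
import time
from concurrent.futures import ThreadPoolExecutor, wait

class FakeShard:
    def __init__(self, name):
        self.name = name
    def search(self, query):
        time.sleep(random.choice([0.01, 0.5]))  # some shards are stragglers
        return [(1.0, f"{self.name}:doc-for-{query}")]

def search_all_shards(shards, query, timeout_s=0.1):
    pool = ThreadPoolExecutor(max_workers=len(shards))
    futures = [pool.submit(shard.search, query) for shard in shards]
    done, not_done = wait(futures, timeout=timeout_s)
    pool.shutdown(wait=False)  # do not block on the stragglers
    # Partial results from the fast shards beat one slow, complete answer.
    return [hit for f in done for hit in f.result()]

shards = [FakeShard(f"shard-{i}") for i in range(10)]
print(len(search_all_shards(shards, "pizza")), "of 10 shards answered in time")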

Indexing Pipeline

We have talked about searching the index. Now let me explain how we BUILD the index from crawled web pages.

What to tell the interviewer

Indexing is a pipeline: we take crawled pages, parse them, extract text, build the inverted index, and calculate PageRank. This runs continuously in the background. We have two paths: a slow batch path that rebuilds the full index weekly, and a fast path for urgent content like news.

Indexing Pipeline

Step by step, what happens to each page:

1. Parse HTML
  • Remove scripts, styles, navigation, ads
  • Keep the main content (the article, not the sidebar)
  • Extract title, headings (H1, H2), bold text
  • Extract metadata (author, date, language)

2. Extract text
  • Get clean text from the HTML
  • Detect language (is this English? Spanish? Chinese?)
  • Handle different encodings (UTF-8, etc.)

3. Tokenize
  • Break text into words
  • Handle special cases: "New York" should be one token, not two
  • Handle numbers, dates, emails, URLs

4. Normalize
  • Lowercase everything
  • Remove accents (café → cafe)
  • Stem words (running → run)
  • Remove stop words (the, a, is)

5. Build index entry
  • For each word, record: document ID, positions, importance
  • Store in temporary files

6. Merge into main index
  • Combine new pages with existing index
  • Remove pages that no longer exist
  • Update scores

FUNCTION build_index(crawled_pages):
    // This runs continuously as new pages come in
    
+ 56 more lines...

The two indexing paths:

Slow path (batch processing):
  • Rebuild the entire index from scratch
  • Runs weekly or monthly
  • Can do expensive computations (full PageRank)
  • Produces the highest quality index
  • Takes hours to days

Fast path (real-time):
  • Add new pages immediately
  • For breaking news and trending content
  • Simpler scoring (no full PageRank update)
  • Pages appear in minutes
  • May have lower quality ranking until the batch catches up

Google Caffeine

Google used to rebuild their index every few weeks. In 2010, they launched Caffeine, which indexes pages continuously. Now, breaking news can appear in search within minutes of being published. This was a huge engineering effort!

What Can Go Wrong and How We Handle It

Tell the interviewer about failures

Good engineers think about what can break. Let me walk through the things that can go wrong and how we protect against them.

| What breaks | What happens to users | How we fix it | Why this works |
|---|---|---|---|
| One index shard dies | Missing some results (1/1000 of index) | Keep 3 copies of each shard. If one dies, use a copy. | Redundancy means we can lose machines without losing data. |
| Whole data center goes down | Searches in that region fail | Route traffic to another data center automatically. | We run in 3+ data centers worldwide. Any can handle full load. |
+ 5 more rows...

High availability requirements:

Search must ALWAYS work. Here is how we achieve 99.99% uptime (only 52 minutes of downtime per year):

1. Redundancy at every level:
  • 3 copies of every index shard
  • Multiple query servers behind the load balancer
  • Multiple cache servers
  • Multiple data centers

2. Automatic failover:
  • Health checks every second
  • Unhealthy servers removed immediately
  • Traffic automatically routes to healthy servers
  • No human needed for recovery

3. Graceful degradation:
  • If the cache fails, use the index (slower but works)
  • If some shards fail, return partial results
  • If spell correction fails, search the original query
  • Something is always better than nothing

FUNCTION search_with_resilience(query, user):
    
    // Set overall timeout - never wait more than 500ms
+ 47 more lines...

Monitoring is critical

Set up dashboards to watch: query latency (p50, p99), cache hit rate, error rate per shard, queries per second, and trending queries. If any metric looks wrong, alert the on-call engineer immediately!

Growing the System Over Time

What to tell the interviewer

This design handles billions of pages. Let me explain how we would start small and grow, and what advanced features we could add later.

How we grow step by step:

Stage 1: Starting out (millions of pages)
  • Single Elasticsearch cluster
  • A few query servers behind a load balancer
  • Redis for caching
  • Simple ranking (TF-IDF + basic PageRank)
  • Handles 1-10 million pages, 1,000 queries/second

Stage 2: Medium scale (hundreds of millions of pages)
  • Sharded Elasticsearch or a custom index
  • Query servers in 2-3 regions
  • Distributed cache
  • Better ranking with more signals
  • Handles 100 million pages, 10,000 queries/second

Stage 3: Google scale (billions of pages)
  • Custom everything (no off-the-shelf software)
  • Data centers on every continent
  • Machine learning for everything (ranking, spam, intent)
  • Real-time indexing
  • Handles 50+ billion pages, 100,000+ queries/second

Scaling stages

Advanced features we can add:

1. Knowledge Graph

When you search "Barack Obama", Google shows a box with his photo, birth date, wife, etc. This comes from a knowledge graph - a database of facts about entities.

  • Extract facts from Wikipedia and other sources
  • Link entities (Barack Obama → Michelle Obama → Sasha Obama)
  • Answer factual questions directly without clicking a result

2. Voice Search

  • Convert speech to text first
  • Understand natural language ("What is the weather like in San Francisco" vs "weather sf")
  • Return spoken answers for simple questions

Query: "How tall is the Eiffel Tower?"

Traditional search:
+ 24 more lines...

3. Personalization

  • Track what users search and click (with privacy controls)
  • Learn their interests over time
  • Show more relevant results based on history
  • Example: A programmer searching "python" probably wants the programming language, not the snake

4. Vertical Search

  • Specialized search for specific types of content
  • Image search: Understand what is IN pictures
  • Video search: Transcribe and search video content
  • Shopping search: Search products with prices and reviews
  • News search: Focus on recent, authoritative news sources
  • Local search: Find nearby businesses

Each vertical has its own ranking signals and UI.

Fun fact: Google handles 8.5 billion searches per day

That is about 100,000 searches per second, or roughly 3 trillion searches per year. Google has been running for 25+ years and has built incredibly sophisticated systems. Do not feel bad if your design is simpler - Google has thousands of engineers working on search!

Design Trade-offs

Going with off-the-shelf Elasticsearch instead of a custom index:

Advantages

  • Easy to set up and manage
  • Built-in sharding and replication
  • Good enough for millions of pages
  • Large community and documentation

Disadvantages

  • May not scale to billions of pages
  • Limited control over ranking
  • Can be expensive at scale

When to use

Use for MVP, internal search, or sites with up to 100 million pages. Many successful companies use Elasticsearch.