Design Walkthrough
Problem Statement
The Question: Design a marketplace like Facebook Marketplace where users can list items for sale and discover items to buy in their local area.
Core Features:
- Listing creation: Sellers post items with photos, description, price, location
- Discovery feed: Buyers browse items near them, filtered by category
- Search: Find specific items by keyword within a geographic radius
- Messaging: Buyers and sellers communicate about items
- Transaction flow: Mark items as sold, handle disputes
What to say first
Before I design, let me clarify the scale, the primary use case (local vs shipping), and what trust/safety requirements exist. These significantly shape the architecture.
Hidden requirements interviewers test:
- Can you design efficient geo-spatial queries?
- How do you handle the two-sided marketplace cold start?
- What about fraud, scams, and prohibited items?
- How do you rank and personalize results?
- Can you handle real-time inventory updates?
Clarifying Questions
Ask these questions to demonstrate product thinking alongside technical depth.
Question 1: Scale
How many listings? How many daily active users? What is the read-to-write ratio?
Why this matters: Determines indexing strategy and caching needs.
Typical answer: 100M listings, 50M DAU, 100:1 read-to-write ratio.
Architecture impact: Heavy read optimization; eventual consistency acceptable for listings.
Question 2: Geography
Is this local-only (pickup) or does it support shipping? What is the typical search radius?
Why this matters: Local-only means geo-partitioning is viable.
Typical answer: Primarily local with a 50-mile default radius, optional shipping.
Architecture impact: Can shard by geography; most queries are geo-bounded.
Question 3: Categories
How many categories? Are there category-specific attributes (e.g., mileage for cars)?
Why this matters: Affects schema design and search filtering.
Typical answer: 20-30 top-level categories, some with custom attributes.
Architecture impact: Need a flexible schema (JSON attributes) plus category-specific indexes.
Question 4: Trust and Safety
What moderation is needed? Are there prohibited items? How do we handle scams?
Why this matters: T&S is a major engineering investment.
Typical answer: Need image moderation, text filtering, user reputation, fraud detection.
Architecture impact: Async processing pipeline, ML models, human review queue.
Stating assumptions
I will assume: 100M listings, 50M DAU, primarily local marketplace with 50-mile radius, 20+ categories, need basic trust and safety. Users discover via feed and search.
The Hard Part
Say this out loud
The hard part here is combining geo-proximity, relevance ranking, and real-time inventory freshness in sub-200ms queries across 100 million listings.
Why this is genuinely hard:
1. Geo-spatial queries are expensive: Finding all items within 50 miles of a point requires spatial indexing. A naive approach scans the entire database.
2. Multi-dimensional ranking: Results must balance distance, relevance, recency, seller reputation, and user preferences. This is not a simple sort.
3. Inventory freshness: When an item sells, it must disappear from all search results immediately, yet there are 100M items spread across multiple indexes.
4. Cold start problem: New users have no history. New listings have no engagement. How do you rank them?
5. Trust at scale: With millions of listings, manual review is impossible, and automated systems miss edge cases.
Common mistake
Candidates often design search without considering geo. A marketplace search is fundamentally different from e-commerce search because location is a first-class filter.
The fundamental tradeoffs:
- Freshness vs Performance: Real-time index updates are expensive. How stale can results be?
- Relevance vs Fairness: Should new listings get boosted? How do small sellers compete?
- Safety vs Friction: More verification means fewer listings. Where is the balance?
Scale and Access Patterns
Let me estimate the scale and understand how users interact with the system.
| Dimension | Value | Impact |
|---|---|---|
| Total Listings | 100 million | Need distributed storage and indexing |
| Daily Active Users | 50 million | Heavy read load, caching critical |
What to say
At 500M searches per day, that is about 6,000 QPS average, maybe 20,000 QPS at peak. Each search hits the geo-index plus relevance ranking. This is achievable with an Elasticsearch cluster.
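A quick sanity check on those numbers (the peak-to-average multiplier is an assumption, not something given in the estimates):

# Back-of-envelope check of the stated query load
searches_per_day = 500_000_000
avg_qps = searches_per_day / 86_400       # seconds per day -> ~5,800 QPS average
peak_qps = avg_qps * 3.5                  # assumed peak-to-average ratio -> ~20,000 QPS
print(f"avg ~ {avg_qps:,.0f} QPS, peak ~ {peak_qps:,.0f} QPS")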
Access Pattern Analysis:
- Feed browsing: 70% of traffic. Users scroll through nearby items. Highly cacheable by geo-region.
- Search: 20% of traffic. Keyword + location + filters. Less cacheable due to query variety.
- Listing view: 8% of traffic. Single item detail page. Cacheable by listing ID.
- Listing creation: 2% of traffic. Write-heavy, triggers indexing pipeline.
Storage:
- 100M listings x 5KB = 500GB metadata (fits in memory)
- 100M listings x 5 photos x 500KB = 250TB photos (object storage + CDN)
High-Level Architecture
Let me walk through the architecture, separating read and write paths.
What to say
I will separate the system into: listing management, search and discovery, messaging, and trust and safety. Each can scale independently.
Marketplace Architecture Overview
Component Responsibilities:
1. Listing Service
   - CRUD operations for listings
   - Photo upload to S3
   - Publishes events to Kafka for async processing
2. Search Service
   - Geo-bounded keyword search
   - Category filtering
   - Calls Ranking Service for result ordering
3. Indexer (Async) - see the sketch after this list
   - Consumes listing events from Kafka
   - Updates Elasticsearch and Geo indexes
   - Handles deletions when items sell
4. Trust and Safety (Async)
   - Image moderation (ML models)
   - Text classification for prohibited content
   - Fraud signal detection
5. Messaging Service
   - Real-time chat between buyer and seller
   - Conversation threads per listing
   - Push notifications
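A minimal sketch of the async indexer in item 3, assuming aiokafka and the Elasticsearch 8.x async client; the topic name, index name, and event shape are illustrative, not prescribed by this design:

import asyncio
import json

from aiokafka import AIOKafkaConsumer
from elasticsearch import AsyncElasticsearch, NotFoundError

es = AsyncElasticsearch("http://localhost:9200")  # placeholder endpoint

async def run_indexer():
    # Consume listing events published by the Listing Service (topic name assumed)
    consumer = AIOKafkaConsumer(
        "listing-events",
        bootstrap_servers="localhost:9092",
        value_deserializer=lambda v: json.loads(v.decode()),
    )
    await consumer.start()
    try:
        async for msg in consumer:
            event = msg.value
            if event["type"] in ("created", "updated"):
                # Write the denormalized search document
                await es.index(index="listings", id=event["listing_id"], document=event["doc"])
            elif event["type"] in ("sold", "deleted"):
                # Remove the item so it stops appearing in search results
                try:
                    await es.delete(index="listings", id=event["listing_id"])
                except NotFoundError:
                    pass  # already removed; deletes are idempotent
    finally:
        await consumer.stop()

if __name__ == "__main__":
    asyncio.run(run_indexer())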
Real-world reference
Facebook Marketplace uses a similar architecture with TAO for social graph, Elasticsearch for search, and a dedicated Trust and Safety ML platform that processes every listing.
Data Model and Storage
Let me define the core data models and storage choices.
What to say
PostgreSQL is the source of truth for listings. Elasticsearch provides search. Redis caches hot data. We denormalize into search indexes for query performance.
-- Users table (abridged; columns beyond the id are assumed for illustration)
CREATE TABLE users (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    display_name TEXT NOT NULL,
    created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
Elasticsearch Document Structure:
We denormalize listing data into Elasticsearch for fast geo-search.
{
  "id": "listing-uuid",
  "title": "iPhone 14 Pro Max 256GB",
  "category": "electronics",
  "price": 899,
  "status": "active",
  "location": { "lat": 37.7749, "lon": -122.4194 }
}

The index mapping declares location as a geo_point so geo_distance queries work:

{
  "mappings": {
    "properties": {
      "title": { "type": "text" },
      "category": { "type": "keyword" },
      "status": { "type": "keyword" },
      "location": { "type": "geo_point" }
    }
  }
}
Important detail
The Elasticsearch index is eventually consistent with PostgreSQL. When a listing sells, we must update both the database AND invalidate/update the search index. Use Kafka to ensure this happens reliably.
Geo-Search Deep Dive
Geo-search is the core of marketplace discovery. Let me explain the approaches.
| Approach | How It Works | Pros | Cons |
|---|---|---|---|
| Bounding Box | Filter by lat/lng ranges | Simple, fast | Not circular, misses corners |
| Haversine Distance | Calculate actual distance for each point (sketch below) | Accurate | O(n) - does not scale |
| Geohash | Encode location as string prefix | Efficient prefix queries | Edge cases at boundaries |
| Geo-point (ES) | Native geo_distance query | Optimized, accurate | Requires Elasticsearch |
| H3/S2 Cells | Hierarchical spatial index | Very efficient | More complex implementation |
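For reference, a minimal haversine implementation; the ranking sketch later in this section uses such a function for the distance signal. Nothing here is specific to this design:

import math

def haversine_miles(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    # Great-circle distance between two (lat, lon) points, in miles
    earth_radius_miles = 3958.8
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * earth_radius_miles * math.asin(math.sqrt(a))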
Recommendation
For most marketplaces, Elasticsearch geo_point with geo_distance query is the right choice. It handles the complexity and performs well at scale.
{
  "query": {
    "bool": {
      "must": { "match": { "title": "iphone" } },
      "filter": [
        { "term": { "status": "active" } },
        {
          "geo_distance": {
            "distance": "50mi",
            "location": { "lat": 37.7749, "lon": -122.4194 }
          }
        }
      ]
    }
  }
}
Ranking Strategy:
Search results are not just sorted by distance. We combine multiple signals:
def calculate_listing_score(listing, user_location, user_preferences):
    # Distance score (closer = higher)
    distance_miles = haversine(user_location, listing.location)
    distance_score = 1.0 / (1.0 + distance_miles / 10.0)
    # Remaining signals from the list above; helper names and weights are illustrative
    recency_score = recency_decay(listing.created_at)
    relevance_score = text_relevance(listing, user_preferences)
    reputation_score = listing.seller_reputation
    return (0.35 * relevance_score + 0.25 * distance_score +
            0.25 * recency_score + 0.15 * reputation_score)
Feed vs Search
The home feed (browsing) weights recency higher. Search weights relevance (text match) higher. Both use distance as a filter, not just a ranking signal.
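One way to express that difference is a per-surface weight profile feeding the scoring function above; these particular weights are assumptions for illustration, not tuned values:

# Hypothetical weight profiles; real values would come from offline tuning and experiments
RANKING_WEIGHTS = {
    "feed":   {"relevance": 0.15, "distance": 0.25, "recency": 0.45, "reputation": 0.15},
    "search": {"relevance": 0.35, "distance": 0.25, "recency": 0.25, "reputation": 0.15},
}

def blended_score(signals: dict, surface: str) -> float:
    # Distance is applied earlier as a hard radius filter; here it only reorders results
    weights = RANKING_WEIGHTS[surface]
    return sum(weights[name] * signals[name] for name in weights)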
Consistency and Invariants
System Invariants
Sold items must never appear in search results. Users must never message about a sold listing without clear indication it is sold.
The Sold Item Problem:
When an item sells:
1. Database updated immediately (source of truth)
2. Search index updated async (1-5 second delay)
3. CDN cache may still serve old feed (TTL-based)
During this window, users might see and click on sold items.
# When marking item as sold (async Redis client and Kafka producer assumed)
async def mark_listing_sold(listing_id: str, buyer_id: str):
    async with db.transaction():
        # 1. Source of truth first (strong consistency)
        await db.execute(
            "UPDATE listings SET status = 'sold', buyer_id = $1 WHERE id = $2",
            buyer_id, listing_id,
        )
    # 2. Hot "sold" set so read paths can filter immediately
    await redis.sadd("sold_listings", listing_id)
    # 3. Event for the async indexer to remove it from Elasticsearch
    await kafka.send("listing-events", {"type": "sold", "listing_id": listing_id})
Consistency Model:
| Data | Consistency | Reason |
|------|-------------|--------|
| Listing status | Strong (DB) | Must be accurate for sold check |
| Search index | Eventual (1-5s) | Performance, acceptable lag |
| View counts | Eventual (async) | Not critical, batch updates OK |
| Messages | Strong | Users expect real-time chat |
| User reputation | Eventual | Updated after transaction completes |
What to say
We use eventual consistency for search indexing because a 5-second delay in new listings appearing is acceptable. But sold status is checked against a real-time source (Redis set or DB) to enforce the invariant.
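A sketch of that read-time check, assuming a synchronous redis-py client and the "sold_listings" set written in the sold-path code above (names are illustrative):

import redis

r = redis.Redis()  # connection settings assumed

def drop_recently_sold(results: list[dict]) -> list[dict]:
    # Pipeline one SISMEMBER per hit to cover the window before the
    # search index catches up with the database.
    pipe = r.pipeline()
    for item in results:
        pipe.sismember("sold_listings", item["id"])
    sold_flags = pipe.execute()
    return [item for item, sold in zip(results, sold_flags) if not sold]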
Failure Modes and Resilience
Proactively discuss failures
Let me walk through what happens when components fail and how we handle it.
| Failure | Impact | Mitigation | Why It Works |
|---|---|---|---|
| Elasticsearch down | Search unavailable | Fallback to DB query (degraded) | Users can still browse, just slower |
| Kafka down | Index updates delayed | Buffer in producer, retry (config sketch below) | Listings exist in DB, index catches up |
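For the Kafka case, a producer configured roughly along these lines (kafka-python shown; the exact values are assumptions to tune per deployment):

import json
from kafka import KafkaProducer

# Buffers records in client memory and retries sends if brokers are briefly unavailable
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    acks="all",                       # a listing event counts as sent only once replicated
    retries=10,                       # retry transient broker failures
    linger_ms=20,                     # small batching window
    buffer_memory=64 * 1024 * 1024,   # 64 MB of client-side buffering
    value_serializer=lambda v: json.dumps(v).encode(),
)

producer.send("listing-events", {"type": "created", "listing_id": "listing-uuid"})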
Graceful Degradation Strategy:
async def search_with_fallback(query: str, location: dict, filters: dict):
    try:
        # Primary: Elasticsearch geo + keyword search (helper and exception names assumed)
        return await es_search(query, location, filters)
    except ElasticsearchUnavailableError:
        # Fallback: degraded direct-DB query (bounding box, no ranking)
        return await db_search_nearby(query, location, filters)
Circuit breaker pattern
Wrap Elasticsearch calls in a circuit breaker. After 5 failures in 10 seconds, open the circuit and go directly to fallback for 30 seconds before retrying.
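A minimal in-process sketch of that breaker; the thresholds match the numbers above, while the naming and half-open behavior are assumptions:

import time

class CircuitBreaker:
    """Opens after `threshold` failures within `window` seconds; stays open for `cooldown`."""

    def __init__(self, threshold: int = 5, window: float = 10.0, cooldown: float = 30.0):
        self.threshold, self.window, self.cooldown = threshold, window, cooldown
        self.failures: list[float] = []
        self.opened_at: float | None = None

    def allow(self) -> bool:
        # True if the call may go to Elasticsearch, False if it should skip to the fallback
        if self.opened_at is None:
            return True
        if time.monotonic() - self.opened_at >= self.cooldown:
            self.opened_at = None          # half-open: let one attempt through
            self.failures.clear()
            return True
        return False

    def record_failure(self) -> None:
        now = time.monotonic()
        self.failures = [t for t in self.failures if now - t < self.window]
        self.failures.append(now)
        if len(self.failures) >= self.threshold:
            self.opened_at = now

    def record_success(self) -> None:
        self.failures.clear()

search_with_fallback would call allow() before the Elasticsearch branch and record_failure() or record_success() around it.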
Evolution and Scaling
What to say
This design handles 100M listings and 50M DAU. Let me discuss how it evolves for 10x scale and international expansion.
Evolution Path:
Stage 1: Single Region (up to 100M listings)
- Single Elasticsearch cluster
- PostgreSQL with read replicas
- Works for US-only marketplace

Stage 2: Multi-Region (up to 1B listings)
- Geo-sharded Elasticsearch (US-West, US-East, EU, Asia)
- Regional PostgreSQL clusters
- Most queries stay within region (local marketplace)

Stage 3: Global Scale
- CDN for feed caching at edge
- Regional ranking models (different preferences by country)
- Cross-region search for shipping-enabled items
Multi-Region Architecture
Additional Features for Scale:
| Feature | Purpose | Implementation |
|---|---|---|
| Personalized Feed | Increase engagement | ML model per user based on browse/buy history |
| Similar Items | Cross-sell | Embedding-based similarity search (sketch below) |
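A bare-bones version of embedding-based similar items, assuming listing embeddings already exist; at real scale this would use an approximate nearest-neighbor index rather than a brute-force matrix product:

import numpy as np

def top_k_similar(query_vec: np.ndarray, listing_vecs: np.ndarray, k: int = 10) -> np.ndarray:
    # Cosine similarity between one listing's embedding and all candidate embeddings
    q = query_vec / np.linalg.norm(query_vec)
    m = listing_vecs / np.linalg.norm(listing_vecs, axis=1, keepdims=True)
    sims = m @ q
    return np.argsort(-sims)[:k]  # indices of the k most similar listings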
Alternative approach
If we needed sub-10ms search latency (real-time bidding scenario), I would use a pre-computed tile-based approach where results for each geo-tile and category are cached. Users get instant results, updated every few minutes.
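A sketch of that tile-based approach, assuming tiles are keyed by geohash prefix plus category in Redis and refreshed by a background job; the key scheme and TTL are made up:

import redis

r = redis.Redis()  # connection settings assumed

def tile_key(geohash_prefix: str, category: str) -> str:
    # One precomputed, ranked result list per (tile, category) pair
    return f"tile:{geohash_prefix}:{category}"

def get_feed(geohash_prefix: str, category: str) -> list[str]:
    # Read path: a single cache lookup, no live geo query
    return [x.decode() for x in r.lrange(tile_key(geohash_prefix, category), 0, 49)]

def refresh_tile(geohash_prefix: str, category: str, ranked_listing_ids: list[str]) -> None:
    # Background job: recompute every few minutes and swap in the new list
    key = tile_key(geohash_prefix, category)
    pipe = r.pipeline(transaction=True)
    pipe.delete(key)
    if ranked_listing_ids:
        pipe.rpush(key, *ranked_listing_ids)
    pipe.expire(key, 600)  # assumed 10-minute staleness budget
    pipe.execute()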
What I would do differently for...
Auction-style marketplace (eBay): Add real-time bidding service, bid history, auction end scheduling, snipe protection.
Services marketplace (TaskRabbit): Add availability calendars, booking system, real-time location for in-progress tasks.
B2B marketplace (Alibaba): Add RFQ system, bulk pricing, supplier verification, trade assurance.