Design Walkthrough
Problem Statement
The Question: Design a marketplace like Facebook Marketplace where users can list items for sale and discover items to buy in their local area.
Core Features:
- Listing creation: Sellers post items with photos, description, price, location
- Discovery feed: Buyers browse items near them, filtered by category
- Search: Find specific items by keyword within a geographic radius
- Messaging: Buyers and sellers communicate about items
- Transaction flow: Mark items as sold, handle disputes
What to say first
Before I design, let me clarify the scale, the primary use case (local vs shipping), and what trust/safety requirements exist. These significantly shape the architecture.
Hidden requirements interviewers test:
- Can you design efficient geo-spatial queries?
- How do you handle the two-sided marketplace cold start?
- What about fraud, scams, and prohibited items?
- How do you rank and personalize results?
- Can you handle real-time inventory updates?
Clarifying Questions
Ask these questions to demonstrate product thinking alongside technical depth.
Question 1: Scale
How many listings? How many daily active users? What is the read-to-write ratio?
Why this matters: Determines indexing strategy and caching needs.
Typical answer: 100M listings, 50M DAU, 100:1 read-to-write ratio.
Architecture impact: Heavy read optimization; eventual consistency acceptable for listings.
Question 2: Geography
Is this local-only (pickup) or does it support shipping? What is the typical search radius?
Why this matters: Local-only means geo-partitioning is viable.
Typical answer: Primarily local with a 50-mile default radius, optional shipping.
Architecture impact: Can shard by geography; most queries are geo-bounded.
Question 3: Categories
How many categories? Are there category-specific attributes (e.g., mileage for cars)?
Why this matters: Affects schema design and search filtering.
Typical answer: 20-30 top-level categories, some with custom attributes.
Architecture impact: Need a flexible schema (JSON attributes) plus category-specific indexes.
Question 4: Trust and Safety
What moderation is needed? Are there prohibited items? How do we handle scams?
Why this matters: T&S is a major engineering investment.
Typical answer: Need image moderation, text filtering, user reputation, fraud detection.
Architecture impact: Async processing pipeline, ML models, human review queue.
Stating assumptions
I will assume: 100M listings, 50M DAU, primarily local marketplace with 50-mile radius, 20+ categories, need basic trust and safety. Users discover via feed and search.
The Hard Part
Say this out loud
The hard part here is combining geo-proximity, relevance ranking, and real-time inventory freshness in sub-200ms queries across 100 million listings.
Why this is genuinely hard:
1. Geo-spatial queries are expensive: Finding all items within 50 miles of a point requires spatial indexing. A naive approach scans the entire database.
2. Multi-dimensional ranking: Results must balance distance, relevance, recency, seller reputation, and user preferences. This is not a simple sort.
3. Inventory freshness: When an item sells, it must disappear from all search results immediately, yet there are 100M items spread across multiple indexes.
4. Cold start problem: New users have no history. New listings have no engagement. How do you rank them?
5. Trust at scale: With millions of listings, manual review is impossible, and automated systems miss edge cases.
Common mistake
Candidates often design search without considering geo. A marketplace search is fundamentally different from e-commerce search because location is a first-class filter.
The fundamental tradeoffs:
- Freshness vs Performance: Real-time index updates are expensive. How stale can results be?
- Relevance vs Fairness: Should new listings get boosted? How do small sellers compete?
- Safety vs Friction: More verification means fewer listings. Where is the balance?
Scale and Access Patterns
Let me estimate the scale and understand how users interact with the system.
| Dimension | Value | Impact |
|---|---|---|
| Total Listings | 100 million | Need distributed storage and indexing |
| Daily Active Users | 50 million | Heavy read load, caching critical |
What to say
At 500M searches per day, that is about 6,000 QPS average, maybe 20,000 QPS at peak. Each search hits the geo-index plus relevance ranking. This is achievable with an Elasticsearch cluster.
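A quick sanity check on those numbers (the peak-to-average multiplier is an assumption, not something given in the estimates):

# Back-of-envelope check of the stated query load
searches_per_day = 500_000_000
avg_qps = searches_per_day / 86_400       # seconds per day -> ~5,800 QPS average
peak_qps = avg_qps * 3.5                  # assumed peak-to-average ratio -> ~20,000 QPS
print(f"avg ~ {avg_qps:,.0f} QPS, peak ~ {peak_qps:,.0f} QPS")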
Access Pattern Analysis:
- Feed browsing: 70% of traffic. Users scroll through nearby items. Highly cacheable by geo-region.
- Search: 20% of traffic. Keyword + location + filters. Less cacheable due to query variety.
- Listing view: 8% of traffic. Single item detail page. Cacheable by listing ID.
- Listing creation: 2% of traffic. Write-heavy, triggers indexing pipeline.
Storage:
- 100M listings x 5KB = 500GB metadata (fits in memory)
- 100M listings x 5 photos x 500KB = 250TB photos (object storage + CDN)
High-Level Architecture
Let me walk through the architecture, separating read and write paths.
What to say
I will separate the system into: listing management, search and discovery, messaging, and trust and safety. Each can scale independently.
Marketplace Architecture Overview
Component Responsibilities:
1. Listing Service
   - CRUD operations for listings
   - Photo upload to S3
   - Publishes events to Kafka for async processing
2. Search Service
   - Geo-bounded keyword search
   - Category filtering
   - Calls Ranking Service for result ordering
3. Indexer (Async) - see the sketch after this list
   - Consumes listing events from Kafka
   - Updates Elasticsearch and Geo indexes
   - Handles deletions when items sell
4. Trust and Safety (Async)
   - Image moderation (ML models)
   - Text classification for prohibited content
   - Fraud signal detection
5. Messaging Service
   - Real-time chat between buyer and seller
   - Conversation threads per listing
   - Push notifications
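A minimal sketch of the async indexer in item 3, assuming aiokafka and the Elasticsearch 8.x async client; the topic name, index name, and event shape are illustrative, not prescribed by this design:

import asyncio
import json

from aiokafka import AIOKafkaConsumer
from elasticsearch import AsyncElasticsearch, NotFoundError

es = AsyncElasticsearch("http://localhost:9200")  # placeholder endpoint

async def run_indexer():
    # Consume listing events published by the Listing Service (topic name assumed)
    consumer = AIOKafkaConsumer(
        "listing-events",
        bootstrap_servers="localhost:9092",
        value_deserializer=lambda v: json.loads(v.decode()),
    )
    await consumer.start()
    try:
        async for msg in consumer:
            event = msg.value
            if event["type"] in ("created", "updated"):
                # Write the denormalized search document
                await es.index(index="listings", id=event["listing_id"], document=event["doc"])
            elif event["type"] in ("sold", "deleted"):
                # Remove the item so it stops appearing in search results
                try:
                    await es.delete(index="listings", id=event["listing_id"])
                except NotFoundError:
                    pass  # already removed; deletes are idempotent
    finally:
        await consumer.stop()

if __name__ == "__main__":
    asyncio.run(run_indexer())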
Real-world reference
Facebook Marketplace uses a similar architecture with TAO for social graph, Elasticsearch for search, and a dedicated Trust and Safety ML platform that processes every listing.
Data Model and Storage
Let me define the core data models and storage choices.
What to say
PostgreSQL is the source of truth for listings. Elasticsearch provides search. Redis caches hot data. We denormalize into search indexes for query performance.
-- Users table (abridged; columns beyond the id are assumed for illustration)
CREATE TABLE users (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    display_name TEXT NOT NULL,
    created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
Elasticsearch Document Structure:
We denormalize listing data into Elasticsearch for fast geo-search.
{
  "id": "listing-uuid",
  "title": "iPhone 14 Pro Max 256GB",
  "category": "electronics",
  "price": 899,
  "status": "active",
  "location": { "lat": 37.7749, "lon": -122.4194 }
}

The index mapping declares location as a geo_point so geo_distance queries work:

{
  "mappings": {
    "properties": {
      "title": { "type": "text" },
      "category": { "type": "keyword" },
      "status": { "type": "keyword" },
      "location": { "type": "geo_point" }
    }
  }
}
Important detail
The Elasticsearch index is eventually consistent with PostgreSQL. When a listing sells, we must update both the database AND invalidate/update the search index. Use Kafka to ensure this happens reliably.
Geo-Search Deep Dive
Geo-search is the core of marketplace discovery. Let me explain the approaches.
| Approach | How It Works | Pros | Cons |
|---|---|---|---|
| Bounding Box | Filter by lat/lng ranges | Simple, fast | Not circular, misses corners |
| Haversine Distance | Calculate actual distance for each point (sketch below) | Accurate | O(n) - does not scale |
| Geohash | Encode location as string prefix | Efficient prefix queries | Edge cases at boundaries |
| Geo-point (ES) | Native geo_distance query | Optimized, accurate | Requires Elasticsearch |
| H3/S2 Cells | Hierarchical spatial index | Very efficient | More complex implementation |
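For reference, a minimal haversine implementation; the ranking sketch later in this section uses such a function for the distance signal. Nothing here is specific to this design:

import math

def haversine_miles(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    # Great-circle distance between two (lat, lon) points, in miles
    earth_radius_miles = 3958.8
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * earth_radius_miles * math.asin(math.sqrt(a))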
Recommendation
For most marketplaces, Elasticsearch geo_point with geo_distance query is the right choice. It handles the complexity and performs well at scale.
{
  "query": {
    "bool": {
      "must": { "match": { "title": "iphone" } },
      "filter": [
        { "term": { "status": "active" } },
        {
          "geo_distance": {
            "distance": "50mi",
            "location": { "lat": 37.7749, "lon": -122.4194 }
          }
        }
      ]
    }
  }
}
Ranking Strategy:
Search results are not just sorted by distance. We combine multiple signals:
def calculate_listing_score(listing, user_location, user_preferences):
    # Distance score (closer = higher)
    distance_miles = haversine(user_location, listing.location)
    distance_score = 1.0 / (1.0 + distance_miles / 10.0)
    # Remaining signals from the list above; helper names and weights are illustrative
    recency_score = recency_decay(listing.created_at)
    relevance_score = text_relevance(listing, user_preferences)
    reputation_score = listing.seller_reputation
    return (0.35 * relevance_score + 0.25 * distance_score +
            0.25 * recency_score + 0.15 * reputation_score)
Feed vs Search
The home feed (browsing) weights recency higher. Search weights relevance (text match) higher. Both use distance as a filter, not just a ranking signal.
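One way to express that difference is a per-surface weight profile feeding the scoring function above; these particular weights are assumptions for illustration, not tuned values:

# Hypothetical weight profiles; real values would come from offline tuning and experiments
RANKING_WEIGHTS = {
    "feed":   {"relevance": 0.15, "distance": 0.25, "recency": 0.45, "reputation": 0.15},
    "search": {"relevance": 0.35, "distance": 0.25, "recency": 0.25, "reputation": 0.15},
}

def blended_score(signals: dict, surface: str) -> float:
    # Distance is applied earlier as a hard radius filter; here it only reorders results
    weights = RANKING_WEIGHTS[surface]
    return sum(weights[name] * signals[name] for name in weights)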
Consistency and Invariants
System Invariants
Sold items must never appear in search results. Users must never message about a sold listing without clear indication it is sold.
The Sold Item Problem:
When an item sells:
1. Database updated immediately (source of truth)
2. Search index updated async (1-5 second delay)
3. CDN cache may still serve old feed (TTL-based)
During this window, users might see and click on sold items.
# When marking item as sold (async Redis client and Kafka producer assumed)
async def mark_listing_sold(listing_id: str, buyer_id: str):
    async with db.transaction():
        # 1. Source of truth first (strong consistency)
        await db.execute(
            "UPDATE listings SET status = 'sold', buyer_id = $1 WHERE id = $2",
            buyer_id, listing_id,
        )
    # 2. Hot "sold" set so read paths can filter immediately
    await redis.sadd("sold_listings", listing_id)
    # 3. Event for the async indexer to remove it from Elasticsearch
    await kafka.send("listing-events", {"type": "sold", "listing_id": listing_id})
Consistency Model:
| Data | Consistency | Reason |
|------|-------------|--------|
| Listing status | Strong (DB) | Must be accurate for sold check |
| Search index | Eventual (1-5s) | Performance, acceptable lag |
| View counts | Eventual (async) | Not critical, batch updates OK |
| Messages | Strong | Users expect real-time chat |
| User reputation | Eventual | Updated after transaction completes |
What to say
We use eventual consistency for search indexing because a 5-second delay in new listings appearing is acceptable. But sold status is checked against a real-time source (Redis set or DB) to enforce the invariant.
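A sketch of that read-time check, assuming a synchronous redis-py client and the "sold_listings" set written in the sold-path code above (names are illustrative):

import redis

r = redis.Redis()  # connection settings assumed

def drop_recently_sold(results: list[dict]) -> list[dict]:
    # Pipeline one SISMEMBER per hit to cover the window before the
    # search index catches up with the database.
    pipe = r.pipeline()
    for item in results:
        pipe.sismember("sold_listings", item["id"])
    sold_flags = pipe.execute()
    return [item for item, sold in zip(results, sold_flags) if not sold]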
Failure Modes and Resilience
Proactively discuss failures
Let me walk through what happens when components fail and how we handle it.
| Failure | Impact | Mitigation | Why It Works |
|---|---|---|---|
| Elasticsearch down | Search unavailable | Fallback to DB query (degraded) | Users can still browse, just slower |
| Kafka down | Index updates delayed | Buffer in producer, retry (config sketch below) | Listings exist in DB, index catches up |
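For the Kafka case, a producer configured roughly along these lines (kafka-python shown; the exact values are assumptions to tune per deployment):

import json
from kafka import KafkaProducer

# Buffers records in client memory and retries sends if brokers are briefly unavailable
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    acks="all",                       # a listing event counts as sent only once replicated
    retries=10,                       # retry transient broker failures
    linger_ms=20,                     # small batching window
    buffer_memory=64 * 1024 * 1024,   # 64 MB of client-side buffering
    value_serializer=lambda v: json.dumps(v).encode(),
)

producer.send("listing-events", {"type": "created", "listing_id": "listing-uuid"})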
Graceful Degradation Strategy:
async def search_with_fallback(query: str, location: dict, filters: dict):
    try:
        # Primary: Elasticsearch geo + keyword search (helper and exception names assumed)
        return await es_search(query, location, filters)
    except ElasticsearchUnavailableError:
        # Fallback: degraded direct-DB query (bounding box, no ranking)
        return await db_search_nearby(query, location, filters)
Circuit breaker pattern
Wrap Elasticsearch calls in a circuit breaker. After 5 failures in 10 seconds, open the circuit and go directly to fallback for 30 seconds before retrying.
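A minimal in-process sketch of that breaker; the thresholds match the numbers above, while the naming and half-open behavior are assumptions:

import time

class CircuitBreaker:
    """Opens after `threshold` failures within `window` seconds; stays open for `cooldown`."""

    def __init__(self, threshold: int = 5, window: float = 10.0, cooldown: float = 30.0):
        self.threshold, self.window, self.cooldown = threshold, window, cooldown
        self.failures: list[float] = []
        self.opened_at: float | None = None

    def allow(self) -> bool:
        # True if the call may go to Elasticsearch, False if it should skip to the fallback
        if self.opened_at is None:
            return True
        if time.monotonic() - self.opened_at >= self.cooldown:
            self.opened_at = None          # half-open: let one attempt through
            self.failures.clear()
            return True
        return False

    def record_failure(self) -> None:
        now = time.monotonic()
        self.failures = [t for t in self.failures if now - t < self.window]
        self.failures.append(now)
        if len(self.failures) >= self.threshold:
            self.opened_at = now

    def record_success(self) -> None:
        self.failures.clear()

search_with_fallback would call allow() before the Elasticsearch branch and record_failure() or record_success() around it.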
Evolution and Scaling
What to say
This design handles 100M listings and 50M DAU. Let me discuss how it evolves for 10x scale and international expansion.
Evolution Path:
Stage 1: Single Region (up to 100M listings)
- Single Elasticsearch cluster
- PostgreSQL with read replicas
- Works for US-only marketplace

Stage 2: Multi-Region (up to 1B listings)
- Geo-sharded Elasticsearch (US-West, US-East, EU, Asia)
- Regional PostgreSQL clusters
- Most queries stay within region (local marketplace)

Stage 3: Global Scale
- CDN for feed caching at edge
- Regional ranking models (different preferences by country)
- Cross-region search for shipping-enabled items
Multi-Region Architecture
Additional Features for Scale:
| Feature | Purpose | Implementation |
|---|---|---|
| Personalized Feed | Increase engagement | ML model per user based on browse/buy history |
| Similar Items | Cross-sell | Embedding-based similarity search (sketch below) |
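A bare-bones version of embedding-based similar items, assuming listing embeddings already exist; at real scale this would use an approximate nearest-neighbor index rather than a brute-force matrix product:

import numpy as np

def top_k_similar(query_vec: np.ndarray, listing_vecs: np.ndarray, k: int = 10) -> np.ndarray:
    # Cosine similarity between one listing's embedding and all candidate embeddings
    q = query_vec / np.linalg.norm(query_vec)
    m = listing_vecs / np.linalg.norm(listing_vecs, axis=1, keepdims=True)
    sims = m @ q
    return np.argsort(-sims)[:k]  # indices of the k most similar listings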
Alternative approach
If we needed sub-10ms search latency (real-time bidding scenario), I would use a pre-computed tile-based approach where results for each geo-tile and category are cached. Users get instant results, updated every few minutes.
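A sketch of that tile-based approach, assuming tiles are keyed by geohash prefix plus category in Redis and refreshed by a background job; the key scheme and TTL are made up:

import redis

r = redis.Redis()  # connection settings assumed

def tile_key(geohash_prefix: str, category: str) -> str:
    # One precomputed, ranked result list per (tile, category) pair
    return f"tile:{geohash_prefix}:{category}"

def get_feed(geohash_prefix: str, category: str) -> list[str]:
    # Read path: a single cache lookup, no live geo query
    return [x.decode() for x in r.lrange(tile_key(geohash_prefix, category), 0, 49)]

def refresh_tile(geohash_prefix: str, category: str, ranked_listing_ids: list[str]) -> None:
    # Background job: recompute every few minutes and swap in the new list
    key = tile_key(geohash_prefix, category)
    pipe = r.pipeline(transaction=True)
    pipe.delete(key)
    if ranked_listing_ids:
        pipe.rpush(key, *ranked_listing_ids)
    pipe.expire(key, 600)  # assumed 10-minute staleness budget
    pipe.execute()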
What I would do differently for...
Auction-style marketplace (eBay): Add real-time bidding service, bid history, auction end scheduling, snipe protection.
Services marketplace (TaskRabbit): Add availability calendars, booking system, real-time location for in-progress tasks.
B2B marketplace (Alibaba): Add RFQ system, bulk pricing, supplier verification, trade assurance.