Design Walkthrough
Problem Statement
The Question: Design a backend system that can handle a flash sale with 6 million orders in 1 hour, where limited inventory items are sold at deep discounts.
Flash sales are challenging because:
- Traffic spikes: 100x normal traffic in the first few seconds
- Limited inventory: 10,000 items, 1 million buyers
- Time pressure: Sale starts at an exact time, everyone refreshes simultaneously
- Fairness concerns: First-come-first-served must be honored
- Money involved: Overselling means refunds, angry customers, legal issues
What to say first
Before I design, let me clarify the scale and constraints. I want to understand inventory quantities, expected traffic patterns, and what guarantees we need around fairness and inventory accuracy.
Hidden requirements interviewers test:
- Can you handle extreme traffic spikes gracefully?
- Do you understand inventory consistency challenges?
- Can you design for failure during peak load?
- Do you know techniques like queuing, caching, and rate limiting?
Clarifying Questions
Ask these questions to demonstrate you understand the unique challenges of flash sales.
Question 1: Traffic Pattern
What is the expected traffic pattern? How many concurrent users at sale start?
Why this matters: Flash sales have a thundering herd problem - everyone hits at exactly the same second.
Typical answer: 1M+ concurrent users, 100K RPS in the first minute, then drops
Architecture impact: Need request queuing and rate limiting at the entry point
Question 2: Inventory Scale
How many items are for sale? Single product or multiple SKUs?
Why this matters: A single hot item vs distributed inventory changes the design.
Typical answer: 10,000 units of a single product (worst case for hot spots)
Architecture impact: Cannot shard by product - need other scaling strategies
Question 3: Overselling Tolerance
Is any overselling acceptable? What happens if we sell 10,001 when we have 10,000?
Why this matters: Determines consistency requirements.
Typical answer: Zero tolerance - overselling requires refunds and damages trust
Architecture impact: Need strong consistency for inventory, cannot use eventual consistency
Question 4: Fairness Requirements
Must it be strictly first-come-first-served? Is lottery acceptable?
Why this matters: Strict ordering is hard at scale.
Typical answer: Best effort FCFS, lottery acceptable if transparent
Architecture impact: Can use probabilistic admission if lottery is OK
Stating assumptions
Based on this, I will assume: 6M orders over 1 hour (peak 100K RPS in first minute), 10K inventory of single hot product, zero overselling tolerance, best-effort FCFS with clear communication to users.
The Hard Part
Say this out loud
The hard part here is maintaining accurate inventory counts when thousands of concurrent requests try to decrement the same counter, while handling 100x normal traffic without falling over.
Why this is genuinely hard:
1. Thundering Herd: At sale start time, millions of users refresh simultaneously. Normal autoscaling cannot react fast enough.
2. Hot Key Problem: A single product means a single inventory counter. We cannot shard the hot path.
3. Race Conditions: Classic read-modify-write race:
   - Thread A reads inventory = 1
   - Thread B reads inventory = 1
   - Thread A decrements, writes inventory = 0
   - Thread B decrements, writes inventory = -1 (OVERSOLD!)
4. Database Bottleneck: The single inventory row becomes the bottleneck. Traditional databases cannot handle 100K updates/sec to one row.
Common mistake
Candidates often propose simple database transactions without addressing the hot key problem. A single inventory row in PostgreSQL can handle maybe 1000 updates/sec - we need 100x that.
The fundamental insight:
We cannot push 100K inventory decrements per second through a single counter. Instead, we must:
1. Reduce contention: Only let likely-to-succeed requests through
2. Queue requests: Convert the spike into a steady stream
3. Pre-allocate inventory: Distribute inventory counters across shards
Scale and Access Patterns
Let me estimate the scale and understand access patterns for flash sale.
| Dimension | Value | Impact |
|---|---|---|
| Total Orders Target | 6 million / hour | Average 1,667 orders/sec sustained |
| Peak RPS | 100,000+ | First 60 seconds see 100x average load |
What to say
The key insight is that 99% of purchase attempts will fail because demand far exceeds supply. We should optimize for fast rejection of the 99%, not just fast success for the 1%.
Traffic Pattern:
- T+0 to T+60s: 100K RPS (thundering herd)
- T+60s to T+5min: 10K RPS (still elevated)
Access Pattern Analysis:
- Extreme write contention: A single inventory counter updated by every request
- Read heavy after sellout: Users keep checking if the item is back in stock
- Time-bounded spike: Load drops dramatically after the first few minutes
- Geographic distribution: Users worldwide, latency varies
High-Level Architecture
The key architectural insight is to use multiple layers of filtering to reduce contention on inventory.
What to say
I will design a funnel architecture where each layer filters out requests, so only a manageable number reach the inventory system. We scale horizontally at the edge and queue requests before the bottleneck.
Flash Sale Architecture
Layer-by-Layer Filtering:
Layer 1 - Edge/CDN (1M to 200K requests):
- Static content cached at edge
- Bot protection filters automated attacks
- Geographic rate limiting
- Filters 80% of requests

Layer 2 - API Gateway (200K to 50K requests):
- Per-user rate limiting (1 attempt per second; see the sketch after this list)
- Virtual queue with position tracking
- Early rejection if inventory depleted
- Filters 75% of remaining requests

Layer 3 - Purchase Service (50K to 10K requests):
- Validates user eligibility
- Checks if user already purchased
- Lottery/probabilistic admission if needed
- Filters to match inventory availability

Layer 4 - Inventory (10K successful reservations):
- Atomic inventory decrement in Redis
- Persist to database for durability
- Strong consistency for inventory count

Layer 5 - Order Processing (async):
- Queue successful reservations
- Process payment asynchronously
- Send confirmation emails
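As a concrete example of Layer 2's per-user rate limiting, here is a minimal sketch of a fixed-window counter in Redis. The key layout, limit, and function name are assumptions for illustration, not the gateway's actual API:

import redis  # assumes a reachable Redis instance

r = redis.Redis(decode_responses=True)

def allow_attempt(sale_id: str, user_id: str, limit_per_sec: int = 1) -> bool:
    """Fixed-window counter: at most `limit_per_sec` purchase attempts per user per second."""
    key = f"flash:sale:{sale_id}:ratelimit:{user_id}"
    count = r.incr(key)
    if count == 1:
        r.expire(key, 1)  # the window opens on the first attempt and closes after one second
    return count <= limit_per_sec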
Real-world reference
Alibaba uses a similar funnel architecture for Singles Day (11.11). They process 500K+ orders per second by having multiple filtering layers before inventory operations.
Data Model and Storage
The data model must support both high-speed inventory operations and durable order storage.
What to say
I use Redis as the hot path for inventory operations because it can handle 100K+ atomic operations per second. PostgreSQL is the source of truth, synchronized asynchronously.
-- Flash Sale Event
CREATE TABLE flash_sales (
  id UUID PRIMARY KEY,
  total_inventory INT NOT NULL,      -- remaining columns (product, timing, price) omitted here
  starts_at TIMESTAMPTZ NOT NULL
);

Redis Data Structures:
# Inventory counter (atomic operations)
flash:sale:{sale_id}:inventory = 10000        # Current inventory count
flash:sale:{sale_id}:users = {user_id, ...}   # Set of users who already purchased (one item per user)
Atomic Inventory Decrement:
-- KEYS[1] = flash:sale:{sale_id}:inventory
-- KEYS[2] = flash:sale:{sale_id}:users
-- ARGV[1] = user_id
if redis.call('SISMEMBER', KEYS[2], ARGV[1]) == 1 then return -1 end  -- already purchased
if tonumber(redis.call('GET', KEYS[1]) or '0') <= 0 then return 0 end -- sold out
redis.call('DECR', KEYS[1]); redis.call('SADD', KEYS[2], ARGV[1]); return 1

Important detail
The Lua script runs atomically in Redis. This prevents race conditions where two users both see inventory=1 and both try to purchase. Only one will succeed.
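To show how this fits into the purchase path, here is a minimal sketch of registering and calling the script with redis-py. The wrapper name and key layout mirror the comments above but are otherwise assumptions:

import redis

RESERVE_LUA = """
if redis.call('SISMEMBER', KEYS[2], ARGV[1]) == 1 then return -1 end
if tonumber(redis.call('GET', KEYS[1]) or '0') <= 0 then return 0 end
redis.call('DECR', KEYS[1]); redis.call('SADD', KEYS[2], ARGV[1]); return 1
"""

r = redis.Redis(decode_responses=True)
reserve_script = r.register_script(RESERVE_LUA)  # cached server-side, invoked via EVALSHA

def try_reserve(sale_id: str, user_id: str) -> int:
    """Returns 1 on success, 0 if sold out, -1 if this user already purchased."""
    return reserve_script(
        keys=[f"flash:sale:{sale_id}:inventory", f"flash:sale:{sale_id}:users"],
        args=[user_id],
    )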
Algorithm Deep Dive
Several key algorithms make flash sales work at scale.
1. Virtual Queue System
Instead of letting all users hammer the purchase endpoint, assign queue positions.
class VirtualQueue:
def __init__(self, redis_client, sale_id):
        self.redis = redis_client
        self.sale_id = sale_id
        self.queue_key = f"flash:sale:{sale_id}:queue"  # sorted set of waiting users
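A minimal sketch of the two core queue operations, using a Redis sorted set ordered by arrival time. The key names, drain rate, and function names are assumptions for illustration:

import time
import redis

r = redis.Redis(decode_responses=True)

def join_queue(sale_id: str, user_id: str) -> int:
    """Record the user's arrival time (keeping the earliest on retries) and return a 1-based position."""
    key = f"flash:sale:{sale_id}:queue"
    r.zadd(key, {user_id: time.time()}, nx=True)  # nx: refreshing the page does not lose your spot
    return r.zrank(key, user_id) + 1

def may_enter(sale_id: str, user_id: str, admit_per_second: int = 500) -> bool:
    """Drain the queue at a fixed rate: admit users whose position has already been reached."""
    key = f"flash:sale:{sale_id}:queue"
    rank = r.zrank(key, user_id)
    opened_at = float(r.get(f"{key}:opened_at") or time.time())  # set when the sale opens
    return rank is not None and rank < (time.time() - opened_at) * admit_per_second

The API gateway would call join_queue on a user's first attempt and only forward the request once may_enter returns True.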
2. Inventory Pre-sharding
For extremely hot items, pre-allocate inventory across multiple Redis keys.
class ShardedInventory:
def __init__(self, redis_client, sale_id, total_inventory, num_shards=10):
        self.redis = redis_client
        self.num_shards = num_shards
        self.shard_keys = [f"flash:sale:{sale_id}:inv:{i}" for i in range(num_shards)]
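A minimal sketch of the shard operations, assuming inventory is split across the shard keys set up above. The decrement-then-undo pattern keeps the no-oversell guarantee without a single hot key; names and shard count are assumptions:

import random
import redis

r = redis.Redis(decode_responses=True)
NUM_SHARDS = 10

def load_shards(sale_id: str, total_inventory: int) -> None:
    """Split the inventory evenly across shard counters before the sale starts."""
    per_shard, remainder = divmod(total_inventory, NUM_SHARDS)
    for i in range(NUM_SHARDS):
        r.set(f"flash:sale:{sale_id}:inv:{i}", per_shard + (1 if i < remainder else 0))

def try_reserve_sharded(sale_id: str) -> bool:
    """Decrement a random shard; if it is empty, fall back to the others before giving up."""
    shards = list(range(NUM_SHARDS))
    random.shuffle(shards)
    for i in shards:
        key = f"flash:sale:{sale_id}:inv:{i}"
        if r.decr(key) >= 0:
            return True
        r.incr(key)  # the shard went negative: undo and try the next one
    return False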
3. Probabilistic Early Rejection
When inventory is low, use probability to reject early.
class AdmissionController:
def __init__(self, total_inventory: int):
        self.total_inventory = total_inventory
        self.remaining = total_inventory
        self.oversubscribe_factor = 3.0  # admit roughly 3x remaining inventory worth of traffic
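A minimal sketch of the admission decision; the oversubscription factor and function name are assumptions. The idea is to admit only a few multiples of remaining inventory per second and reject the rest immediately:

import random

def should_admit(remaining_inventory: int, requests_last_second: int, oversubscribe: float = 3.0) -> bool:
    """Admit about `oversubscribe` x remaining inventory worth of traffic; reject everything else up front."""
    if remaining_inventory <= 0:
        return False  # sold out: reject everyone immediately
    target_admits = remaining_inventory * oversubscribe
    admit_probability = min(1.0, target_admits / max(requests_last_second, 1))
    return random.random() < admit_probability

Rejected users get an immediate "sold out, try again" response instead of holding a connection open into the inventory layer.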
Which approach to use?
Virtual queue provides the best user experience (shows position). Sharding provides the highest throughput. Probabilistic rejection is the simplest to implement. Most production systems combine all three.
Consistency and Invariants
System Invariants
The system must NEVER sell more inventory than available. This is non-negotiable. Underselling (failing when inventory exists) is acceptable but should be minimized.
Why overselling is catastrophic:
1. Customer Trust: Telling customers their order is cancelled destroys trust
2. Legal Issues: Taking payment for unavailable goods has legal implications
3. Operational Cost: Refund processing, customer support, compensation
4. Brand Damage: Social media amplifies negative experiences
Consistency Strategy:
We use a two-phase approach:
Phase 1: Reservation (Redis - Strong Consistency)
- Atomic decrement in Redis
- User gets immediate confirmation of reservation
- Reservation has short TTL (5-10 minutes)

Phase 2: Confirmation (PostgreSQL - Durable)
- Async write to database
- If payment fails, reservation expires
- Inventory returned to pool
async def purchase_item(user_id: str, sale_id: str) -> PurchaseResult:
# Phase 1: Reserve in Redis (fast, atomic)
    reservation = await redis_reserve(sale_id, user_id)
    if not reservation.success:
        return PurchaseResult(status="sold_out")
    await order_queue.publish(reservation)  # Phase 2: durable order write + payment run async
    return PurchaseResult(status="reserved")

Handling Reservation Expiry:
async def process_expired_reservations():
"""Background job to return expired inventory"""
    while True:
        for res in await redis_pop_expired_reservations():  # helper names are illustrative
            await redis_increment_inventory(res.sale_id)     # return the unit to the pool
        await asyncio.sleep(1)

Business impact mapping
Underselling by 1-2% is acceptable (some users will not complete checkout). Overselling by even 0.1% is unacceptable (1000 angry customers on 1M orders). We bias heavily toward preventing overselling.
Failure Modes and Resilience
Proactively discuss failures
Flash sales are high-stakes. Let me walk through what happens when things go wrong.
| Failure | Impact | Mitigation | Why It Works |
|---|---|---|---|
| Redis down | Cannot process purchases | Redis Sentinel with auto-failover | Standby promotes in seconds |
| Database slow | Reservation writes queue up | Async writes with queue buffer | Decouples hot path from DB |
Pre-sale Preparation:
class FlashSaleReadiness:
async def pre_sale_checks(self, sale_id: str) -> CheckResult:
        checks = []
        checks.append(await self.verify_inventory_loaded(sale_id))  # e.g. Redis counters match DB
        checks.append(await self.verify_queue_workers_ready())      # e.g. consumers pre-scaled
        return CheckResult(passed=all(c.ok for c in checks), checks=checks)

Circuit Breaker for Payment:
class PaymentCircuitBreaker:
def __init__(self):
        self.failure_count = 0
        self.failure_threshold = 50  # trip the breaker after this many consecutive failures
        self.state = "closed"        # closed -> open -> half-open
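Fleshing out the stub above, a minimal breaker might look like this; the threshold, cooldown, and method names are assumptions:

import time

class SimplePaymentBreaker:
    """Illustrative circuit breaker: open after N consecutive failures, probe again after a cooldown."""
    def __init__(self, failure_threshold: int = 50, cooldown_s: float = 30.0):
        self.failure_count = 0
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self.opened_at = None  # None means the breaker is closed

    def allow_request(self) -> bool:
        if self.opened_at is None:
            return True
        return time.monotonic() - self.opened_at >= self.cooldown_s  # half-open: let a probe through

    def record(self, success: bool) -> None:
        if success:
            self.failure_count, self.opened_at = 0, None
        else:
            self.failure_count += 1
            if self.failure_count >= self.failure_threshold:
                self.opened_at = time.monotonic()

When allow_request() returns False, the order worker would extend the reservation TTL and retry later rather than cancelling the order, matching the insight below.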
Key insight
During flash sales, extend reservations rather than cancelling when downstream services fail. Users who got a reservation are your most engaged customers - do not lose them due to temporary failures.
Evolution and Scaling
What to say
This design handles 6M orders per hour with single-region deployment. Let me discuss how it evolves for 10x scale and multiple concurrent sales.
Evolution Path:
Stage 1: Single Sale, Single Region (current design)
- Handles 6M orders/hour
- Single Redis cluster for inventory
- Works for most flash sales

Stage 2: Multiple Concurrent Sales
- Shard by sale_id
- Each sale gets a dedicated Redis cluster
- Horizontal scaling for different products

Stage 3: Global Flash Sales
- Regional inventory allocation
- Each region has a pre-allocated quota
- Global coordinator for rebalancing
Multi-Region Flash Sale Architecture
Inventory Rebalancing Algorithm:
class InventoryCoordinator:
def __init__(self, regions: list):
        self.regions = regions
        self.allocations = {}  # region -> remaining pre-allocated quota
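A minimal sketch of the rebalancing decision the coordinator might make; the region fields, runway threshold, and greedy strategy are assumptions for illustration:

from dataclasses import dataclass

@dataclass
class RegionQuota:
    name: str
    remaining: int    # unsold units still allocated to this region
    sell_rate: float  # units sold per second over a recent window

def plan_rebalance(regions: list, min_runway_s: float = 60.0) -> list:
    """Move inventory from slow regions toward regions that will sell out within `min_runway_s` seconds."""
    moves = []
    starving = [x for x in regions if x.sell_rate > 0 and x.remaining / x.sell_rate < min_runway_s]
    donors = sorted(regions, key=lambda x: x.sell_rate)  # slowest-selling regions donate first
    for needy in starving:
        for donor in donors:
            if donor is needy or donor.remaining == 0:
                continue
            qty = min(donor.remaining // 2, int(needy.sell_rate * min_runway_s))
            if qty > 0:
                donor.remaining -= qty
                needy.remaining += qty
                moves.append((donor.name, needy.name, qty))
    return moves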
Alternative approach
If we needed even higher scale, I would consider a lottery system where users register interest, and winners are randomly selected. This eliminates the thundering herd entirely but changes the user experience. Companies like NVIDIA use this for GPU launches.
What I would do differently for...
Ticket sales (assigned seating): Add seat selection service with optimistic locking. More complex inventory model.
Limited edition drops (hype products): Add bot protection with proof-of-work or waiting room. Fairness matters more.
Recurring flash sales: Pre-compute user eligibility. Cache everything possible. Learn from previous sale patterns.