Design Walkthrough
Problem Statement
The Question: Design a backend system that can handle a flash sale with 6 million orders in 1 hour, where limited inventory items are sold at deep discounts.
Flash sales are challenging because:
- Traffic spikes: 100x normal traffic in the first few seconds
- Limited inventory: 10,000 items, 1 million buyers
- Time pressure: Sale starts at an exact time, everyone refreshes simultaneously
- Fairness concerns: First-come-first-served must be honored
- Money involved: Overselling means refunds, angry customers, legal issues
What to say first
Before I design, let me clarify the scale and constraints. I want to understand inventory quantities, expected traffic patterns, and what guarantees we need around fairness and inventory accuracy.
Hidden requirements interviewers test:
- Can you handle extreme traffic spikes gracefully?
- Do you understand inventory consistency challenges?
- Can you design for failure during peak load?
- Do you know techniques like queuing, caching, and rate limiting?
Clarifying Questions
Ask these questions to demonstrate you understand the unique challenges of flash sales.
Question 1: Traffic Pattern
What is the expected traffic pattern? How many concurrent users at sale start?
Why this matters: Flash sales have a thundering herd problem - everyone hits at exactly the same second.
Typical answer: 1M+ concurrent users, 100K RPS in the first minute, then drops
Architecture impact: Need request queuing and rate limiting at the entry point
Question 2: Inventory Scale
How many items are for sale? Single product or multiple SKUs?
Why this matters: A single hot item vs distributed inventory changes the design.
Typical answer: 10,000 units of a single product (worst case for hot spots)
Architecture impact: Cannot shard by product - need other scaling strategies
Question 3: Overselling Tolerance
Is any overselling acceptable? What happens if we sell 10,001 when we have 10,000?
Why this matters: Determines consistency requirements.
Typical answer: Zero tolerance - overselling requires refunds and damages trust
Architecture impact: Need strong consistency for inventory, cannot use eventual consistency
Question 4: Fairness Requirements
Must it be strictly first-come-first-served? Is lottery acceptable?
Why this matters: Strict ordering is hard at scale.
Typical answer: Best effort FCFS, lottery acceptable if transparent
Architecture impact: Can use probabilistic admission if lottery is OK
Stating assumptions
Based on this, I will assume: 6M orders over 1 hour (peak 100K RPS in first minute), 10K inventory of single hot product, zero overselling tolerance, best-effort FCFS with clear communication to users.
The Hard Part
Say this out loud
The hard part here is maintaining accurate inventory counts when thousands of concurrent requests try to decrement the same counter, while handling 100x normal traffic without falling over.
Why this is genuinely hard:
1. Thundering Herd: At sale start time, millions of users refresh simultaneously. Normal autoscaling cannot react fast enough.
2. Hot Key Problem: A single product means a single inventory counter. We cannot shard the hot path.
3. Race Conditions: Classic read-modify-write race:
   - Thread A reads inventory = 1
   - Thread B reads inventory = 1
   - Thread A decrements, writes inventory = 0
   - Thread B decrements, writes inventory = -1 (OVERSOLD!)
4. Database Bottleneck: The single inventory row becomes the bottleneck. Traditional databases cannot handle 100K updates/sec to one row.
Common mistake
Candidates often propose simple database transactions without addressing the hot key problem. A single inventory row in PostgreSQL can handle maybe 1000 updates/sec - we need 100x that.
The fundamental insight:
We cannot push 100K inventory decrements per second through a single counter. Instead, we must:
1. Reduce contention: Only let likely-to-succeed requests through
2. Queue requests: Convert the spike into a steady stream
3. Pre-allocate inventory: Distribute inventory counters across shards
Scale and Access Patterns
Let me estimate the scale and understand access patterns for flash sale.
| Dimension | Value | Impact |
|---|---|---|
| Total Orders Target | 6 million / hour | Average 1,667 orders/sec sustained |
| Peak RPS | 100,000+ | First 60 seconds see 100x average load |
What to say
The key insight is that 99% of purchase attempts will fail because demand far exceeds supply. We should optimize for fast rejection of the 99%, not just fast success for the 1%.
Traffic Pattern:
- T+0 to T+60s: 100K RPS (thundering herd)
- T+60s to T+5min: 10K RPS (still elevated)
Access Pattern Analysis:
- Extreme write contention: A single inventory counter updated by every request
- Read heavy after sellout: Users keep checking if the item is back in stock
- Time-bounded spike: Load drops dramatically after the first few minutes
- Geographic distribution: Users worldwide, latency varies
High-Level Architecture
The key architectural insight is to use multiple layers of filtering to reduce contention on inventory.
What to say
I will design a funnel architecture where each layer filters out requests, so only a manageable number reach the inventory system. We scale horizontally at the edge and queue requests before the bottleneck.
Flash Sale Architecture
Layer-by-Layer Filtering:
Layer 1 - Edge/CDN (1M to 200K requests):
- Static content cached at edge
- Bot protection filters automated attacks
- Geographic rate limiting
- Filters 80% of requests

Layer 2 - API Gateway (200K to 50K requests):
- Per-user rate limiting (1 attempt per second; see the sketch after this list)
- Virtual queue with position tracking
- Early rejection if inventory depleted
- Filters 75% of remaining requests

Layer 3 - Purchase Service (50K to 10K requests):
- Validates user eligibility
- Checks if user already purchased
- Lottery/probabilistic admission if needed
- Filters to match inventory availability

Layer 4 - Inventory (10K successful reservations):
- Atomic inventory decrement in Redis
- Persist to database for durability
- Strong consistency for inventory count

Layer 5 - Order Processing (async):
- Queue successful reservations
- Process payment asynchronously
- Send confirmation emails
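As a concrete example of Layer 2's per-user rate limiting, here is a minimal sketch of a fixed-window counter in Redis. The key layout, limit, and function name are assumptions for illustration, not the gateway's actual API:

import redis  # assumes a reachable Redis instance

r = redis.Redis(decode_responses=True)

def allow_attempt(sale_id: str, user_id: str, limit_per_sec: int = 1) -> bool:
    """Fixed-window counter: at most `limit_per_sec` purchase attempts per user per second."""
    key = f"flash:sale:{sale_id}:ratelimit:{user_id}"
    count = r.incr(key)
    if count == 1:
        r.expire(key, 1)  # the window opens on the first attempt and closes after one second
    return count <= limit_per_sec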
Real-world reference
Alibaba uses a similar funnel architecture for Singles Day (11.11). They process 500K+ orders per second by having multiple filtering layers before inventory operations.
Data Model and Storage
The data model must support both high-speed inventory operations and durable order storage.
What to say
I use Redis as the hot path for inventory operations because it can handle 100K+ atomic operations per second. PostgreSQL is the source of truth, synchronized asynchronously.
-- Flash Sale Event
CREATE TABLE flash_sales (
  id UUID PRIMARY KEY,
  total_inventory INT NOT NULL,      -- remaining columns (product, timing, price) omitted here
  starts_at TIMESTAMPTZ NOT NULL
);

Redis Data Structures:
# Inventory counter (atomic operations)
flash:sale:{sale_id}:inventory = 10000        # Current inventory count
flash:sale:{sale_id}:users = {user_id, ...}   # Set of users who already purchased (one item per user)
Atomic Inventory Decrement:
-- KEYS[1] = flash:sale:{sale_id}:inventory
-- KEYS[2] = flash:sale:{sale_id}:users
-- ARGV[1] = user_id
if redis.call('SISMEMBER', KEYS[2], ARGV[1]) == 1 then return -1 end  -- already purchased
if tonumber(redis.call('GET', KEYS[1]) or '0') <= 0 then return 0 end -- sold out
redis.call('DECR', KEYS[1]); redis.call('SADD', KEYS[2], ARGV[1]); return 1

Important detail
The Lua script runs atomically in Redis. This prevents race conditions where two users both see inventory=1 and both try to purchase. Only one will succeed.
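To show how this fits into the purchase path, here is a minimal sketch of registering and calling the script with redis-py. The wrapper name and key layout mirror the comments above but are otherwise assumptions:

import redis

RESERVE_LUA = """
if redis.call('SISMEMBER', KEYS[2], ARGV[1]) == 1 then return -1 end
if tonumber(redis.call('GET', KEYS[1]) or '0') <= 0 then return 0 end
redis.call('DECR', KEYS[1]); redis.call('SADD', KEYS[2], ARGV[1]); return 1
"""

r = redis.Redis(decode_responses=True)
reserve_script = r.register_script(RESERVE_LUA)  # cached server-side, invoked via EVALSHA

def try_reserve(sale_id: str, user_id: str) -> int:
    """Returns 1 on success, 0 if sold out, -1 if this user already purchased."""
    return reserve_script(
        keys=[f"flash:sale:{sale_id}:inventory", f"flash:sale:{sale_id}:users"],
        args=[user_id],
    )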
Algorithm Deep Dive
Several key algorithms make flash sales work at scale.
1. Virtual Queue System
Instead of letting all users hammer the purchase endpoint, assign queue positions.
class VirtualQueue:
def __init__(self, redis_client, sale_id):
        self.redis = redis_client
        self.sale_id = sale_id
        self.queue_key = f"flash:sale:{sale_id}:queue"  # sorted set of waiting users
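A minimal sketch of the two core queue operations, using a Redis sorted set ordered by arrival time. The key names, drain rate, and function names are assumptions for illustration:

import time
import redis

r = redis.Redis(decode_responses=True)

def join_queue(sale_id: str, user_id: str) -> int:
    """Record the user's arrival time (keeping the earliest on retries) and return a 1-based position."""
    key = f"flash:sale:{sale_id}:queue"
    r.zadd(key, {user_id: time.time()}, nx=True)  # nx: refreshing the page does not lose your spot
    return r.zrank(key, user_id) + 1

def may_enter(sale_id: str, user_id: str, admit_per_second: int = 500) -> bool:
    """Drain the queue at a fixed rate: admit users whose position has already been reached."""
    key = f"flash:sale:{sale_id}:queue"
    rank = r.zrank(key, user_id)
    opened_at = float(r.get(f"{key}:opened_at") or time.time())  # set when the sale opens
    return rank is not None and rank < (time.time() - opened_at) * admit_per_second

The API gateway would call join_queue on a user's first attempt and only forward the request once may_enter returns True.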
2. Inventory Pre-sharding
For extremely hot items, pre-allocate inventory across multiple Redis keys.
class ShardedInventory:
def __init__(self, redis_client, sale_id, total_inventory, num_shards=10):
        self.redis = redis_client
        self.num_shards = num_shards
        self.shard_keys = [f"flash:sale:{sale_id}:inv:{i}" for i in range(num_shards)]
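A minimal sketch of the shard operations, assuming inventory is split across the shard keys set up above. The decrement-then-undo pattern keeps the no-oversell guarantee without a single hot key; names and shard count are assumptions:

import random
import redis

r = redis.Redis(decode_responses=True)
NUM_SHARDS = 10

def load_shards(sale_id: str, total_inventory: int) -> None:
    """Split the inventory evenly across shard counters before the sale starts."""
    per_shard, remainder = divmod(total_inventory, NUM_SHARDS)
    for i in range(NUM_SHARDS):
        r.set(f"flash:sale:{sale_id}:inv:{i}", per_shard + (1 if i < remainder else 0))

def try_reserve_sharded(sale_id: str) -> bool:
    """Decrement a random shard; if it is empty, fall back to the others before giving up."""
    shards = list(range(NUM_SHARDS))
    random.shuffle(shards)
    for i in shards:
        key = f"flash:sale:{sale_id}:inv:{i}"
        if r.decr(key) >= 0:
            return True
        r.incr(key)  # the shard went negative: undo and try the next one
    return False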
3. Probabilistic Early Rejection
When inventory is low, use probability to reject early.
class AdmissionController:
def __init__(self, total_inventory: int):
        self.total_inventory = total_inventory
        self.remaining = total_inventory
        self.oversubscribe_factor = 3.0  # admit roughly 3x remaining inventory worth of traffic
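A minimal sketch of the admission decision; the oversubscription factor and function name are assumptions. The idea is to admit only a few multiples of remaining inventory per second and reject the rest immediately:

import random

def should_admit(remaining_inventory: int, requests_last_second: int, oversubscribe: float = 3.0) -> bool:
    """Admit about `oversubscribe` x remaining inventory worth of traffic; reject everything else up front."""
    if remaining_inventory <= 0:
        return False  # sold out: reject everyone immediately
    target_admits = remaining_inventory * oversubscribe
    admit_probability = min(1.0, target_admits / max(requests_last_second, 1))
    return random.random() < admit_probability

Rejected users get an immediate "sold out, try again" response instead of holding a connection open into the inventory layer.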
Which approach to use?
Virtual queue provides the best user experience (shows position). Sharding provides the highest throughput. Probabilistic rejection is the simplest to implement. Most production systems combine all three.
Consistency and Invariants
System Invariants
The system must NEVER sell more inventory than available. This is non-negotiable. Underselling (failing when inventory exists) is acceptable but should be minimized.
Why overselling is catastrophic:
1. Customer Trust: Telling customers their order is cancelled destroys trust
2. Legal Issues: Taking payment for unavailable goods has legal implications
3. Operational Cost: Refund processing, customer support, compensation
4. Brand Damage: Social media amplifies negative experiences
Consistency Strategy:
We use a two-phase approach:
Phase 1: Reservation (Redis - Strong Consistency)
- Atomic decrement in Redis
- User gets immediate confirmation of reservation
- Reservation has short TTL (5-10 minutes)

Phase 2: Confirmation (PostgreSQL - Durable)
- Async write to database
- If payment fails, reservation expires
- Inventory returned to pool
async def purchase_item(user_id: str, sale_id: str) -> PurchaseResult:
# Phase 1: Reserve in Redis (fast, atomic)
    reservation = await redis_reserve(sale_id, user_id)
    if not reservation.success:
        return PurchaseResult(status="sold_out")
    await order_queue.publish(reservation)  # Phase 2: durable order write + payment run async
    return PurchaseResult(status="reserved")

Handling Reservation Expiry:
async def process_expired_reservations():
"""Background job to return expired inventory"""
    while True:
        for res in await redis_pop_expired_reservations():  # helper names are illustrative
            await redis_increment_inventory(res.sale_id)     # return the unit to the pool
        await asyncio.sleep(1)

Business impact mapping
Underselling by 1-2% is acceptable (some users will not complete checkout). Overselling by even 0.1% is unacceptable (1000 angry customers on 1M orders). We bias heavily toward preventing overselling.
Failure Modes and Resilience
Proactively discuss failures
Flash sales are high-stakes. Let me walk through what happens when things go wrong.
| Failure | Impact | Mitigation | Why It Works |
|---|---|---|---|
| Redis down | Cannot process purchases | Redis Sentinel with auto-failover | Standby promotes in seconds |
| Database slow | Reservation writes queue up | Async writes with queue buffer | Decouples hot path from DB |
Pre-sale Preparation:
class FlashSaleReadiness:
async def pre_sale_checks(self, sale_id: str) -> CheckResult:
        checks = []
        checks.append(await self.verify_inventory_loaded(sale_id))  # e.g. Redis counters match DB
        checks.append(await self.verify_queue_workers_ready())      # e.g. consumers pre-scaled
        return CheckResult(passed=all(c.ok for c in checks), checks=checks)

Circuit Breaker for Payment:
class PaymentCircuitBreaker:
def __init__(self):
        self.failure_count = 0
        self.failure_threshold = 50  # trip the breaker after this many consecutive failures
        self.state = "closed"        # closed -> open -> half-open
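Fleshing out the stub above, a minimal breaker might look like this; the threshold, cooldown, and method names are assumptions:

import time

class SimplePaymentBreaker:
    """Illustrative circuit breaker: open after N consecutive failures, probe again after a cooldown."""
    def __init__(self, failure_threshold: int = 50, cooldown_s: float = 30.0):
        self.failure_count = 0
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self.opened_at = None  # None means the breaker is closed

    def allow_request(self) -> bool:
        if self.opened_at is None:
            return True
        return time.monotonic() - self.opened_at >= self.cooldown_s  # half-open: let a probe through

    def record(self, success: bool) -> None:
        if success:
            self.failure_count, self.opened_at = 0, None
        else:
            self.failure_count += 1
            if self.failure_count >= self.failure_threshold:
                self.opened_at = time.monotonic()

When allow_request() returns False, the order worker would extend the reservation TTL and retry later rather than cancelling the order, matching the insight below.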
Key insight
During flash sales, extend reservations rather than cancelling when downstream services fail. Users who got a reservation are your most engaged customers - do not lose them due to temporary failures.
Evolution and Scaling
What to say
This design handles 6M orders per hour with single-region deployment. Let me discuss how it evolves for 10x scale and multiple concurrent sales.
Evolution Path:
Stage 1: Single Sale, Single Region (current design)
- Handles 6M orders/hour
- Single Redis cluster for inventory
- Works for most flash sales

Stage 2: Multiple Concurrent Sales
- Shard by sale_id
- Each sale gets a dedicated Redis cluster
- Horizontal scaling for different products

Stage 3: Global Flash Sales
- Regional inventory allocation
- Each region has a pre-allocated quota
- Global coordinator for rebalancing
Multi-Region Flash Sale Architecture
Inventory Rebalancing Algorithm:
class InventoryCoordinator:
def __init__(self, regions: list):
        self.regions = regions
        self.allocations = {}  # region -> remaining pre-allocated quota
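A minimal sketch of the rebalancing decision the coordinator might make; the region fields, runway threshold, and greedy strategy are assumptions for illustration:

from dataclasses import dataclass

@dataclass
class RegionQuota:
    name: str
    remaining: int    # unsold units still allocated to this region
    sell_rate: float  # units sold per second over a recent window

def plan_rebalance(regions: list, min_runway_s: float = 60.0) -> list:
    """Move inventory from slow regions toward regions that will sell out within `min_runway_s` seconds."""
    moves = []
    starving = [x for x in regions if x.sell_rate > 0 and x.remaining / x.sell_rate < min_runway_s]
    donors = sorted(regions, key=lambda x: x.sell_rate)  # slowest-selling regions donate first
    for needy in starving:
        for donor in donors:
            if donor is needy or donor.remaining == 0:
                continue
            qty = min(donor.remaining // 2, int(needy.sell_rate * min_runway_s))
            if qty > 0:
                donor.remaining -= qty
                needy.remaining += qty
                moves.append((donor.name, needy.name, qty))
    return moves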
Alternative approach
If we needed even higher scale, I would consider a lottery system where users register interest, and winners are randomly selected. This eliminates the thundering herd entirely but changes the user experience. Companies like NVIDIA use this for GPU launches.
What I would do differently for...
Ticket sales (assigned seating): Add seat selection service with optimistic locking. More complex inventory model.
Limited edition drops (hype products): Add bot protection with proof-of-work or waiting room. Fairness matters more.
Recurring flash sales: Pre-compute user eligibility. Cache everything possible. Learn from previous sale patterns.