System Design Masterclass

Tags: Ride Sharing, pricing, real-time, ride-sharing, algorithms, geospatial, advanced

Design Surge Pricing System

Design dynamic pricing for ride-sharing apps like Uber/Lyft

Millions of price calculations per minute | Similar to Uber, Lyft, DoorDash, Instacart, Grab | 45 min read

Summary

Surge pricing dynamically adjusts ride prices based on real-time supply and demand to balance the marketplace. The core challenge has three parts: computing accurate supply-demand signals across millions of geographic cells, updating prices in real time, and keeping pricing fair to users while still incentivizing drivers. This question is asked at Uber, Lyft, DoorDash, and any company with dynamic marketplace pricing.

Key Takeaways

Core Problem

This is fundamentally a real-time market equilibrium problem. We are computing a price that balances supply (drivers) and demand (riders) dynamically across geographic regions.

The Hard Part

Computing accurate supply-demand signals in real-time across millions of geographic cells while keeping prices stable enough that users do not see wild fluctuations.

Scaling Axis

Scale by geographic region. Each city or zone can compute surge independently. The problem is embarrassingly parallel across geography.

The Question: Design a surge pricing system that dynamically adjusts ride prices based on real-time supply and demand.

Why surge pricing exists:
- Balance supply and demand: When demand exceeds supply, higher prices incentivize more drivers to come online.
- Reduce wait times: Higher prices reduce demand, which improves availability for riders willing to pay.
- Maximize marketplace efficiency: Ensure rides happen where they provide the most value.
- Driver earnings: Allow drivers to earn more during high-demand periods.

What to say first

Before I design, let me clarify the requirements. I want to understand how we define supply and demand, the geographic granularity, update frequency, and constraints around price changes.

Hidden requirements interviewers are testing:
- Can you model supply and demand mathematically?
- Do you understand geospatial indexing and cell design?
- Can you balance real-time updates with price stability?
- Do you consider fairness and user experience?
- Can you handle the scale of millions of location updates per minute?

Critical Invariant

The price shown to the user at request time must be honored. Never charge more than what was displayed. Price locks must have a bounded TTL.

Performance Requirement

Price lookups must complete in under 100 ms. Surge computation can lag slightly (1-5 seconds), but price reads must be served immediately from cache.

Key Tradeoff

Responsiveness vs. stability. Highly responsive pricing reacts to every fluctuation, which confuses users. Stable pricing may miss sudden demand spikes.

Design Walkthrough

Problem Statement

Clarifying Questions

Ask these questions to shape your architecture. Each answer has significant design implications.

Question 1: Geographic Granularity

How granular should surge zones be? City-level, neighborhood-level, or street-level?

Why this matters: Determines the number of cells to compute and the storage requirements.
Typical answer: Hexagonal cells of roughly 500m-1km diameter (H3 geospatial indexing).
Architecture impact: Millions of cells globally, so we need efficient spatial queries.

Question 2: Update Frequency

How often should surge multipliers update? Real-time or batched?

Why this matters: Real-time updates require streaming infrastructure; batched updates are simpler.
Typical answer: Compute every 1-2 minutes and serve cached prices between computations.
Architecture impact: Need a near-real-time pipeline, but not true streaming.

Question 3: Price Stability

Should prices change smoothly or can they jump? What is the max surge multiplier?

Why this matters: Affects algorithm design and user experience.
Typical answer: Smooth transitions preferred, max surge around 3-5x, changes capped at 0.5x per interval.
Architecture impact: Need smoothing algorithms and rate limiting on price changes.

Question 4: Price Lock Duration

Once a user sees a price, how long is it guaranteed?

Why this matters: Affects consistency requirements and the potential for arbitrage.
Typical answer: Price locked for 2-5 minutes after the user accepts.
Architecture impact: Need price-lock storage with TTL and idempotency for bookings.

State your assumptions

I will assume: H3 hexagonal cells at resolution 8 (roughly 500m), surge computed every 1-2 minutes, max surge 5x with smooth transitions, price locked for 5 minutes after acceptance.

The Hard Part

Say this out loud

The hard part here is computing accurate supply-demand signals in real-time across millions of geographic cells while maintaining price stability and fairness perception.

Why this is genuinely hard:

1. Defining Supply: A driver 5 minutes away counts as supply, but how much? Does a driver heading toward the area count the same as one heading away? Does a driver with a passenger count the same as an empty one?
2. Defining Demand: Do open apps count, or only actual requests? How do we predict demand 5 minutes from now?
3. Geographic Boundaries: Demand at a cell's edge should consider adjacent cells. Sharp boundaries create pricing cliffs.
4. Feedback Loops: High surge draws drivers into the area, which reduces surge, which causes them to leave. This is the oscillation problem.
5. Fairness Perception: A user sees 2.5x surge, walks one block, and sees 1.5x. It feels unfair even when it is mathematically correct.
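Point 3 can be made concrete with a minimal sketch of neighbor blending, which is one way to soften pricing cliffs. The sketch assumes each cell's neighbor list is already known. In an H3-based design that list would come from the grid; here it is a plain dict. The function name `blend_with_neighbors` and the `self_weight` of 0.6 are hypothetical choices, not values from the original design.

```python
def blend_with_neighbors(
    cell_surge: dict[str, float],
    neighbors: dict[str, list[str]],
    self_weight: float = 0.6,  # hypothetical: how much a cell keeps its own value
) -> dict[str, float]:
    """Blend each cell's surge with the mean surge of its neighbors."""
    blended: dict[str, float] = {}
    for cell, surge in cell_surge.items():
        # Only neighbors that currently have a surge value contribute.
        nbr_values = [cell_surge[n] for n in neighbors.get(cell, []) if n in cell_surge]
        if not nbr_values:
            blended[cell] = surge  # isolated cell: nothing to blend with
            continue
        nbr_mean = sum(nbr_values) / len(nbr_values)
        blended[cell] = self_weight * surge + (1 - self_weight) * nbr_mean
    return blended
```

Take two adjacent cells at 2.5x and 1.5x. Blending gives roughly 2.1x and 1.9x, so the one-block walk from point 5 crosses a much smaller jump. The cost is that each cell's price reflects its own supply and demand a little less precisely.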

Common mistake

Candidates often propose a simple ratio of request count to driver count. This ignores driver movement, request cancellations, and time-to-pickup, and it produces unstable, oscillating prices.

The fundamental tradeoffs:

- Responsiveness vs. Stability: React quickly to demand spikes vs. provide a smooth user experience
- Granularity vs. Fairness: Fine-grained cells are accurate but create pricing cliffs
- Simplicity vs. Accuracy: A simple ratio is fast but inaccurate; ML models are accurate but complex
- Driver incentives vs. User prices: Higher surge brings more drivers but prices out users

Scale and Access Patterns

Let me estimate the scale and understand access patterns.

Dimension	Value	Impact
Active cities	700+	Separate computation per city possible
Geographic cells per city	10,000-50,000	Up to ~35M cells globally at H3 resolution 8 (700 × 50,000)

+ 4 more rows...

What to say

At 1.25M location updates per second and 35M cells to compute, this is primarily a real-time aggregation problem. Price lookups are read-heavy and must be cached. Surge computation is write-heavy but less latency sensitive.

Access Pattern Analysis:

- Driver locations: Write-heavy, 1.25M updates/sec, need real-time aggregation by cell
- Surge multipliers: Read-heavy (every price quote), written periodically (every 1-2 min)
- Price locks: Written on booking acceptance, read on ride completion, expired via TTL
- Demand signals: Aggregated from app opens, searches, and actual requests

Driver location storage:
- 5M drivers x 100 bytes = 500 MB (fits in memory)
- Location updates: 1.25M/sec x 100 bytes = 125 MB/sec throughput

+ 12 more lines...
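The scale figures above are consistent with one another. A quick back-of-envelope check reproduces them from numbers stated in this design: 5M drivers, a GPS ping every 4 seconds (from the Location Ingester), about 100 bytes per record, and 700 cities with up to 50,000 cells each.

```python
# Back-of-envelope inputs, all taken from figures stated in this design.
DRIVERS = 5_000_000          # tracked drivers
PING_INTERVAL_S = 4          # one GPS ping every 4 seconds
BYTES_PER_UPDATE = 100       # approximate size of one location record
CITIES = 700
MAX_CELLS_PER_CITY = 50_000

updates_per_sec = DRIVERS / PING_INTERVAL_S
memory_mb = DRIVERS * BYTES_PER_UPDATE / 1_000_000
ingest_mb_per_sec = updates_per_sec * BYTES_PER_UPDATE / 1_000_000
max_cells = CITIES * MAX_CELLS_PER_CITY

print(f"{updates_per_sec:,.0f} updates/sec")      # prints "1,250,000 updates/sec"
print(f"{memory_mb:,.0f} MB of driver state")     # prints "500 MB of driver state"
print(f"{ingest_mb_per_sec:,.0f} MB/sec ingest")  # prints "125 MB/sec ingest"
print(f"{max_cells:,} cells (upper bound)")       # prints "35,000,000 cells (upper bound)"
```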

High-Level Architecture

Let me start with the major components and data flows.

What to say

I will separate this into three main flows: location ingestion for supply tracking, demand signal collection, and surge computation. Reads are served from a cache layer.

Surge Pricing Architecture

Component Responsibilities:

1. Location Ingester: Receives driver GPS pings (every 4 seconds) and maps each one to an H3 cell
2. Demand Ingester: Tracks app opens, destination searches, and ride requests as demand signals
3. Cell Aggregator: Maintains a real-time count of drivers and demand signals per cell
4. Surge Calculator: Computes the raw surge multiplier from the supply/demand ratio
5. Smoothing Layer: Applies temporal smoothing, caps the rate of change, and handles neighbor blending
6. Price Service: Serves cached surge values and creates price locks on booking
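Here is a minimal sketch of the Cell Aggregator's bookkeeping, assuming each ping carries a driver ID and coordinates. The cell lookup is injected as a function; in this design it would be an H3 lookup at resolution 8. The names `CellAggregator`, `on_ping`, and `on_offline` are hypothetical. The key detail is that a driver who crosses a cell boundary must be removed from the old cell's count, not only added to the new one.

```python
from collections import Counter
from typing import Callable


class CellAggregator:
    """Tracks each driver's current cell and a live driver count per cell."""

    def __init__(self, cell_of: Callable[[float, float], str]) -> None:
        self._cell_of = cell_of                 # e.g. an H3 lookup at resolution 8
        self._driver_cell: dict[str, str] = {}  # driver_id -> current cell
        self.counts: Counter = Counter()        # cell -> number of drivers

    def _decrement(self, cell: str) -> None:
        self.counts[cell] -= 1
        if self.counts[cell] == 0:
            del self.counts[cell]               # keep the map small for empty cells

    def on_ping(self, driver_id: str, lat: float, lng: float) -> None:
        new_cell = self._cell_of(lat, lng)
        old_cell = self._driver_cell.get(driver_id)
        if old_cell == new_cell:
            return                              # most pings stay in the same cell
        if old_cell is not None:
            self._decrement(old_cell)           # driver left the old cell
        self._driver_cell[driver_id] = new_cell
        self.counts[new_cell] += 1

    def on_offline(self, driver_id: str) -> None:
        old_cell = self._driver_cell.pop(driver_id, None)
        if old_cell is not None:
            self._decrement(old_cell)
```

For a quick local test, a toy grid such as `lambda lat, lng: f"{int(lat)},{int(lng)}"` can stand in for the H3 lookup.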

Real-world reference

Uber reportedly computes surge in its Marketplace system every 30-60 seconds per H3 cell. It is said to use Apache Flink for stream processing and to serve surge from a global Redis cache with local replicas.

Data Model and Storage

Let me define the key data structures and storage choices.

What to say

The source of truth for surge is the computed multiplier per cell. Driver locations are ephemeral state used only for aggregation. Price locks are critical for billing consistency.

-- H3 Cell Surge State (stored in Redis as hash)
-- Key: surge:{city}:{h3_cell_id}
-- Fields:

+ 30 more lines...

H3 Geospatial Indexing:

We use Uber's H3 hexagonal grid system because:
- Hexagons have uniform adjacency (6 neighbors, all equidistant)
- There is no edge/corner ambiguity as with square grids
- It is hierarchical (cells can be aggregated into larger regions)
- Resolution 8 gives ~460m edge length, which suits urban areas

import h3

def get_cell_for_location(lat: float, lng: float) -> str:

+ 15 more lines...

Storage Choices:

| Data | Storage | Why |
|------|---------|-----|
| Surge multipliers | Redis Cluster | Sub-ms reads, TTL support |
| Driver locations | In-memory (Flink/Spark) | Real-time aggregation |
| Price locks | Redis with persistence | Must survive restarts |
| Historical surge | Time-series DB (InfluxDB) | Analytics, debugging |
| Demand signals | Kafka | Stream processing |

Important detail

Price locks must be persisted (Redis AOF or separate store). If a user books at 1.8x surge and Redis crashes, we must honor that price when the ride completes.

Algorithm Deep Dive

Let me explain the surge calculation algorithm in detail.

Step 1: Supply Calculation

Not all drivers count equally as supply, so we weight each driver by availability:

def calculate_supply(cell_id: str, drivers: list[Driver]) -> float:
    """
    Calculate effective supply for a cell.

+ 20 more lines...
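Availability weighting could take the following shape. This is a sketch, and several parts of it are assumptions for illustration: the status categories, the weights (1.0 for available, 0.5 for finishing a trip, 0.0 for on a trip), and the linear ETA discount with a 10-minute cutoff. `DriverSnapshot` is a hypothetical stand-in and is not the `Driver` type used above. The sketch also does not model whether a driver is heading toward or away from the cell, which a fuller version would need.

```python
from dataclasses import dataclass
from enum import Enum


class DriverStatus(Enum):
    AVAILABLE = "available"
    FINISHING_TRIP = "finishing_trip"  # on a trip that ends soon, nearby
    ON_TRIP = "on_trip"


# Hypothetical weights; real values would be tuned per city.
STATUS_WEIGHT = {
    DriverStatus.AVAILABLE: 1.0,
    DriverStatus.FINISHING_TRIP: 0.5,
    DriverStatus.ON_TRIP: 0.0,
}


@dataclass
class DriverSnapshot:
    status: DriverStatus
    eta_minutes: float  # estimated minutes to reach the cell


def effective_supply(drivers: list[DriverSnapshot], max_eta_minutes: float = 10.0) -> float:
    """Sum availability weights, discounted linearly by each driver's ETA."""
    total = 0.0
    for d in drivers:
        if d.eta_minutes >= max_eta_minutes:
            continue  # too far away to count as supply for this cell
        proximity = 1.0 - d.eta_minutes / max_eta_minutes
        total += STATUS_WEIGHT[d.status] * proximity
    return total
```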

Step 2: Demand Calculation

Demand is predicted from multiple signals:

def calculate_demand(cell_id: str, signals: DemandSignals) -> float:
    """
    Calculate demand from multiple signals.

+ 20 more lines...
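This sketch blends the demand signals named in this design (app opens, destination searches, and ride requests) into a single demand rate. The weights and the per-minute normalization are assumptions. `DemandCounts` and `effective_demand` are hypothetical names, distinct from the `DemandSignals` type above. To predict demand several minutes ahead, you would replace these raw counts with a forecast.

```python
from dataclasses import dataclass


@dataclass
class DemandCounts:
    app_opens: int
    destination_searches: int
    ride_requests: int
    window_minutes: float  # length of the aggregation window


# Hypothetical weights: how strongly each signal indicates a real ride.
# A submitted request is the strongest signal; an app open is the weakest.
WEIGHTS = {"app_opens": 0.1, "destination_searches": 0.4, "ride_requests": 1.0}


def effective_demand(c: DemandCounts) -> float:
    """Weighted demand per minute over the aggregation window."""
    weighted = (
        WEIGHTS["app_opens"] * c.app_opens
        + WEIGHTS["destination_searches"] * c.destination_searches
        + WEIGHTS["ride_requests"] * c.ride_requests
    )
    return weighted / c.window_minutes
```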

Step 3: Raw Surge Calculation

def calculate_raw_surge(supply: float, demand: float) -> float:
    """
    Calculate raw surge multiplier from supply/demand ratio.

+ 21 more lines...
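Here is a minimal sketch of a raw-surge mapping, assuming a piecewise-linear curve. There is no surge while supply covers demand. Above that point, surge rises linearly with the demand/supply ratio, capped at the 5x maximum assumed earlier. The `sensitivity` of 0.5 and the 0.1 supply floor are hypothetical tuning values. On its own, this mapping is the simple ratio that the common-mistake note warns about; here it serves only as input to the smoothing step.

```python
def raw_surge(supply: float, demand: float,
              sensitivity: float = 0.5, max_surge: float = 5.0) -> float:
    """Map the demand/supply ratio to a multiplier in [1.0, max_surge]."""
    # Floor supply at a small value so an empty cell does not divide by zero.
    ratio = demand / max(supply, 0.1)
    if ratio <= 1.0:
        return 1.0  # supply covers demand: no surge
    surge = 1.0 + sensitivity * (ratio - 1.0)
    return min(surge, max_surge)
```

For example, `raw_surge(10, 30)` (three times as much demand as supply) returns 2.0.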

Step 4: Smoothing and Stability

Raw surge is too volatile. We apply multiple smoothing techniques:

def smooth_surge(
    raw_surge: float,
    previous_surge: float,

+ 24 more lines...
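The smoothing techniques can be combined as in the sketch below. It makes several assumptions: an exponential moving average with `alpha` = 0.3 (a hypothetical value), the 0.5x-per-interval cap and 1.0x-5.0x range from the clarifying questions, and rounding to one decimal place for display. The function name is hypothetical and separate from `smooth_surge` above.

```python
def smooth_surge_sketch(raw: float, previous: float, alpha: float = 0.3,
                        max_step: float = 0.5, min_surge: float = 1.0,
                        max_surge: float = 5.0) -> float:
    # 1. Exponential moving average: move only part of the way toward raw.
    target = alpha * raw + (1 - alpha) * previous
    # 2. Rate limit: never move more than max_step in one interval.
    step = max(-max_step, min(max_step, target - previous))
    smoothed = previous + step
    # 3. Clamp to the allowed range, then round to 0.1 for display.
    return round(min(max(smoothed, min_surge), max_surge), 1)
```

Starting from 1.0x, a raw spike to 5.0x moves the published value only to 1.5x in the first interval. Rounding also keeps tiny fluctuations (for example, a raw 2.05x against a previous 2.0x) from changing the displayed price.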

Surge Computation Pipeline

Advanced: ML-based surge

Production systems often use ML models that predict optimal surge considering: driver response elasticity, rider price sensitivity, time to pickup, historical patterns, weather, and events. The model outputs surge that maximizes completed rides while maintaining fairness.

Consistency and Invariants

System Invariants

The price shown to the user at booking time MUST be honored. Never charge more than what was displayed. Price locks are sacred: they hold even if surge increases after booking.

Price Lock Flow:

class PriceService:
    def get_price_quote(self, pickup: Location, dropoff: Location) -> PriceQuote:
        """

+ 48 more lines...

Consistency Model:

| Component | Consistency | Rationale |
|-----------|-------------|-----------|
| Surge multipliers | Eventual | Stale by seconds is acceptable |
| Price locks | Strong | Must never lose, affects billing |
| Driver locations | Eventual | Real-time but not critical |
| Demand signals | Eventual | Aggregated over windows |

Race condition to handle

A user sees 1.5x surge and clicks Book, but surge updates to 2.0x before the lock is created. Solution: always use the surge stored in the quote object, not the current surge. The quote contains the promised price.
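Here is a minimal sketch of a quote-then-book flow that avoids this race. It makes three assumptions: an in-memory dict stands in for the persistent Redis store, quotes have a 5-minute TTL (matching the stated assumption), and an expired quote forces the rider to re-quote. `Quote`, `QuoteStore`, `create_quote`, and `book` are hypothetical names. Booking reads the surge saved in the quote and never looks at live surge.

```python
import time
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class Quote:
    quote_id: str
    cell_id: str
    surge: float        # multiplier promised to the rider
    base_fare: float
    expires_at: float   # epoch seconds


class QuoteStore:
    """In-memory stand-in for the persistent price-lock store."""

    def __init__(self, ttl_seconds: float = 300.0) -> None:
        self.ttl = ttl_seconds
        self._quotes: dict[str, Quote] = {}

    def create_quote(self, cell_id: str, current_surge: float, base_fare: float) -> Quote:
        # Snapshot the surge at quote time; later surge updates do not touch it.
        q = Quote(str(uuid.uuid4()), cell_id, current_surge, base_fare,
                  time.time() + self.ttl)
        self._quotes[q.quote_id] = q
        return q

    def book(self, quote_id: str) -> float:
        """Return the fare to lock in, computed from the quote's own surge."""
        q = self._quotes.get(quote_id)
        if q is None or time.time() > q.expires_at:
            raise LookupError("quote missing or expired; rider must re-quote")
        return round(q.base_fare * q.surge, 2)
```

Even if live surge in `cell-1` jumps to 2.0x after a 1.5x quote on a 10.0 base fare, `book` still returns 15.0.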

What happens if price lock is lost?

1. The ride completes, and the system looks up the price lock.
2. The lock is not found (Redis failure, TTL expired).
3. Fallback: Use the surge at ride start time from the audit log.
4. If there is no audit log entry: Charge the base price only (no surge).
5. Alert the on-call team, because this is a billing incident.
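The fallback order above can be written as a small decision function. This sketch assumes the lock and audit-log lookups have already been done and are passed in as optional values. `resolve_billed_surge` is a hypothetical name, and the `logging` call stands in for whatever paging mechanism the on-call alert actually uses.

```python
import logging
from typing import Optional

log = logging.getLogger("billing")


def resolve_billed_surge(
    ride_id: str,
    lock_surge: Optional[float],
    audit_log_surge: Optional[float],
) -> float:
    """Pick the surge to bill, following the lock -> audit log -> 1.0x order."""
    if lock_surge is not None:
        return lock_surge  # normal path: the price lock exists
    # Every path below is a billing incident, so alert first.
    log.error("price lock missing for ride %s", ride_id)
    if audit_log_surge is not None:
        return audit_log_surge  # surge recorded at ride start
    return 1.0  # no record at all: charge base price only
```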

Failure Modes and Resilience

Proactively discuss failures

Let me walk through the failure scenarios. Surge pricing has unique failure modes because wrong prices directly affect revenue and user trust.

Failure	Impact	Mitigation	Why It Works
Surge computation down	Stale surge values served	Cache with TTL, fallback to 1.0x after expiry	Better to charge no surge than random surge
Redis cache down	Cannot serve surge	Local cache in price service, circuit breaker	Price service has hot cache of recent surges

+ 4 more rows...

Graceful Degradation Hierarchy:

def get_surge_with_fallbacks(cell_id: str) -> float:
    """
    Get surge with multiple fallback levels.

+ 26 more lines...

Why fail to 1.0x?

When unsure about surge, charge no surge (1.0x). Undercharging is a business loss, but overcharging is a trust violation and a potential legal issue. Users remember being overcharged.
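Here is a sketch of the fail-to-1.0x behavior, combined with the local hot cache from the failure table. It assumes the remote cache is reached through an injected `fetch_remote` function (a Redis GET in this design). It also assumes a local copy stays usable for 120 seconds, roughly one to two surge update intervals. Both `SurgeReader` and the TTL value are hypothetical. A production version would also wrap the remote call in a circuit breaker rather than calling it on every read.

```python
import time
from typing import Callable, Optional


class SurgeReader:
    """Remote cache first, then a recent local copy, then 1.0x."""

    def __init__(self, fetch_remote: Callable[[str], Optional[float]],
                 local_ttl_seconds: float = 120.0) -> None:
        self._fetch_remote = fetch_remote  # may raise, or return None on a miss
        self._ttl = local_ttl_seconds
        self._local: dict[str, tuple[float, float]] = {}  # cell -> (surge, stored_at)

    def get(self, cell_id: str) -> float:
        try:
            value = self._fetch_remote(cell_id)
        except Exception:
            value = None  # treat any cache error like a miss
        now = time.monotonic()
        if value is not None:
            self._local[cell_id] = (value, now)  # refresh the hot local copy
            return value
        cached = self._local.get(cell_id)
        if cached is not None and now - cached[1] <= self._ttl:
            return cached[0]  # recent value from the local cache
        return 1.0  # unsure: charge no surge
```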

Surge Sanity Checks:

Before publishing any surge value:

def validate_surge(cell_id: str, new_surge: float, prev_surge: float) -> bool:
    """
    Sanity check surge before publishing.

+ 19 more lines...
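Here is a minimal sketch of pre-publish checks, assuming three rules. The value must be a finite number, it must lie in the 1.0x-5.0x range, and it must differ from the previously published value by no more than the 0.5x per-interval cap. The name `validate_surge_sketch` is hypothetical. So is the policy of continuing to serve the previous value when a check fails. These checks deliberately overlap with what smoothing already enforces, so that a bug in the smoothing code itself cannot publish a bad price.

```python
import math


def validate_surge_sketch(new_surge: float, prev_surge: float,
                          min_surge: float = 1.0, max_surge: float = 5.0,
                          max_jump: float = 0.5) -> bool:
    """Return True if a computed surge value is safe to publish."""
    if not math.isfinite(new_surge):
        return False  # NaN or inf, e.g. from a bad division upstream
    if not (min_surge <= new_surge <= max_surge):
        return False  # outside the allowed range
    if abs(new_surge - prev_surge) > max_jump + 1e-9:  # tolerance for float error
        return False  # bigger jump than the per-interval cap allows
    return True
```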

Evolution and Scaling

What to say

This design works well for a single region with millions of requests per day. Let me discuss how it evolves for global scale and advanced features.

Evolution Path:

Stage 1: Single City (MVP)
- Monolithic surge service
- Single Redis for surge cache
- Fixed supply/demand formula
- Works up to 100K requests/day

Stage 2: Multi-City
- Shard by city (each city independent)
- Regional Redis clusters
- City-specific tuning parameters
- Works up to 10M requests/day

Stage 3: Global Scale
- Stream processing for aggregation (Flink/Spark)
- ML models for surge prediction
- Real-time A/B testing of pricing
- Works up to 100M+ requests/day

Advanced Features to Mention:

Feature	Description	Complexity / considerations
Predictive surge	Predict surge 5-10 min ahead to pre-position drivers	ML model on historical patterns
Personalized pricing	Different surge for different users based on willingness to pay	Controversial, regulatory concerns
Upfront pricing	Fixed price regardless of route taken	Need accurate ETA and route prediction
Surge caps	Max surge during emergencies (regulatory)	Override logic and audit trail
Driver surge visibility	Show drivers where surge is high	Careful not to cause herding behavior

Alternative approaches

If I needed lower latency, I would push surge computation to edge nodes (CDN) with periodic sync. If accuracy were more critical than speed, I would use a stronger consistency model with Raft-based storage for price locks.

What I would do differently for...

Food delivery (DoorDash/UberEats): Surge on delivery fees, not food prices. Consider restaurant kitchen capacity as supply constraint, not just drivers.

Grocery delivery (Instacart): Longer time windows allow batching. Surge based on delivery slot availability rather than real-time.

Hotel/Flight pricing: Much longer decision windows. Can use complex optimization. Price changes less frequently.

Concert tickets: One-time event, no supply flexibility. Surge is more about demand capture than balancing marketplace.

Global Surge Architecture

Design Trade-offs

Advantages

+Easy to implement
+Easy to understand
+Fast computation

Disadvantages

-Volatile prices
-Does not account for driver movement
-Binary supply classification

When to use

MVP or low-scale deployments
