Design Walkthrough
Problem Statement
The Question: Design a surge pricing system that dynamically adjusts ride prices based on real-time supply and demand.
Why surge pricing exists:
- Balance supply and demand: When demand exceeds supply, higher prices incentivize more drivers to come online
- Reduce wait times: Higher prices reduce demand, improving availability for those willing to pay
- Maximize marketplace efficiency: Ensure rides happen where they provide most value
- Driver earnings: Allow drivers to earn more during high-demand periods
What to say first
Before I design, let me clarify the requirements. I want to understand how we define supply and demand, the geographic granularity, update frequency, and constraints around price changes.
Hidden requirements interviewers are testing:
- Can you model supply and demand mathematically?
- Do you understand geospatial indexing and cell design?
- Can you balance real-time updates with price stability?
- Do you consider fairness and user experience?
- Can you handle the scale of millions of location updates per minute?
Clarifying Questions
Ask these questions to shape your architecture. Each answer has significant design implications.
Question 1: Geographic Granularity
How granular should surge zones be? City-level, neighborhood-level, or street-level?
- Why this matters: Determines the number of cells to compute and the storage requirements.
- Typical answer: Hexagonal cells of roughly 500m-1km diameter (H3 geospatial indexing)
- Architecture impact: Millions of cells globally, need efficient spatial queries
Question 2: Update Frequency
How often should surge multipliers update? Real-time or batched?
- Why this matters: Real-time requires streaming infrastructure, batched is simpler.
- Typical answer: Compute every 1-2 minutes, but prices can be cached
- Architecture impact: Need near-real-time pipeline but not true streaming
Question 3: Price Stability
Should prices change smoothly or can they jump? What is the max surge multiplier?
- Why this matters: Affects algorithm design and user experience.
- Typical answer: Smooth transitions preferred, max surge around 3-5x, changes capped at 0.5x per interval
- Architecture impact: Need smoothing algorithms and rate limiting on price changes
Question 4: Price Lock Duration
Once a user sees a price, how long is it guaranteed?
- Why this matters: Affects consistency requirements and potential for arbitrage.
- Typical answer: Price locked for 2-5 minutes after user accepts
- Architecture impact: Need price lock storage with TTL, idempotency for bookings
State your assumptions
I will assume: H3 hexagonal cells at resolution 8 (roughly 500m), surge computed every 1-2 minutes, max surge 5x with smooth transitions, price locked for 5 minutes after acceptance.
The Hard Part
Say this out loud
The hard part here is computing accurate supply-demand signals in real-time across millions of geographic cells while maintaining price stability and fairness perception.
Why this is genuinely hard:
1. Defining Supply: A driver 5 minutes away counts as supply, but how much? Driver heading toward vs away from area? Driver with passenger vs empty?
2. Defining Demand: Open apps vs actual requests? How to predict demand 5 minutes from now?
3. Geographic Boundaries: Demand at cell edge should consider adjacent cells. Sharp boundaries create pricing cliffs.
4. Feedback Loops: High surge causes drivers to move there, which reduces surge, which causes them to leave - oscillation problem.
5. Fairness Perception: User sees 2.5x surge, walks one block, sees 1.5x. Feels unfair even if mathematically correct.
Common mistake
Candidates often propose a simple request-count / driver-count ratio. This ignores driver movement, request cancellations, and time-to-pickup, and it produces unstable, oscillating prices.
The fundamental tradeoffs:
- Responsiveness vs Stability: React quickly to demand spikes vs smooth user experience
- Granularity vs Fairness: Fine-grained cells are accurate but create pricing cliffs
- Simplicity vs Accuracy: Simple ratio is fast but inaccurate, ML models are accurate but complex
- Driver incentives vs User prices: Higher surge brings more drivers but prices out users
Scale and Access Patterns
Let me estimate the scale and understand access patterns.
| Dimension | Value | Impact |
|---|---|---|
| Active cities | 700+ | Separate computation per city possible |
| Geographic cells per city | 10,000-50,000 | ~35M cells globally at H3 resolution 8 |
What to say
At 1.25M location updates per second and 35M cells to compute, this is primarily a real-time aggregation problem. Price lookups are read-heavy and must be cached. Surge computation is write-heavy but less latency sensitive.
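The headline numbers can be reproduced with a few lines of arithmetic. This is a back-of-envelope sketch only: the inputs (5M active drivers, a 4-second ping interval, 100-byte records, 700 cities, up to 50,000 cells per city) are the walkthrough's own working figures, and the variable names are illustrative.

```python
# Back-of-envelope scale estimate. All inputs are assumed working figures,
# not measurements from a real system.
ACTIVE_DRIVERS = 5_000_000      # drivers online at peak
PING_INTERVAL_SEC = 4           # one GPS ping per driver every 4 seconds
BYTES_PER_UPDATE = 100          # approximate size of one location record
CITIES = 700
MAX_CELLS_PER_CITY = 50_000     # upper end of the 10,000-50,000 range

updates_per_sec = ACTIVE_DRIVERS / PING_INTERVAL_SEC
ingest_mb_per_sec = updates_per_sec * BYTES_PER_UPDATE / 1_000_000
driver_state_mb = ACTIVE_DRIVERS * BYTES_PER_UPDATE / 1_000_000
global_cells = CITIES * MAX_CELLS_PER_CITY

print(f"{updates_per_sec:,.0f} location updates/sec")
print(f"{ingest_mb_per_sec:,.0f} MB/sec ingest throughput")
print(f"{driver_state_mb:,.0f} MB of driver state in memory")
print(f"{global_cells:,} cells globally (upper bound)")
```

Note that the 35M cell figure corresponds to every city sitting at the top of the per-city range, so it is an upper bound rather than a typical value.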
Access Pattern Analysis:
- Driver locations: Write-heavy, 1.25M updates/sec, need real-time aggregation by cell
- Surge multipliers: Read-heavy (every price quote), write periodic (every 1-2 min)
- Price locks: Write on booking acceptance, read on ride completion, TTL expiry
- Demand signals: Aggregated from app opens, searches, and actual requests
Driver location storage:
- 5M drivers x 100 bytes = 500 MB (fits in memory)
- Location updates: 1.25M/sec x 100 bytes = 125 MB/sec throughput

High-Level Architecture
Let me start with the major components and data flows.
What to say
I will separate this into three main flows: location ingestion for supply tracking, demand signal collection, and surge computation. Reads are served from a cache layer.
Surge Pricing Architecture
Component Responsibilities:
1. Location Ingester: Receives driver GPS pings (4-sec intervals), maps to H3 cells
2. Demand Ingester: Tracks app opens, destination searches, ride requests as demand signals
3. Cell Aggregator: Maintains real-time count of drivers and demand signals per cell
4. Surge Calculator: Computes raw surge multiplier from supply/demand ratio
5. Smoothing Layer: Applies temporal smoothing, caps rate of change, handles neighbor blending
6. Price Service: Serves cached surge values, creates price locks on booking
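To make the Cell Aggregator's job concrete, here is a minimal single-process sketch of per-cell driver counting. The class and method names are hypothetical, and a production version would more likely live inside a stream processor with windowed state; the point is only that each ping may move a driver between cells, so the old cell must be decremented.

```python
from collections import Counter
from typing import Dict


class CellAggregator:
    """Hypothetical in-memory aggregator: per-cell driver counts from pings."""

    def __init__(self) -> None:
        self._driver_cell: Dict[str, str] = {}  # driver_id -> current cell_id
        self._cell_counts: Counter = Counter()  # cell_id -> drivers in cell

    def on_location_ping(self, driver_id: str, cell_id: str) -> None:
        """Record a ping; if the driver changed cells, move their count."""
        previous = self._driver_cell.get(driver_id)
        if previous == cell_id:
            return  # same cell as last ping, nothing to update
        if previous is not None:
            self._decrement(previous)
        self._driver_cell[driver_id] = cell_id
        self._cell_counts[cell_id] += 1

    def on_driver_offline(self, driver_id: str) -> None:
        """Remove a driver who went offline from their last known cell."""
        previous = self._driver_cell.pop(driver_id, None)
        if previous is not None:
            self._decrement(previous)

    def drivers_in(self, cell_id: str) -> int:
        return self._cell_counts.get(cell_id, 0)

    def _decrement(self, cell_id: str) -> None:
        self._cell_counts[cell_id] -= 1
        if self._cell_counts[cell_id] <= 0:
            del self._cell_counts[cell_id]  # keep the map free of empty cells


agg = CellAggregator()
agg.on_location_ping("driver-1", "cell-a")
agg.on_location_ping("driver-2", "cell-a")
agg.on_location_ping("driver-1", "cell-b")  # driver-1 crossed a cell boundary
print(agg.drivers_in("cell-a"), agg.drivers_in("cell-b"))
```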
Real-world reference
Uber uses a system called Marketplace that computes surge every 30-60 seconds per H3 cell. They use Apache Flink for stream processing and serve surge from a global Redis cache with local replicas.
Data Model and Storage
Let me define the key data structures and storage choices.
What to say
The source of truth for surge is the computed multiplier per cell. Driver locations are ephemeral state used only for aggregation. Price locks are critical for billing consistency.
-- H3 Cell Surge State (stored in Redis as hash)
-- Key: surge:{city}:{h3_cell_id}
-- Fields:

H3 Geospatial Indexing:
We use Uber's H3 hexagonal grid system because:
- Hexagons have uniform adjacency (6 neighbors, all equidistant)
- No edge/corner ambiguity like square grids
- Hierarchical (can aggregate to larger regions)
- Resolution 8 gives ~460m edge length, good for urban areas
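import h3

def get_cell_for_location(lat: float, lng: float) -> str:
    # Resolution 8 per the stated assumptions. latlng_to_cell is the h3-py v4 name;
    # h3-py v3 calls the same operation geo_to_h3.
    return h3.latlng_to_cell(lat, lng, 8)

Storage Choices: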
| Data | Storage | Why |
|---|---|---|
| Surge multipliers | Redis Cluster | Sub-ms reads, TTL support |
| Driver locations | In-memory (Flink/Spark) | Real-time aggregation |
| Price locks | Redis with persistence | Must survive restarts |
| Historical surge | Time-series DB (InfluxDB) | Analytics, debugging |
| Demand signals | Kafka | Stream processing |
Important detail
Price locks must be persisted (Redis AOF or separate store). If a user books at 1.8x surge and Redis crashes, we must honor that price when the ride completes.
Algorithm Deep Dive
Let me explain the surge calculation algorithm in detail.
Step 1: Supply Calculation
Not all drivers are equal supply. We weight by availability:
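One way to express this weighting is a per-driver score between 0 and 1 that decays with the time until the driver could start a pickup in the cell. The sketch below is illustrative only: the `DriverState` fields, the linear decay, and the 5-minute horizon are all assumptions, not values from a production system.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DriverState:
    """Hypothetical driver record; field names are illustrative."""
    is_available: bool               # empty and accepting requests
    minutes_until_free: float = 0.0  # remaining trip time if occupied
    eta_to_cell_min: float = 0.0     # drive time to reach the cell


def effective_supply(drivers: List[DriverState], horizon_min: float = 5.0) -> float:
    """Sum per-driver weights in [0, 1]. The weighting scheme is an assumption."""
    total = 0.0
    for d in drivers:
        # Time until this driver could realistically start a pickup in the cell.
        busy_time = 0.0 if d.is_available else d.minutes_until_free
        time_to_useful = d.eta_to_cell_min + busy_time
        # Linear decay: 1.0 if usable right now, 0.0 at or beyond the horizon.
        total += max(0.0, 1.0 - time_to_useful / horizon_min)
    return total


drivers = [
    DriverState(is_available=True),                         # idle in the cell
    DriverState(is_available=True, eta_to_cell_min=2.5),    # idle, nearby
    DriverState(is_available=False, minutes_until_free=10), # mid-trip, far from done
]
print(effective_supply(drivers))
```

With these inputs the three drivers contribute 1.0, 0.5, and 0.0, so three physical drivers count as 1.5 units of supply.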
def calculate_supply(cell_id: str, drivers: list[Driver]) -> float:
    """
    Calculate effective supply for a cell.
    """
    ...

Step 2: Demand Calculation
Demand is predicted from multiple signals:
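A simple starting point is a weighted sum in which each signal is scaled by how likely it is to become a ride request. The sketch below assumes a hypothetical `DemandWindow` record and made-up weights; it also assumes the counts are already deduplicated per user, so one rider who opens the app, searches, and requests is counted only at the strongest signal.

```python
from dataclasses import dataclass


@dataclass
class DemandWindow:
    """Hypothetical per-cell counts over the last aggregation window."""
    app_opens: int
    destination_searches: int
    ride_requests: int


# Illustrative weights: roughly the chance each signal becomes a ride request.
# These are assumptions and would need to be fitted from historical data.
APP_OPEN_WEIGHT = 0.1
SEARCH_WEIGHT = 0.4
REQUEST_WEIGHT = 1.0


def estimated_demand(window: DemandWindow) -> float:
    """Weighted sum of demand signals, in units of expected ride requests."""
    return (
        APP_OPEN_WEIGHT * window.app_opens
        + SEARCH_WEIGHT * window.destination_searches
        + REQUEST_WEIGHT * window.ride_requests
    )


window = DemandWindow(app_opens=100, destination_searches=50, ride_requests=20)
print(estimated_demand(window))
```

Predicting demand a few minutes ahead would replace the fixed weights with a forecast, but the output unit (expected requests per cell per window) can stay the same.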
def calculate_demand(cell_id: str, signals: DemandSignals) -> float:
    """
    Calculate demand from multiple signals.
    """
    ...

Step 3: Raw Surge Calculation
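As a hedged illustration, the mapping from a demand/supply ratio to a multiplier can be a clamped linear function. The `sensitivity` parameter and the linear shape are assumptions; the 5x cap matches the assumption stated earlier in the walkthrough.

```python
def raw_surge(supply: float, demand: float,
              max_surge: float = 5.0, sensitivity: float = 0.5) -> float:
    """Map a demand/supply ratio to a multiplier in [1.0, max_surge].

    sensitivity is a hypothetical tuning knob: the extra multiplier added
    per unit of excess demand ratio.
    """
    if demand <= 0:
        return 1.0            # no demand, no surge
    if supply <= 0:
        return max_surge      # demand but no drivers: cap instead of dividing by zero
    ratio = demand / supply
    if ratio <= 1.0:
        return 1.0            # supply already covers demand
    return min(max_surge, 1.0 + sensitivity * (ratio - 1.0))


print(raw_surge(supply=10, demand=30))  # ratio 3.0 -> 1.0 + 0.5 * 2.0
```

The explicit zero-supply branch matters in practice: empty cells are common at fine granularity, and an unguarded division would either crash or publish an infinite multiplier.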
def calculate_raw_surge(supply: float, demand: float) -> float:
    """
    Calculate raw surge multiplier from supply/demand ratio.
    """
    ...

Step 4: Smoothing and Stability
Raw surge is too volatile. We apply multiple smoothing techniques:
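The sketch below chains the three techniques named for the Smoothing Layer: temporal smoothing (an exponential moving average), neighbor blending, and a cap on the change per interval. The `alpha` and `neighbor_weight` values are made-up defaults; the 0.5x step cap and 5x maximum follow the assumptions stated earlier.

```python
from typing import Sequence


def smoothed_surge(raw: float, previous: float, neighbors: Sequence[float] = (),
                   alpha: float = 0.3, neighbor_weight: float = 0.2,
                   max_step: float = 0.5, max_surge: float = 5.0) -> float:
    """Illustrative smoothing pipeline; parameter defaults are assumptions."""
    # 1. Temporal smoothing: exponential moving average toward the raw value.
    value = alpha * raw + (1.0 - alpha) * previous
    # 2. Neighbor blending: pull toward the mean of adjacent cells to soften cliffs.
    if neighbors:
        neighbor_mean = sum(neighbors) / len(neighbors)
        value = (1.0 - neighbor_weight) * value + neighbor_weight * neighbor_mean
    # 3. Rate limit: move at most max_step from the previously published value.
    value = max(previous - max_step, min(previous + max_step, value))
    # 4. Clamp to the allowed range and round to one decimal for display.
    return round(max(1.0, min(max_surge, value)), 1)


# A sudden spike from 1.0x to a raw 5.0x is published as a single 0.5x step.
print(smoothed_surge(raw=5.0, previous=1.0))
```

Applying the rate limit after blending means the published value never jumps more than one step per interval, regardless of what the neighbors are doing.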
def smooth_surge(
    raw_surge: float,
    previous_surge: float,
) -> float:
    ...

Surge Computation Pipeline
Advanced: ML-based surge
Production systems often use ML models that predict optimal surge considering: driver response elasticity, rider price sensitivity, time to pickup, historical patterns, weather, and events. The model outputs surge that maximizes completed rides while maintaining fairness.
Consistency and Invariants
System Invariants
The price shown to user at booking time MUST be honored. Never charge more than displayed. Price locks are sacred - even if surge increases after booking.
Price Lock Flow:
class PriceService:
    def get_price_quote(self, pickup: Location, dropoff: Location) -> PriceQuote:
        ...

Consistency Model:
| Component | Consistency | Rationale |
|---|---|---|
| Surge multipliers | Eventual | Stale by seconds is acceptable |
| Price locks | Strong | Must never lose, affects billing |
| Driver locations | Eventual | Real-time but not critical |
| Demand signals | Eventual | Aggregated over windows |
Race condition to handle
User sees 1.5x surge, clicks book, surge updates to 2.0x before lock created. Solution: Always use the surge from the quote object, not current surge. Quote contains the promised price.
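A minimal in-memory sketch of that rule: the quote carries the surge the rider saw, and booking locks that value rather than re-reading current surge. The class, the `QUOTE_TTL_SEC` window, and the dictionary stores are hypothetical stand-ins; a real service would keep locks in a durable store.

```python
import time
import uuid
from dataclasses import dataclass
from typing import Dict, Optional

QUOTE_TTL_SEC = 300  # hypothetical: how long a displayed quote may still be booked


@dataclass(frozen=True)
class Quote:
    quote_id: str
    cell_id: str
    surge: float       # multiplier shown to the rider
    expires_at: float  # epoch seconds


class InMemoryPriceService:
    """Illustrative stand-in for a price service backed by durable storage."""

    def __init__(self) -> None:
        self.current_surge: Dict[str, float] = {}  # cell_id -> latest multiplier
        self.locks: Dict[str, Quote] = {}          # booking_id -> locked quote

    def get_quote(self, cell_id: str, now: Optional[float] = None) -> Quote:
        now = time.time() if now is None else now
        surge = self.current_surge.get(cell_id, 1.0)
        return Quote(str(uuid.uuid4()), cell_id, surge, now + QUOTE_TTL_SEC)

    def book(self, booking_id: str, quote: Quote, now: Optional[float] = None) -> Quote:
        now = time.time() if now is None else now
        if booking_id in self.locks:
            return self.locks[booking_id]  # idempotent retry returns the first lock
        if now > quote.expires_at:
            raise ValueError("quote expired; request a new one")
        # Lock the surge carried by the quote, never self.current_surge.
        self.locks[booking_id] = quote
        return quote


svc = InMemoryPriceService()
svc.current_surge["cell-a"] = 1.5
quote = svc.get_quote("cell-a")
svc.current_surge["cell-a"] = 2.0          # surge rises before the rider taps "book"
print(svc.book("booking-1", quote).surge)  # still the quoted multiplier
```

Keying the lock by booking ID also gives idempotency: a retried booking request returns the original lock instead of creating a second one at a different price.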
What happens if price lock is lost?
1. Ride completes, system looks up price lock
2. Lock not found (Redis failure, TTL expired)
3. Fallback: Use surge at ride start time from audit log
4. If no audit log: Charge base price only (no surge)
5. Alert on-call team - this is a billing incident
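The fallback order above fits in a small pure function. This is a sketch with a hypothetical signature: the two inputs stand for whatever the lock store and audit log return, with `None` meaning "not found", and the second return value flags whether to page on-call.

```python
from typing import Optional, Tuple


def resolve_billing_surge(lock_surge: Optional[float],
                          audit_surge: Optional[float]) -> Tuple[float, bool]:
    """Return (surge_to_bill, is_billing_incident)."""
    if lock_surge is not None:
        return lock_surge, False  # normal path: honor the price lock
    if audit_surge is not None:
        return audit_surge, True  # lock lost: use surge recorded at ride start
    return 1.0, True              # nothing recorded: charge base price only


print(resolve_billing_surge(lock_surge=None, audit_surge=1.4))
```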
Failure Modes and Resilience
Proactively discuss failures
Let me walk through the failure scenarios. Surge pricing has unique failure modes because wrong prices directly affect revenue and user trust.
| Failure | Impact | Mitigation | Why It Works |
|---|---|---|---|
| Surge computation down | Stale surge values served | Cache with TTL, fallback to 1.0x after expiry | Better to charge no surge than random surge |
| Redis cache down | Cannot serve surge | Local cache in price service, circuit breaker | Price service has hot cache of recent surges |
Graceful Degradation Hierarchy:
def get_surge_with_fallbacks(cell_id: str) -> float:
    """
    Get surge with multiple fallback levels.
    """
    ...

Why fail to 1.0x?
When unsure about surge, charge no surge (1.0x). Undercharging is a business loss but overcharging is a trust violation and potential legal issue. Users remember being overcharged.
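A sketch of that degradation order, assuming three levels: the shared cache, a recent local copy, then 1.0x. The `SurgeReader` class, the injected `fetch_remote` callable, and the 5-minute local TTL are hypothetical; in the design above the remote lookup would be a Redis read behind a circuit breaker.

```python
import time
from typing import Callable, Dict, Optional, Tuple

LOCAL_TTL_SEC = 300  # hypothetical: how long a locally cached value stays usable


class SurgeReader:
    """Illustrative read path: shared cache, then local copy, then 1.0x."""

    def __init__(self, fetch_remote: Callable[[str], Optional[float]]) -> None:
        self._fetch_remote = fetch_remote                 # e.g. a Redis lookup
        self._local: Dict[str, Tuple[float, float]] = {}  # cell_id -> (surge, stored_at)

    def get(self, cell_id: str, now: Optional[float] = None) -> float:
        now = time.time() if now is None else now
        # Level 1: shared cache. Any error is treated as a miss.
        try:
            value = self._fetch_remote(cell_id)
        except Exception:
            value = None
        if value is not None:
            self._local[cell_id] = (value, now)
            return value
        # Level 2: a recent local copy of the last value read for this cell.
        cached = self._local.get(cell_id)
        if cached is not None and now - cached[1] <= LOCAL_TTL_SEC:
            return cached[0]
        # Level 3: no trustworthy value, so charge no surge.
        return 1.0
```

Bounding the local copy with a TTL keeps a long outage from pinning a cell at a stale high multiplier; once the TTL passes, the reader falls through to 1.0x.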
Surge Sanity Checks:
Before publishing any surge value:
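A minimal sketch of such a gate, assuming three checks: the value is a finite number, it lies within the allowed range, and it does not move more than the per-interval cap. The function name and defaults are illustrative; the 5x maximum and 0.5x step follow the assumptions stated earlier.

```python
import math


def is_publishable(new_surge: float, prev_surge: float,
                   max_surge: float = 5.0, max_step: float = 0.5) -> bool:
    """Return True only if new_surge passes basic sanity checks."""
    if not math.isfinite(new_surge):
        return False  # NaN or infinity from a bad division upstream
    if new_surge < 1.0 or new_surge > max_surge:
        return False  # outside the allowed multiplier range
    if abs(new_surge - prev_surge) > max_step + 1e-9:
        return False  # jump larger than the per-interval cap
    return True


print(is_publishable(1.5, 1.0), is_publishable(2.0, 1.0))
```

A rejected value would typically leave the previous multiplier in place and raise an alert, rather than being silently clamped.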
def validate_surge(cell_id: str, new_surge: float, prev_surge: float) -> bool:
    """
    Sanity check surge before publishing.
    """
    ...

Evolution and Scaling
What to say
This design works well for a single region with millions of requests per day. Let me discuss how it evolves for global scale and advanced features.
Evolution Path:
Stage 1: Single City (MVP)
- Monolithic surge service
- Single Redis for surge cache
- Fixed supply/demand formula
- Works up to 100K requests/day

Stage 2: Multi-City
- Shard by city (each city independent)
- Regional Redis clusters
- City-specific tuning parameters
- Works up to 10M requests/day

Stage 3: Global Scale
- Stream processing for aggregation (Flink/Spark)
- ML models for surge prediction
- Real-time A/B testing of pricing
- Works up to 100M+ requests/day
Advanced Features to Mention:
| Feature | Description | Complexity |
|---|---|---|
| Predictive surge | Predict surge 5-10 min ahead to pre-position drivers | ML model on historical patterns |
| Personalized pricing | Different surge for different users based on willingness to pay | Controversial, regulatory concerns |
| Upfront pricing | Fixed price regardless of route taken | Need accurate ETA and route prediction |
| Surge caps | Max surge during emergencies (regulatory) | Override logic and audit trail |
| Driver surge visibility | Show drivers where surge is high | Careful not to cause herding behavior |
Alternative approaches
If I needed lower latency, I would push surge computation to edge nodes (CDN) with periodic sync. If accuracy were more critical than speed, I would use a stronger consistency model with Raft-based storage for price locks.
What I would do differently for...
Food delivery (DoorDash/UberEats): Surge on delivery fees, not food prices. Consider restaurant kitchen capacity as supply constraint, not just drivers.
Grocery delivery (Instacart): Longer time windows allow batching. Surge based on delivery slot availability rather than real-time.
Hotel/Flight pricing: Much longer decision windows. Can use complex optimization. Price changes less frequently.
Concert tickets: One-time event, no supply flexibility. Surge is more about demand capture than balancing marketplace.