Design Walkthrough
Problem Statement
The Question: Design a system that matches riders requesting trips with nearby available drivers, handling millions of matches per day.
The matching system is the core of any ride-sharing platform:

- Real-time: Riders expect a driver within seconds
- Location-aware: Match with nearby drivers to minimize pickup time
- Optimized: Balance rider wait time, driver utilization, and fairness
- Reliable: Handle driver location updates, network issues, and edge cases
What to say first
Before designing, let me clarify the scope. Are we focusing on ride-sharing (Uber) or delivery (Uber Eats)? The matching logic differs: delivery matches on the order's pickup and dropoff locations (e.g. restaurant to customer), not on the customer's current location.
Hidden requirements interviewers test:

- Can you handle real-time geo-spatial queries at scale?
- Do you understand the matching optimization tradeoffs?
- How do you handle the cold start and supply-demand imbalance?
- Can you design for failure modes (driver app crashes, GPS issues)?
Clarifying Questions
Ask these questions to scope the problem correctly.
Question 1: Scale
How many concurrent drivers and ride requests? What is the geographic scope?
- Why this matters: Determines if single-city or multi-region architecture.
- Typical answer: 100K concurrent drivers per major city, 10K requests per minute at peak.
- Architecture impact: Need geo-sharding; each city can be semi-independent.
Question 2: Matching Goals
What are we optimizing for? Fastest match, shortest pickup, driver fairness, or overall efficiency?
- Why this matters: Different objectives lead to different algorithms.
- Typical answer: Minimize rider wait time as primary, with fairness constraints for drivers.
- Architecture impact: Greedy matching with fairness adjustments vs global optimization.
Question 3: Driver Types
Single driver type or multiple (UberX, UberXL, Black)? Can one driver serve multiple types?
- Why this matters: Affects matching pool and constraints.
- Typical answer: Multiple types, drivers can be eligible for several.
- Architecture impact: Need driver capability filtering before matching.
Question 4: Ride-sharing vs Delivery
Is this for ride-sharing (passenger in car) or delivery (food/packages)?
- Why this matters: Delivery allows batching multiple orders; ride-sharing typically does not.
- Typical answer: Focus on ride-sharing, mention delivery as extension.
- Architecture impact: Delivery adds order batching optimization layer.
Stating assumptions
I will assume: ride-sharing focus, 100K concurrent drivers per city, optimize for rider wait time with driver fairness, multiple vehicle types, and real-time matching within 5 seconds.
The Hard Part
Say this out loud
The hard part is balancing match speed with match quality while handling rapidly changing driver locations. We need sub-second location updates and sub-5-second matching.
Why this is genuinely hard:
1. Real-time Location Tracking: 100K drivers sending GPS updates every 3-5 seconds = 20-30K updates/second per city. Must index for fast geo-queries.
2. Matching Optimization: Finding the globally optimal driver-rider assignment is NP-hard. We need approximation algorithms that are fast enough for real-time.
3. Race Conditions: Two riders request at the same time, both see the same nearby driver. How do we prevent double-booking?
4. Stale Data: Driver location from 5 seconds ago may be outdated. Driver may have moved or gone offline.
5. Supply-Demand Imbalance: During rush hour, more riders than drivers. How do we queue fairly? During slow times, how do we distribute rides fairly among drivers?
Common mistake
Candidates often propose simple nearest-driver matching without considering: What if that driver is about to complete a trip? What about driver fairness (some drivers get all rides)? What about matching quality vs speed tradeoff?
The fundamental tradeoffs:
- Speed vs Quality: Instant greedy match vs waiting to batch and optimize
- Rider vs Driver: Minimize rider wait vs fair distribution to drivers
- Local vs Global: Best match for this rider vs best overall system utilization
- Accuracy vs Freshness: Precise ETA calculations vs using cached/stale data
Scale and Access Patterns
Let me estimate the scale for a major city like San Francisco or New York.
| Dimension | Value | Impact |
|---|---|---|
| Concurrent drivers (city) | 100,000 | Need efficient geo-index |
| Location updates/second | 25,000 | High write throughput for location store |
What to say
At 25K location updates per second and 167 matches per second per city, we need a geo-spatial index optimized for both high write throughput and fast proximity queries.
Access Pattern Analysis:
- Write-heavy for locations: 25K writes/sec vs hundreds of reads for matching
- Read pattern: Find all drivers within X km of a point, sorted by distance/ETA
- Highly localized: 95% of queries are within a single city
- Time-sensitive: Data older than 30 seconds is nearly useless
- Bursty: Rush hour has 5-10x normal traffic
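The arithmetic behind the 25K updates/second figure, assuming the average GPS ping interval lands at ~4 seconds (midpoint of the 3-5 second range):

```python
# Back-of-envelope: location update throughput for one city.
concurrent_drivers = 100_000
avg_update_interval_s = 4  # GPS ping every 3-5 seconds; take ~4 s average

updates_per_sec = concurrent_drivers // avg_update_interval_s
print(updates_per_sec)  # 25000
```

With a 3-second interval the same math gives ~33K/sec, which is why the text quotes a 20-30K range.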
Location storage per driver:

- driver_id: 8 bytes
- latitude: 8 bytes

High-Level Architecture
Let me design the architecture starting with core components.
What to say
I will separate the system into three main services: Location Service (tracking), Matching Service (dispatch), and Trip Service (state management). They communicate via async events with synchronous calls only for critical paths.
Rider Matching Architecture
Component Responsibilities:
1. Location Service
   - Receives GPS updates from driver apps via WebSocket
   - Updates geo-spatial index in real-time
   - Publishes location events to Kafka for analytics
   - Maintains driver online/offline status

2. Matching Service
   - Receives ride requests from riders
   - Queries geo-index for nearby available drivers
   - Runs matching algorithm to select best driver
   - Sends dispatch request to selected driver
   - Handles driver accept/reject responses

3. Trip Service
   - Manages trip lifecycle (requested, matched, pickup, in_progress, completed)
   - Handles cancellations and reassignment
   - Stores trip history for billing and analytics

4. Geo Index
   - Stores current driver locations with status
   - Supports fast proximity queries (drivers within X km)
   - High write throughput for location updates
   - Options: Redis with geospatial, custom quad-tree, H3/S2 cells
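To illustrate the cell-based option (the idea behind H3/S2, much simplified), here is a toy fixed-grid index: bucket drivers into lat/lng cells, then scan the 3x3 cell neighborhood around the query point. The cell size and all names are illustrative, not from the source.

```python
from collections import defaultdict
from math import cos, radians, sqrt

CELL_DEG = 0.01  # ~1.1 km of latitude per cell; stand-in for an H3/S2 resolution

class GridGeoIndex:
    """Toy cell-based geo index: O(1) updates, proximity queries by
    scanning only the cells adjacent to the query point."""

    def __init__(self):
        self.cells = defaultdict(dict)   # cell -> {driver_id: (lat, lng)}
        self.where = {}                  # driver_id -> current cell

    def _cell(self, lat, lng):
        return (int(lat // CELL_DEG), int(lng // CELL_DEG))

    def update(self, driver_id, lat, lng):
        # Remove from the old cell first so each driver appears exactly once.
        old = self.where.get(driver_id)
        if old is not None:
            self.cells[old].pop(driver_id, None)
        cell = self._cell(lat, lng)
        self.cells[cell][driver_id] = (lat, lng)
        self.where[driver_id] = cell

    def nearby(self, lat, lng):
        """Drivers in the 3x3 cells around the point, sorted by rough distance (km)."""
        cx, cy = self._cell(lat, lng)
        found = []
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for driver_id, (dlat, dlng) in self.cells[(cx + dx, cy + dy)].items():
                    # Equirectangular approximation is fine at city scale.
                    km = sqrt(((dlat - lat) * 111) ** 2 +
                              ((dlng - lng) * 111 * cos(radians(lat))) ** 2)
                    found.append((km, driver_id))
        return sorted(found)
```

Real cell systems are hierarchical (coarser cells for sparse areas) and shard cleanly by cell prefix, which is what makes them attractive at multi-region scale.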
Real-world reference
Uber uses a service called Ringpop for consistent hashing and DISCO for dispatch. Lyft uses a custom geo-index based on S2 cells. Both separate location tracking from matching logic.
Data Model and Storage
The data model needs to support real-time location queries and trip state management.
What to say
I will use Redis with geospatial commands for the hot path (driver locations), and PostgreSQL for durable trip state. The geo-index is the source of truth for driver availability.
```
-- Driver location (in Redis, shown as logical structure)
-- Key: driver:{driver_id}:location
-- GEOADD drivers:{city_id} longitude latitude driver_id
```

Geo-Spatial Indexing Options:
| Approach | Pros | Cons | Best For |
|---|---|---|---|
| Redis GEOADD | Simple, fast, built-in | Single node limit, basic queries | Moderate scale, simple needs |
| PostGIS | Rich queries, ACID | Slower updates, not real-time | Analytics, historical queries |
| S2/H3 Cells | Hierarchical, shardable | More complex, custom code | Large scale, multi-region |
| Custom Quad-tree | Optimized for use case | Build and maintain yourself | Extreme performance needs |
```python
import redis

class DriverLocationStore:
    ...
```

Critical detail
The set_driver_unavailable operation MUST be atomic. If two matching requests try to assign the same driver, only one should succeed. Use Redis Lua scripts or distributed locks.
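To make the invariant concrete: in production the check-and-set would live in a Redis Lua script (or WATCH/MULTI) so it executes atomically on the server. The sketch below expresses the same idea in-process, with a lock standing in for the script's atomicity; all names are illustrative.

```python
import threading

class DriverAssigner:
    """Check-and-set driver assignment. The lock stands in for the
    atomicity a Redis Lua script provides in production."""

    def __init__(self):
        self._lock = threading.Lock()
        self._status = {}      # driver_id -> "available" | "assigned"
        self._ride_for = {}    # driver_id -> ride_id

    def set_available(self, driver_id):
        with self._lock:
            self._status[driver_id] = "available"
            self._ride_for.pop(driver_id, None)

    def try_claim(self, driver_id, ride_id):
        """Atomically claim the driver; exactly one concurrent ride wins."""
        with self._lock:
            if self._status.get(driver_id) != "available":
                return False
            self._status[driver_id] = "assigned"
            self._ride_for[driver_id] = ride_id
            return True
```

The key property: the status read and the status write happen under one critical section, so two matching requests can never both observe "available" and both claim the driver.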
Matching Algorithm Deep Dive
The matching algorithm is the core intellectual property of ride-sharing companies. Let me walk through the approaches.
| Algorithm | Speed | Quality | Complexity | Use Case |
|---|---|---|---|---|
| Nearest Available | Instant | Poor | Simple | Low traffic, simple MVP |
| Greedy with Scoring | Fast (ms) | Good | Medium | Most production systems |
| Batch Matching | Slower (seconds) | Optimal | Complex | High-density areas |
| Auction-based | Medium | Fair | Complex | Driver fairness focus |
Approach 1: Greedy with Scoring (Recommended)
For each ride request, score all nearby drivers and pick the best.
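A minimal sketch of the scoring step. The fields and weights are illustrative (not production-tuned values from the source); the point is that ETA dominates while fairness and quality act as tie-breakers.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Candidate:
    driver_id: str
    eta_min: float     # estimated pickup time in minutes
    rating: float      # driver rating, 1.0 - 5.0
    idle_min: float    # minutes since last trip (fairness signal)

def score(c: Candidate) -> float:
    """Higher is better: short ETA dominates, with small boosts for
    rating and for drivers who have been idle longer (fairness)."""
    return -1.0 * c.eta_min + 0.3 * c.rating + 0.1 * c.idle_min

def pick_driver(candidates: List[Candidate]) -> Optional[Candidate]:
    return max(candidates, key=score) if candidates else None
```

This is O(n) per request over the nearby-driver set returned by the geo-query, which is why greedy scoring matches in milliseconds.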
```python
from dataclasses import dataclass
from typing import List, Optional
import heapq
```

Approach 2: Batch Matching (High-Density Optimization)
Instead of matching instantly, collect requests for a short window (1-2 seconds) and find the globally optimal assignment.
Batch Matching Flow
```python
from scipy.optimize import linear_sum_assignment
import numpy as np
```
When to use batch matching
Batch matching improves overall efficiency by 10-15% but adds 1-2 seconds latency. Use it in high-density areas (airports, downtown) where many requests and drivers exist simultaneously.
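To show what "globally optimal assignment" buys over greedy, here is a brute-force version of batch matching (illustrative only; production systems use the Hungarian algorithm, e.g. scipy's linear_sum_assignment, for the same result in O(n^3) instead of O(n!)):

```python
from itertools import permutations

def batch_match(eta):
    """eta[i][j] = pickup ETA (minutes) if rider i is assigned driver j.
    Returns the permutation (rider -> driver) minimizing total ETA.
    Brute force over all assignments - only viable for tiny batches."""
    n = len(eta)
    best_perm, best_cost = None, float("inf")
    for perm in permutations(range(n)):
        cost = sum(eta[i][perm[i]] for i in range(n))
        if cost < best_cost:
            best_perm, best_cost = perm, cost
    return best_perm, best_cost
```

In the example below, greedy per-request matching would give rider 0 the 2-minute driver and leave rider 1 a 9-minute pickup (total 11 minutes); the batch optimum swaps the pairing for a total of 6.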
Consistency and Invariants
System Invariants
1. A driver must NEVER be assigned to two rides simultaneously.
2. A rider must not be charged if no driver is assigned.
3. Driver location data must never be more than 30 seconds stale for matching.
Preventing Double-Booking
This is the critical invariant. Multiple matching requests might try to assign the same driver.
```python
# Option 1: Redis Lua Script (recommended for speed)
ASSIGN_DRIVER_SCRIPT = """
local driver_key = KEYS[1]
-- Succeed only if the driver is still available (atomic check-and-set).
if redis.call('GET', driver_key) == 'available' then
    redis.call('SET', driver_key, 'assigned')
    return 1
end
return 0
"""
```

Handling Driver Response Timeout
After dispatch, driver has limited time to accept/reject. If no response, reassign.
Driver Response State Machine
Critical timeout handling
If driver does not respond within 15 seconds, automatically release the driver and re-enter the ride into matching. Track timeout rate per driver for quality scoring.
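The dispatch/timeout flow above can be sketched as a small state machine (names are illustrative; the 15-second window comes from the callout):

```python
DISPATCH_TIMEOUT_S = 15  # auto-release window from the text

class Dispatch:
    """Tracks one offer to one driver:
    dispatched -> accepted | rejected | timed_out."""

    def __init__(self, driver_id, sent_at):
        self.driver_id = driver_id
        self.sent_at = sent_at
        self.state = "dispatched"

    def on_response(self, accepted):
        # Late responses after a timeout are ignored.
        if self.state == "dispatched":
            self.state = "accepted" if accepted else "rejected"
        return self.state

    def check_timeout(self, now):
        if self.state == "dispatched" and now - self.sent_at >= DISPATCH_TIMEOUT_S:
            # Caller releases the driver and re-enters the ride into matching.
            self.state = "timed_out"
        return self.state
```

On "rejected" or "timed_out" the matching service releases the driver claim and retries with the next candidate, while the per-driver timeout rate feeds the quality score.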
Failure Modes and Resilience
Proactively discuss failures
Let me walk through failure scenarios. This is critical for a real-time system where failures directly impact user experience.
| Failure | Impact | Detection | Mitigation |
|---|---|---|---|
| Redis (geo-index) down | Cannot match rides | Health check fails | Failover to replica, queue requests |
| Driver app loses connection | Stale location data | No updates for 30s | Mark driver offline, remove from pool |
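The "no updates for 30s" detection in the table can be sketched as a liveness tracker (illustrative names; production would run this on a timer or lean on Redis key TTLs):

```python
STALE_AFTER_S = 30  # matches the table: no updates for 30 s -> offline

class LivenessTracker:
    """Records the last-heard time per driver and evicts stale drivers
    from the matching pool."""

    def __init__(self):
        self.last_seen = {}  # driver_id -> last heartbeat timestamp

    def heartbeat(self, driver_id, now):
        self.last_seen[driver_id] = now

    def evict_stale(self, now):
        """Remove drivers we have not heard from in STALE_AFTER_S seconds."""
        stale = [d for d, t in self.last_seen.items() if now - t >= STALE_AFTER_S]
        for d in stale:
            del self.last_seen[d]  # plus: mark offline in the geo-index
        return stale
```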
Handling Driver Disconnection
```python
class DriverLivenessMonitor:
    """
    Monitors driver connections and removes stale drivers from matching pool.
    """
```

Graceful Degradation Under Load
```python
class MatchingServiceWithDegradation:
    """
    Implements graceful degradation when system is overloaded.
    """
```

Real-world insight
During New Year's Eve peaks, Uber switches to simpler matching algorithms and increases surge pricing to balance supply/demand. Quality degrades gracefully rather than failing completely.
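A load-based strategy switch of this kind might look like the following (the thresholds and strategy names are illustrative, not Uber's actual values):

```python
def choose_strategy(pending_requests: int, capacity_per_sec: int) -> str:
    """Pick a matching strategy from current load. Under heavy load,
    fall back to cheaper algorithms instead of failing outright."""
    load = pending_requests / capacity_per_sec
    if load < 0.7:
        return "batch_optimal"      # 1-2 s window, globally optimal assignment
    if load < 1.0:
        return "greedy_scored"      # per-request scoring, no batching
    return "nearest_available"      # cheapest possible match, shed the rest
```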
Evolution and Scaling
What to say
This design handles a single city well. Let me discuss how it evolves for multi-city, multi-product (rides + delivery), and extreme scale scenarios.
Evolution Path:
Stage 1: Single City (our design)
- One Redis cluster for geo-index
- Matching service can scale horizontally
- Works for cities up to 100K concurrent drivers

Stage 2: Multi-City
- Shard by city - each city has independent matching
- Shared driver/rider accounts across cities
- Cross-city trips handled as special case

Stage 3: Multi-Product (Rides + Eats)
- Separate matching pools but shared location service
- Drivers can be eligible for both
- Eats adds order batching optimization
Multi-City Architecture
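The city-sharded routing layer can be sketched as a stable city-to-shard mapping (hash-mod here for brevity; real systems use consistent hashing, e.g. Uber's Ringpop, so resharding moves only a fraction of cities):

```python
import hashlib

def shard_for_city(city_id: str, num_shards: int) -> int:
    """Stable city -> shard mapping. Each shard runs its own geo-index
    and matching service, keeping matching traffic city-local."""
    digest = hashlib.sha256(city_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards
```

A request router looks up the rider's city, forwards to that shard, and only cross-city trips need coordination between shards.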
Advanced Features to Mention:
1. Scheduled Rides: Pre-match drivers for future pickups, hold assignment until closer to time
2. Ride Pooling (UberPool): Match multiple riders going same direction to one driver - combinatorial optimization
3. Delivery Batching: For Uber Eats, batch multiple orders to one driver - traveling salesman variant
4. Predictive Positioning: Use ML to predict demand and pre-position drivers in high-demand areas
5. Dynamic Pricing Integration: Surge pricing affects driver behavior and matching priority
Alternative approach
If we needed sub-100ms matching for autonomous vehicles or high-frequency trading style matching, I would use a custom in-memory data structure with lock-free algorithms instead of Redis. The current design optimizes for operational simplicity at the cost of some latency.
What I would add for Uber Eats specifically:
- Order ready time prediction: Match driver arrival with food ready time
- Multi-pickup optimization: One driver picks up from multiple restaurants
- Hot bag capacity: Consider how many orders fit in delivery bag
- Restaurant prep time variance: Factor in which restaurants are consistently late