Design Walkthrough
Problem Statement
The Question: Design a system that matches riders requesting trips with nearby available drivers, handling millions of matches per day.
The matching system is the core of any ride-sharing platform:

- Real-time: Riders expect a driver within seconds
- Location-aware: Match with nearby drivers to minimize pickup time
- Optimized: Balance rider wait time, driver utilization, and fairness
- Reliable: Handle driver location updates, network issues, and edge cases
What to say first
Before designing, let me clarify the scope. Are we focusing on ride-sharing (Uber) or delivery (Uber Eats)? The matching logic differs: delivery matches on the order's pickup and dropoff locations (e.g. restaurant to customer), not on the customer's current location.
Hidden requirements interviewers test:

- Can you handle real-time geo-spatial queries at scale?
- Do you understand the matching optimization tradeoffs?
- How do you handle the cold start and supply-demand imbalance?
- Can you design for failure modes (driver app crashes, GPS issues)?
Clarifying Questions
Ask these questions to scope the problem correctly.
Question 1: Scale
How many concurrent drivers and ride requests? What is the geographic scope?
- Why this matters: Determines if single-city or multi-region architecture.
- Typical answer: 100K concurrent drivers per major city, 10K requests per minute at peak.
- Architecture impact: Need geo-sharding; each city can be semi-independent.
Question 2: Matching Goals
What are we optimizing for? Fastest match, shortest pickup, driver fairness, or overall efficiency?
- Why this matters: Different objectives lead to different algorithms.
- Typical answer: Minimize rider wait time as primary, with fairness constraints for drivers.
- Architecture impact: Greedy matching with fairness adjustments vs global optimization.
Question 3: Driver Types
Single driver type or multiple (UberX, UberXL, Black)? Can one driver serve multiple types?
- Why this matters: Affects matching pool and constraints.
- Typical answer: Multiple types, drivers can be eligible for several.
- Architecture impact: Need driver capability filtering before matching.
Question 4: Ride-sharing vs Delivery
Is this for ride-sharing (passenger in car) or delivery (food/packages)?
- Why this matters: Delivery allows batching multiple orders; ride-sharing typically does not.
- Typical answer: Focus on ride-sharing, mention delivery as extension.
- Architecture impact: Delivery adds order batching optimization layer.
Stating assumptions
I will assume: ride-sharing focus, 100K concurrent drivers per city, optimize for rider wait time with driver fairness, multiple vehicle types, and real-time matching within 5 seconds.
The Hard Part
Say this out loud
The hard part is balancing match speed with match quality while handling rapidly changing driver locations. We need sub-second location updates and sub-5-second matching.
Why this is genuinely hard:
1. Real-time Location Tracking: 100K drivers sending GPS updates every 3-5 seconds = 20-30K updates/second per city. Must index for fast geo-queries.
2. Matching Optimization: Finding the globally optimal driver-rider assignment is NP-hard. We need approximation algorithms that are fast enough for real-time.
3. Race Conditions: Two riders request at the same time, both see the same nearby driver. How do we prevent double-booking?
4. Stale Data: Driver location from 5 seconds ago may be outdated. Driver may have moved or gone offline.
5. Supply-Demand Imbalance: During rush hour, more riders than drivers. How do we queue fairly? During slow times, how do we distribute rides fairly among drivers?
Common mistake
Candidates often propose simple nearest-driver matching without considering: What if that driver is about to complete a trip? What about driver fairness (some drivers get all rides)? What about matching quality vs speed tradeoff?
The fundamental tradeoffs:
- Speed vs Quality: Instant greedy match vs waiting to batch and optimize
- Rider vs Driver: Minimize rider wait vs fair distribution to drivers
- Local vs Global: Best match for this rider vs best overall system utilization
- Accuracy vs Freshness: Precise ETA calculations vs using cached/stale data
Scale and Access Patterns
Let me estimate the scale for a major city like San Francisco or New York.
| Dimension | Value | Impact |
|---|---|---|
| Concurrent drivers (city) | 100,000 | Need efficient geo-index |
| Location updates/second | 25,000 | High write throughput for location store |
What to say
At 25K location updates per second and 167 matches per second per city, we need a geo-spatial index optimized for both high write throughput and fast proximity queries.
Access Pattern Analysis:
- Write-heavy for locations: 25K writes/sec vs hundreds of reads for matching
- Read pattern: Find all drivers within X km of a point, sorted by distance/ETA
- Highly localized: 95% of queries are within a single city
- Time-sensitive: Data older than 30 seconds is nearly useless
- Bursty: Rush hour has 5-10x normal traffic
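The arithmetic behind the 25K updates/second figure, assuming the average GPS ping interval lands at ~4 seconds (midpoint of the 3-5 second range):

```python
# Back-of-envelope: location update throughput for one city.
concurrent_drivers = 100_000
avg_update_interval_s = 4  # GPS ping every 3-5 seconds; take ~4 s average

updates_per_sec = concurrent_drivers // avg_update_interval_s
print(updates_per_sec)  # 25000
```

With a 3-second interval the same math gives ~33K/sec, which is why the text quotes a 20-30K range.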
Location storage per driver:

- driver_id: 8 bytes
- latitude: 8 bytes

High-Level Architecture
Let me design the architecture starting with core components.
What to say
I will separate the system into three main services: Location Service (tracking), Matching Service (dispatch), and Trip Service (state management). They communicate via async events with synchronous calls only for critical paths.
Rider Matching Architecture
Component Responsibilities:
1. Location Service
   - Receives GPS updates from driver apps via WebSocket
   - Updates geo-spatial index in real-time
   - Publishes location events to Kafka for analytics
   - Maintains driver online/offline status

2. Matching Service
   - Receives ride requests from riders
   - Queries geo-index for nearby available drivers
   - Runs matching algorithm to select best driver
   - Sends dispatch request to selected driver
   - Handles driver accept/reject responses

3. Trip Service
   - Manages trip lifecycle (requested, matched, pickup, in_progress, completed)
   - Handles cancellations and reassignment
   - Stores trip history for billing and analytics

4. Geo Index
   - Stores current driver locations with status
   - Supports fast proximity queries (drivers within X km)
   - High write throughput for location updates
   - Options: Redis with geospatial, custom quad-tree, H3/S2 cells
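To illustrate the cell-based option (the idea behind H3/S2, much simplified), here is a toy fixed-grid index: bucket drivers into lat/lng cells, then scan the 3x3 cell neighborhood around the query point. The cell size and all names are illustrative, not from the source.

```python
from collections import defaultdict
from math import cos, radians, sqrt

CELL_DEG = 0.01  # ~1.1 km of latitude per cell; stand-in for an H3/S2 resolution

class GridGeoIndex:
    """Toy cell-based geo index: O(1) updates, proximity queries by
    scanning only the cells adjacent to the query point."""

    def __init__(self):
        self.cells = defaultdict(dict)   # cell -> {driver_id: (lat, lng)}
        self.where = {}                  # driver_id -> current cell

    def _cell(self, lat, lng):
        return (int(lat // CELL_DEG), int(lng // CELL_DEG))

    def update(self, driver_id, lat, lng):
        # Remove from the old cell first so each driver appears exactly once.
        old = self.where.get(driver_id)
        if old is not None:
            self.cells[old].pop(driver_id, None)
        cell = self._cell(lat, lng)
        self.cells[cell][driver_id] = (lat, lng)
        self.where[driver_id] = cell

    def nearby(self, lat, lng):
        """Drivers in the 3x3 cells around the point, sorted by rough distance (km)."""
        cx, cy = self._cell(lat, lng)
        found = []
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for driver_id, (dlat, dlng) in self.cells[(cx + dx, cy + dy)].items():
                    # Equirectangular approximation is fine at city scale.
                    km = sqrt(((dlat - lat) * 111) ** 2 +
                              ((dlng - lng) * 111 * cos(radians(lat))) ** 2)
                    found.append((km, driver_id))
        return sorted(found)
```

Real cell systems are hierarchical (coarser cells for sparse areas) and shard cleanly by cell prefix, which is what makes them attractive at multi-region scale.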
Real-world reference
Uber uses a service called Ringpop for consistent hashing and DISCO for dispatch. Lyft uses a custom geo-index based on S2 cells. Both separate location tracking from matching logic.
Data Model and Storage
The data model needs to support real-time location queries and trip state management.
What to say
I will use Redis with geospatial commands for the hot path (driver locations), and PostgreSQL for durable trip state. The geo-index is the source of truth for driver availability.
```
-- Driver location (in Redis, shown as logical structure)
-- Key: driver:{driver_id}:location
-- GEOADD drivers:{city_id} longitude latitude driver_id
```

Geo-Spatial Indexing Options:
| Approach | Pros | Cons | Best For |
|---|---|---|---|
| Redis GEOADD | Simple, fast, built-in | Single node limit, basic queries | Moderate scale, simple needs |
| PostGIS | Rich queries, ACID | Slower updates, not real-time | Analytics, historical queries |
| S2/H3 Cells | Hierarchical, shardable | More complex, custom code | Large scale, multi-region |
| Custom Quad-tree | Optimized for use case | Build and maintain yourself | Extreme performance needs |
```python
import redis

class DriverLocationStore:
    ...
```

Critical detail
The set_driver_unavailable operation MUST be atomic. If two matching requests try to assign the same driver, only one should succeed. Use Redis Lua scripts or distributed locks.
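To make the invariant concrete: in production the check-and-set would live in a Redis Lua script (or WATCH/MULTI) so it executes atomically on the server. The sketch below expresses the same idea in-process, with a lock standing in for the script's atomicity; all names are illustrative.

```python
import threading

class DriverAssigner:
    """Check-and-set driver assignment. The lock stands in for the
    atomicity a Redis Lua script provides in production."""

    def __init__(self):
        self._lock = threading.Lock()
        self._status = {}      # driver_id -> "available" | "assigned"
        self._ride_for = {}    # driver_id -> ride_id

    def set_available(self, driver_id):
        with self._lock:
            self._status[driver_id] = "available"
            self._ride_for.pop(driver_id, None)

    def try_claim(self, driver_id, ride_id):
        """Atomically claim the driver; exactly one concurrent ride wins."""
        with self._lock:
            if self._status.get(driver_id) != "available":
                return False
            self._status[driver_id] = "assigned"
            self._ride_for[driver_id] = ride_id
            return True
```

The key property: the status read and the status write happen under one critical section, so two matching requests can never both observe "available" and both claim the driver.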
Matching Algorithm Deep Dive
The matching algorithm is the core intellectual property of ride-sharing companies. Let me walk through the approaches.
| Algorithm | Speed | Quality | Complexity | Use Case |
|---|---|---|---|---|
| Nearest Available | Instant | Poor | Simple | Low traffic, simple MVP |
| Greedy with Scoring | Fast (ms) | Good | Medium | Most production systems |
| Batch Matching | Slower (seconds) | Optimal | Complex | High-density areas |
| Auction-based | Medium | Fair | Complex | Driver fairness focus |
Approach 1: Greedy with Scoring (Recommended)
For each ride request, score all nearby drivers and pick the best.
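A minimal sketch of the scoring step. The fields and weights are illustrative (not production-tuned values from the source); the point is that ETA dominates while fairness and quality act as tie-breakers.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Candidate:
    driver_id: str
    eta_min: float     # estimated pickup time in minutes
    rating: float      # driver rating, 1.0 - 5.0
    idle_min: float    # minutes since last trip (fairness signal)

def score(c: Candidate) -> float:
    """Higher is better: short ETA dominates, with small boosts for
    rating and for drivers who have been idle longer (fairness)."""
    return -1.0 * c.eta_min + 0.3 * c.rating + 0.1 * c.idle_min

def pick_driver(candidates: List[Candidate]) -> Optional[Candidate]:
    return max(candidates, key=score) if candidates else None
```

This is O(n) per request over the nearby-driver set returned by the geo-query, which is why greedy scoring matches in milliseconds.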
```python
from dataclasses import dataclass
from typing import List, Optional
import heapq
```

Approach 2: Batch Matching (High-Density Optimization)
Instead of matching instantly, collect requests for a short window (1-2 seconds) and find the globally optimal assignment.
Batch Matching Flow
```python
from scipy.optimize import linear_sum_assignment
import numpy as np
```
When to use batch matching
Batch matching improves overall efficiency by 10-15% but adds 1-2 seconds latency. Use it in high-density areas (airports, downtown) where many requests and drivers exist simultaneously.
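To show what "globally optimal assignment" buys over greedy, here is a brute-force version of batch matching (illustrative only; production systems use the Hungarian algorithm, e.g. scipy's linear_sum_assignment, for the same result in O(n^3) instead of O(n!)):

```python
from itertools import permutations

def batch_match(eta):
    """eta[i][j] = pickup ETA (minutes) if rider i is assigned driver j.
    Returns the permutation (rider -> driver) minimizing total ETA.
    Brute force over all assignments - only viable for tiny batches."""
    n = len(eta)
    best_perm, best_cost = None, float("inf")
    for perm in permutations(range(n)):
        cost = sum(eta[i][perm[i]] for i in range(n))
        if cost < best_cost:
            best_perm, best_cost = perm, cost
    return best_perm, best_cost
```

In the example below, greedy per-request matching would give rider 0 the 2-minute driver and leave rider 1 a 9-minute pickup (total 11 minutes); the batch optimum swaps the pairing for a total of 6.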
Consistency and Invariants
System Invariants
1. A driver must NEVER be assigned to two rides simultaneously.
2. A rider must not be charged if no driver is assigned.
3. Driver location data must never be more than 30 seconds stale for matching.
Preventing Double-Booking
This is the critical invariant. Multiple matching requests might try to assign the same driver.
```python
# Option 1: Redis Lua Script (recommended for speed)
ASSIGN_DRIVER_SCRIPT = """
local driver_key = KEYS[1]
-- Succeed only if the driver is still available (atomic check-and-set).
if redis.call('GET', driver_key) == 'available' then
    redis.call('SET', driver_key, 'assigned')
    return 1
end
return 0
"""
```

Handling Driver Response Timeout
After dispatch, driver has limited time to accept/reject. If no response, reassign.
Driver Response State Machine
Critical timeout handling
If driver does not respond within 15 seconds, automatically release the driver and re-enter the ride into matching. Track timeout rate per driver for quality scoring.
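The dispatch/timeout flow above can be sketched as a small state machine (names are illustrative; the 15-second window comes from the callout):

```python
DISPATCH_TIMEOUT_S = 15  # auto-release window from the text

class Dispatch:
    """Tracks one offer to one driver:
    dispatched -> accepted | rejected | timed_out."""

    def __init__(self, driver_id, sent_at):
        self.driver_id = driver_id
        self.sent_at = sent_at
        self.state = "dispatched"

    def on_response(self, accepted):
        # Late responses after a timeout are ignored.
        if self.state == "dispatched":
            self.state = "accepted" if accepted else "rejected"
        return self.state

    def check_timeout(self, now):
        if self.state == "dispatched" and now - self.sent_at >= DISPATCH_TIMEOUT_S:
            # Caller releases the driver and re-enters the ride into matching.
            self.state = "timed_out"
        return self.state
```

On "rejected" or "timed_out" the matching service releases the driver claim and retries with the next candidate, while the per-driver timeout rate feeds the quality score.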
Failure Modes and Resilience
Proactively discuss failures
Let me walk through failure scenarios. This is critical for a real-time system where failures directly impact user experience.
| Failure | Impact | Detection | Mitigation |
|---|---|---|---|
| Redis (geo-index) down | Cannot match rides | Health check fails | Failover to replica, queue requests |
| Driver app loses connection | Stale location data | No updates for 30s | Mark driver offline, remove from pool |
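The "no updates for 30s" detection in the table can be sketched as a liveness tracker (illustrative names; production would run this on a timer or lean on Redis key TTLs):

```python
STALE_AFTER_S = 30  # matches the table: no updates for 30 s -> offline

class LivenessTracker:
    """Records the last-heard time per driver and evicts stale drivers
    from the matching pool."""

    def __init__(self):
        self.last_seen = {}  # driver_id -> last heartbeat timestamp

    def heartbeat(self, driver_id, now):
        self.last_seen[driver_id] = now

    def evict_stale(self, now):
        """Remove drivers we have not heard from in STALE_AFTER_S seconds."""
        stale = [d for d, t in self.last_seen.items() if now - t >= STALE_AFTER_S]
        for d in stale:
            del self.last_seen[d]  # plus: mark offline in the geo-index
        return stale
```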
Handling Driver Disconnection
```python
class DriverLivenessMonitor:
    """
    Monitors driver connections and removes stale drivers from matching pool.
    """
```

Graceful Degradation Under Load
```python
class MatchingServiceWithDegradation:
    """
    Implements graceful degradation when system is overloaded.
    """
```

Real-world insight
During New Year's Eve peaks, Uber switches to simpler matching algorithms and increases surge pricing to balance supply/demand. Quality degrades gracefully rather than failing completely.
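A load-based strategy switch of this kind might look like the following (the thresholds and strategy names are illustrative, not Uber's actual values):

```python
def choose_strategy(pending_requests: int, capacity_per_sec: int) -> str:
    """Pick a matching strategy from current load. Under heavy load,
    fall back to cheaper algorithms instead of failing outright."""
    load = pending_requests / capacity_per_sec
    if load < 0.7:
        return "batch_optimal"      # 1-2 s window, globally optimal assignment
    if load < 1.0:
        return "greedy_scored"      # per-request scoring, no batching
    return "nearest_available"      # cheapest possible match, shed the rest
```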
Evolution and Scaling
What to say
This design handles a single city well. Let me discuss how it evolves for multi-city, multi-product (rides + delivery), and extreme scale scenarios.
Evolution Path:
Stage 1: Single City (our design)
- One Redis cluster for geo-index
- Matching service can scale horizontally
- Works for cities up to 100K concurrent drivers

Stage 2: Multi-City
- Shard by city - each city has independent matching
- Shared driver/rider accounts across cities
- Cross-city trips handled as special case

Stage 3: Multi-Product (Rides + Eats)
- Separate matching pools but shared location service
- Drivers can be eligible for both
- Eats adds order batching optimization
Multi-City Architecture
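The city-sharded routing layer can be sketched as a stable city-to-shard mapping (hash-mod here for brevity; real systems use consistent hashing, e.g. Uber's Ringpop, so resharding moves only a fraction of cities):

```python
import hashlib

def shard_for_city(city_id: str, num_shards: int) -> int:
    """Stable city -> shard mapping. Each shard runs its own geo-index
    and matching service, keeping matching traffic city-local."""
    digest = hashlib.sha256(city_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards
```

A request router looks up the rider's city, forwards to that shard, and only cross-city trips need coordination between shards.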
Advanced Features to Mention:
1. Scheduled Rides: Pre-match drivers for future pickups, hold assignment until closer to time
2. Ride Pooling (UberPool): Match multiple riders going same direction to one driver - combinatorial optimization
3. Delivery Batching: For Uber Eats, batch multiple orders to one driver - traveling salesman variant
4. Predictive Positioning: Use ML to predict demand and pre-position drivers in high-demand areas
5. Dynamic Pricing Integration: Surge pricing affects driver behavior and matching priority
Alternative approach
If we needed sub-100ms matching for autonomous vehicles or high-frequency trading style matching, I would use a custom in-memory data structure with lock-free algorithms instead of Redis. The current design optimizes for operational simplicity at the cost of some latency.
What I would add for Uber Eats specifically:
- Order ready time prediction: Match driver arrival with food ready time
- Multi-pickup optimization: One driver picks up from multiple restaurants
- Hot bag capacity: Consider how many orders fit in delivery bag
- Restaurant prep time variance: Factor in which restaurants are consistently late