System Design Masterclass
Tags: real-time, location-tracking, matching, payments, geospatial, advanced

Design Uber / Ride Sharing

Design a system that matches riders with drivers in real-time

Millions of concurrent trips | Similar to Uber, Lyft, Ola, Grab, DiDi, Bolt | 45 min read

Summary

A ride sharing app connects people who need rides with drivers nearby. When you open the app, it shows drivers around you on a map. You tap to request a ride, and the system finds the best driver for you in seconds. The hard parts are: tracking millions of drivers moving around in real-time, matching riders with the closest available driver very fast, changing prices when lots of people want rides at the same time (surge pricing), handling payments after the trip, and making sure the whole system stays fast even when millions of people use it at rush hour. Companies like Uber, Lyft, Ola, and Grab ask this question in interviews.

Key Takeaways

Core Problem

The main job is to quickly find the best available driver near a rider. We need to track where millions of drivers are RIGHT NOW and match them with riders in under 3 seconds.

The Hard Part

Drivers move constantly. Their location changes every few seconds. We need a special database that is very good at answering: Which drivers are within 2 miles of this spot RIGHT NOW?

Scaling Axis

We can split the world into areas (like cities). Each city can have its own servers. A driver in New York never gets matched with a rider in Los Angeles.

Critical Invariant

One driver can only have one active trip at a time. We must never assign the same driver to two riders. If we do, both riders will be angry.

Performance Requirement

When a rider requests a trip, they should see a matched driver within 3 seconds. The app should show driver location updates every 2-4 seconds during the trip.

Key Tradeoff

We want to match riders with the CLOSEST driver. But checking every single driver in a city is too slow. So we use special location indexes to quickly find nearby drivers, even if we might miss a slightly closer one.

Design Walkthrough

Problem Statement

The Question: Design a ride sharing system like Uber where riders can request trips, drivers can accept them, and the system tracks the whole journey with payments.

What the app needs to do (most important first):

  1. Show nearby drivers - When a rider opens the app, show drivers on a map near their location.
  2. Request a ride - Rider enters where they want to go, sees the price, and taps to request. The system finds a nearby driver.
  3. Match rider with driver - Find the best available driver (closest, highest rated, right car type) and send them the request.
  4. Track the trip - Show rider where the driver is in real-time. Show driver where to go. Track the whole journey.
  5. Handle payments - Calculate fare based on distance and time. Charge rider's card. Pay the driver later.
  6. Surge pricing - When many people want rides but few drivers are around, increase prices to attract more drivers.
  7. Ratings and reviews - After the trip, both rider and driver can rate each other.

What to say first

Let me understand what features we need. Do we need multiple car types (economy, premium, XL)? Do we need carpooling where multiple riders share a car? What about scheduled rides for later? Once I know the features, I will ask about scale - how many cities, drivers, and trips per day.

What the interviewer really wants to see:

  - Can you track millions of moving drivers and find nearby ones quickly?
  - How do you make sure one driver is never assigned to two trips at once?
  - Can you handle rush hour when everyone wants a ride at the same time?
  - How do you calculate the fare and handle payment failures?

Clarifying Questions

Before you start designing, ask questions to understand what you are building. Good questions show the interviewer you think before you code.

Question 1: How big is this?

How many cities do we operate in? How many drivers and riders? How many trips happen per day? What is the busiest time - how many trips per second at peak?

Why ask this: A small city with 1,000 drivers needs a very different design than a global system with 1 million drivers.

What interviewers usually say: 100 cities, 1 million active drivers, 10 million trips per day. At rush hour in big cities, maybe 1,000 trip requests per second.

How this changes your design: With 1,000 requests per second, we need multiple servers, good caching, and fast location lookups. We can split by city since drivers in one city never serve riders in another.

Question 2: How fast should matching be?

When a rider requests a trip, how quickly should they see a matched driver? Is 3 seconds okay, or does it need to be under 1 second?

Why ask this: Faster matching needs more powerful servers and smarter algorithms.

What interviewers usually say: Under 3 seconds for most requests. Under 10 seconds even during rush hour.

How this changes your design: We need to pre-index driver locations so we can quickly find nearby drivers without scanning everyone.

Question 3: What car types do we need?

Just one car type, or multiple like economy, premium, XL (big cars), and pool (shared rides)?

Why ask this: Pool rides are much more complex - you need to match multiple riders going in the same direction.

What interviewers usually say: Start with single car types (economy, premium, XL). Pool is a follow-up if we have time.

How this changes your design: For now, each trip has one rider and one driver. Pool would need route optimization.

Question 4: How do drivers get paid?

Do we pay drivers immediately after each trip, or do we batch payments weekly? Do we need to handle cash payments too?

Why ask this: Immediate payments need real-time systems. Weekly payments are simpler.

What interviewers usually say: Charge rider immediately. Pay drivers weekly. Some markets need cash payment support.

How this changes your design: We process rider payments in real-time but can batch driver payouts.

Summarize your assumptions

Let me summarize: 100 cities, 1 million drivers, 10 million trips per day, under 3 second matching, multiple car types but no pooling for now, charge riders immediately and pay drivers weekly. I will design for this scale.

The Hard Part

Say this to the interviewer

The hardest part of a ride sharing system is finding nearby drivers FAST. We have 1 million drivers moving around. Each driver sends their location every 4 seconds. When a rider requests a trip, we need to find the closest available drivers within 3 seconds.

Why finding nearby drivers is tricky (explained simply):

  1. Drivers never stop moving - A driver's location changes every few seconds. By the time you find them, they might have moved.
  2. Hundreds of thousands of updates per second - If 1 million drivers send location every 4 seconds, that is 250,000 location updates per second to process.
  3. Distance calculations are slow - Checking the distance from a rider to EVERY driver in a city would take too long.
  4. Closest is not always best - The closest driver might be stuck in traffic, have low ratings, or be about to go offline.
  5. Race conditions - Two riders nearby might both want the same driver. We must make sure only one gets them.

Common mistake candidates make

Many people say: just query the database for all drivers within 5 miles. This is too slow! With millions of drivers, scanning them all takes seconds. Instead, we use special data structures that organize drivers by location so we can find nearby ones instantly.

The solution: Divide the map into small squares (cells)

Imagine putting a grid over the city map. Each small square (cell) is about 1 mile by 1 mile. Instead of searching all drivers, we only look at drivers in nearby cells.

When a rider at location (X, Y) requests a ride:

  1. Figure out which cell they are in
  2. Look at that cell plus the 8 cells around it (like a tic-tac-toe board)
  3. Only check drivers in those 9 cells

This is MUCH faster than checking every driver in the city.
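
Here is a minimal sketch of the grid idea in Python. The ~1-mile cell size and the in-memory dictionary are illustrative assumptions (a real system keeps this in Redis, and longitude cells shrink away from the equator):

from collections import defaultdict

CELL_DEG = 0.0145  # ~1 mile of latitude; illustrative cell size

cells = defaultdict(set)  # (cell_x, cell_y) -> set of driver IDs

def cell_of(lat: float, lon: float) -> tuple[int, int]:
    # Map a coordinate to its grid cell
    return (int(lat // CELL_DEG), int(lon // CELL_DEG))

def update_driver(driver_id: str, lat: float, lon: float, prev=None):
    # Move a driver into the cell that matches their latest location
    if prev is not None:
        cells[prev].discard(driver_id)
    cell = cell_of(lat, lon)
    cells[cell].add(driver_id)
    return cell

def nearby_drivers(lat: float, lon: float) -> set[str]:
    # Check only the rider's cell and the 8 surrounding cells
    cx, cy = cell_of(lat, lon)
    found = set()
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            found |= cells[(cx + dx, cy + dy)]
    return found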

Finding nearby drivers using grid cells

Two ways to organize drivers by location:

Option 1: Geohash (Recommended for simplicity)
  - Turn any location into a short code like "9q8yy"
  - Nearby locations have similar codes
  - Store in Redis as key-value: geohash → list of driver IDs
  - Very fast lookups, easy to understand

Option 2: Quadtree or R-tree
  - Special tree structure that divides space into smaller and smaller boxes
  - Used by databases like PostGIS
  - Better for complex queries but harder to implement

For interviews, Geohash is usually enough. Mention Quadtree to show you know alternatives.

Scale and Access Patterns

Before designing, let me figure out how big this system needs to be. This helps us choose the right tools.

What we are measuring | Number | What this means for our design
Active drivers at peak | 500,000 | Half a million drivers online at rush hour
Location updates per second | 125,000 | Drivers send location every 4 seconds = 500K / 4
+ 4 more rows...

What to tell the interviewer

With 125,000 location updates per second and 1,000 trip requests per second, we need a distributed system. The good news is we can split by city - drivers in New York never interact with riders in Los Angeles. Each city can have its own set of servers.

How people use the app (from most common to least):

  1. Driver sends location - Every 4 seconds when online. This is the highest volume.
  2. Rider opens app - Show nearby drivers on map. Very common.
  3. Rider requests trip - Enter destination, see price, tap to request.
  4. Get trip updates - During trip, rider sees driver moving on map.
  5. Complete trip - Calculate fare, charge payment, update ratings.

Location updates:
- 500,000 active drivers at peak
- Each sends location every 4 seconds
+ 19 more lines...

Common interview mistake

Do not try to store all location history in a regular database - 2 TB per day is too much. Location data is temporary - we only need to know where drivers are RIGHT NOW. Use Redis or an in-memory store for current locations. Only save trip paths for completed trips.

High-Level Architecture

Now let me draw the big picture of how all the pieces fit together. I will keep it simple and explain what each part does.

What to tell the interviewer

I will split this into separate services: one for tracking driver locations, one for matching riders with drivers, one for managing the trip lifecycle, one for pricing, and one for payments. Each service does one job well. We can scale each one independently.

Ride Sharing System - The Big Picture

What each service does and WHY it is separate:

Service | What it does | Why it is separate
Location Service | Receives driver locations. Stores in Redis. Answers: who is near this point? | Gets 125K updates/second - needs to be super fast. Uses Redis, not the main database. If it slows down, trips still work.
Matching Service | When rider requests trip, finds best available driver nearby. | Complex logic: consider distance, ratings, car type, driver preferences. Can be slow without affecting location updates.
Trip Service | Manages the whole trip from request to payment. Tracks status. | This is the source of truth for trips. Handles state machine (requested → accepted → started → completed).
Pricing Service | Calculates fare based on distance, time, surge. Shows price before booking. | Pricing logic changes often (promotions, surge rules). Separate service means we can update pricing without touching trips.
Payment Service | Charges riders, handles refunds, pays drivers weekly. | Payment is sensitive. Separate service with extra security. Can retry failed payments without affecting trips.

Why WebSocket for live updates?

During a trip, the rider needs to see the driver moving on the map. We could make the app ask every 2 seconds (polling), but that wastes battery and bandwidth. Instead, we keep a WebSocket connection open - the server pushes updates only when the driver moves. More efficient!
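
As a sketch of the push side, here is a tiny server using the Python websockets library. The per-trip subscription and the message shape are assumptions for illustration:

import asyncio
import json
import websockets

# trip_id -> set of rider connections watching that trip
watchers: dict[str, set] = {}

async def handler(websocket):
    # The app's first message says which trip it wants updates for
    trip_id = json.loads(await websocket.recv())["trip_id"]
    watchers.setdefault(trip_id, set()).add(websocket)
    try:
        await websocket.wait_closed()
    finally:
        watchers[trip_id].discard(websocket)

async def push_location(trip_id: str, lat: float, lon: float):
    # Called when a driver location update arrives - push, don't poll
    message = json.dumps({"trip_id": trip_id, "lat": lat, "lon": lon})
    for ws in set(watchers.get(trip_id, ())):
        await ws.send(message)

async def main():
    async with websockets.serve(handler, "0.0.0.0", 8765):
        await asyncio.Future()  # run forever

# asyncio.run(main())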

Technology Choices - Why we picked these tools:

Redis for driver locations (Recommended)
  - Why: Super fast, supports geospatial queries (GEOADD, GEORADIUS)
  - Can handle 100K+ writes per second
  - Data is temporary anyway - if Redis crashes, drivers just send location again

PostgreSQL for trips and users
  - Why: Trips need transactions (ACID). User accounts need reliability.
  - Great for complex queries (find all trips this week, calculate driver earnings)

Kafka for events
  - Why: When a trip status changes, many things need to know (analytics, notifications, payments)
  - Kafka lets us publish once, many services consume
  - Can replay events if something goes wrong

WebSocket for real-time updates
  - Why: Push updates to rider app without constant polling
  - Much better battery life on phones

Data Model and Storage

Now let me show how we organize the data. Different types of data go in different places based on how we use them.

What to tell the interviewer

I split data by how it is used: driver locations go in Redis (fast, temporary), trip data goes in PostgreSQL (reliable, queryable), and events go through Kafka (for real-time updates to many services).

Redis: Driver Locations (fast, temporary)

We use Redis GEOADD to store driver locations. This lets us quickly find drivers near a point.

STORING A DRIVER LOCATION:
    When driver sends their location:
    GEOADD drivers:nyc <longitude> <latitude> driver_123
+ 16 more lines...
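
For reference, the same operations with the redis-py client - a sketch, assuming Redis 6.2+ for GEOSEARCH; the coordinates are illustrative:

import redis

r = redis.Redis()

# Driver update: GEOADD stores (longitude, latitude, member) in the city's geo set
r.geoadd("drivers:nyc", (-73.9857, 40.7484, "driver_123"))

# Rider opens the app: find drivers within 5 miles of their location
nearby = r.geosearch(
    "drivers:nyc",
    longitude=-73.9851,
    latitude=40.7589,
    radius=5,
    unit="mi",
    withdist=True,
)
# -> list of (driver_id, distance_in_miles) pairs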

PostgreSQL: Users, Drivers, Trips (reliable, permanent)

These tables store data that must never be lost.

Table | Key columns | Why it matters
users | id, name, email, phone, payment_method_id, rating | Rider accounts. Payment method links to Stripe.
drivers | id, name, phone, license, car_type, rating, status, city | Driver accounts. Status: online, offline, on_trip.
trips | id, rider_id, driver_id, status, pickup, dropoff, fare, distance, duration | Every trip ever taken. Source of truth for billing.
payments | id, trip_id, amount, status, stripe_charge_id | Payment records. Links to Stripe for refunds.
ratings | id, trip_id, from_user_id, to_user_id, score, comment | Ratings after each trip.

The Trips Table - Most Important

This table tracks every trip from start to finish.

Column | What it stores | Example
id | Unique trip ID | trip_abc123
rider_id | Who requested the ride | user_456
+ 16 more rows...

Why save estimated AND actual fare?

We show the rider an estimated fare before they book. But the actual fare might be different (traffic made it longer, rider changed destination). We save both so we can explain any difference and handle disputes.

Trip Status Flow
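
The flow is: requested → accepted → driver_arrived → in_progress → completed. A tiny sketch of enforcing it (the statuses are the ones used throughout this design; cancellation paths are omitted):

# Allowed transitions for a trip
ALLOWED = {
    "requested":      {"accepted"},
    "accepted":       {"driver_arrived"},
    "driver_arrived": {"in_progress"},
    "in_progress":    {"completed"},
    "completed":      set(),  # terminal
}

def advance(trip: dict, new_status: str) -> None:
    # Reject illegal jumps, e.g. requested -> completed
    if new_status not in ALLOWED[trip["status"]]:
        raise ValueError(f"illegal transition {trip['status']} -> {new_status}")
    trip["status"] = new_status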

Driver Matching Deep Dive

Let me explain step by step how we find the best driver when a rider requests a trip.

The matching problem is harder than it looks

We cannot just pick the closest driver. What if they have low ratings? What if they are about to end their shift? What if their car type does not match? What if another rider grabbed them 1 second ago? We need to handle all these cases.

Step by step: How matching works

When rider taps "Request Ride", here is what happens:

  1. Get rider's location and preferences - Where are they? What car type? Any special needs?
  2. Find nearby available drivers - Query Redis for drivers within 5 miles who are online and not on a trip.
  3. Filter and rank drivers - Remove drivers with wrong car type. Sort by a score that considers distance, rating, acceptance rate.
  4. Try to reserve the best driver - Send request to #1 driver. If they don't respond in 15 seconds, try #2.
  5. Handle the race condition - Use a lock so two riders cannot grab the same driver at the same instant.
FUNCTION match_rider_with_driver(rider, pickup_location, car_type):
    
    STEP 1: Find nearby available drivers
+ 53 more lines...
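
A condensed sketch of those steps in Python, assuming redis-py; rank_drivers() and offer_trip_to() are hypothetical helpers. The lock uses SET NX with a TTL so a crashed matcher cannot hold a driver forever:

import redis

r = redis.Redis()

def match(rider_id: str, lon: float, lat: float):
    radius_mi = 5
    while radius_mi <= 10:  # expand the search if nobody is close
        candidates = r.geosearch(
            "drivers:nyc", longitude=lon, latitude=lat,
            radius=radius_mi, unit="mi",
        )
        # Filter/rank by car type, rating, acceptance rate (hypothetical helper)
        for driver_id in rank_drivers(candidates):
            # Reserve atomically: only ONE rider can win this driver
            lock_key = f"lock:driver:{driver_id}"
            if r.set(lock_key, rider_id, nx=True, ex=20):
                if offer_trip_to(driver_id, rider_id):  # 15s to accept
                    return driver_id
                r.delete(lock_key)  # declined or timed out: try the next one
        radius_mi += 2
    return None  # nobody available; surge may bring more drivers online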

The race condition problem

Imagine Alice and Bob both request rides at the same second. Both see driver Charlie as closest. Without protection, both systems might assign Charlie to both trips! We use a Redis lock to make sure only ONE succeeds. The other retries with the next driver.

What if no driver is nearby?

  1. Expand the search radius - Try 5 miles, then 7 miles, then 10 miles.
  2. Show "finding driver" to rider - Don't make them wait silently.
  3. After 60 seconds, give up - Tell rider "No drivers available. Try again later or walk."
  4. Consider surge pricing - If no drivers, raise prices. Higher prices bring more drivers online.

Driver accepts or declines

When we send a trip request to a driver, they have 15 seconds to accept. If they decline or don't respond:

  1. Release the lock on that driver
  2. Mark them as "available" again
  3. Try the next best driver
  4. Update rider: "Finding another driver..."

We track acceptance rate. Drivers who decline too often get fewer requests.

Trip Lifecycle Deep Dive

Let me walk through everything that happens from when a rider opens the app to when the trip ends.

Phase 1: Before the request

  1. Rider opens app → App sends their location → We show nearby drivers on map
  2. Rider enters destination → We calculate route and show estimated fare
  3. Rider sees the price ($15.50) and taps "Request Ride"
FUNCTION calculate_fare(pickup, dropoff, car_type):
    
    // Call Maps API to get route
+ 32 more lines...
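
A sketch of the fare formula itself. All rates here are invented placeholders - real pricing varies by city and car type:

RATES = {
    "economy": {"base": 2.50, "per_mile": 1.25, "per_minute": 0.30},
    "premium": {"base": 5.00, "per_mile": 2.50, "per_minute": 0.60},
}
MINIMUM_FARE = 7.00

def calculate_fare(distance_miles: float, duration_minutes: float,
                   car_type: str, surge_multiplier: float = 1.0) -> float:
    rate = RATES[car_type]
    fare = (rate["base"]
            + rate["per_mile"] * distance_miles
            + rate["per_minute"] * duration_minutes)
    fare *= surge_multiplier  # locked in at request time (see surge section)
    return round(max(fare, MINIMUM_FARE), 2)

# Example: 4 miles, 18 minutes, economy, 1.5x surge
# (2.50 + 4*1.25 + 18*0.30) * 1.5 = 12.90 * 1.5 = 19.35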

Phase 2: Matching (0-30 seconds)

  1. Create trip record with status = "requested"
  2. Find and reserve a driver (see matching section)
  3. Send push notification to driver
  4. Driver has 15 seconds to accept
  5. If accepted → status = "accepted", notify rider
  6. If declined → try next driver

Phase 3: Driver on the way (2-10 minutes)

  1. Driver starts driving to pickup location
  2. Driver app sends location every 4 seconds
  3. We push updates to rider via WebSocket
  4. Rider sees driver moving on map + ETA
  5. When driver is 1 minute away, notify rider: "Driver almost there!"
  6. When driver arrives → status = "driver_arrived", notify rider

Phase 4: Trip in progress (5-60 minutes)

  1. Rider gets in car, driver taps "Start Trip" → status = "in_progress"
  2. We start tracking the actual route
  3. Driver app sends location every 4 seconds
  4. Rider sees progress on map
  5. We record the path for fare calculation

Phase 5: Trip complete (final)

  1. Rider arrives, driver taps "End Trip" → status = "completed"
  2. Calculate actual fare based on real distance and time
  3. Charge rider's payment method
  4. Show fare breakdown to both rider and driver
  5. Prompt both to rate each other
  6. Update driver status back to "available"
FUNCTION complete_trip(trip_id):
    
    trip = GET trip from database
+ 45 more lines...

Why publish events to Kafka?

When a trip completes, many services need to know: analytics wants to record it, the rating service wants to prompt for ratings, fraud detection wants to check for anomalies, driver earnings need updating. Instead of the trip service calling each one, we publish ONE event and all interested services consume it.
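
Publishing that one event, sketched with the kafka-python client. The topic name and event shape are assumptions:

import json
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

def publish_trip_completed(trip: dict) -> None:
    # Key by trip_id so every event for one trip lands in the same
    # partition and stays in order
    producer.send(
        "trip-events",
        key=trip["id"].encode("utf-8"),
        value={"type": "trip.completed", "trip_id": trip["id"],
               "driver_id": trip["driver_id"], "fare": trip["fare"]},
    )
    producer.flush()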

Surge Pricing Deep Dive

What is surge pricing?

When lots of people want rides but few drivers are available, we increase prices. This does two things: (1) Some riders decide to wait, reducing demand. (2) More drivers come online because they can earn more. Surge balances supply and demand.

How surge pricing works:

  1. Divide the city into small zones (like neighborhoods)
  2. For each zone, count: riders requesting trips vs available drivers
  3. If demand is much higher than supply, increase the multiplier
  4. Show the surge multiplier to riders BEFORE they book
  5. Recalculate every few minutes as conditions change
FUNCTION calculate_surge_for_zone(zone_id):
    
    // Count recent trip requests in this zone (last 5 minutes)
+ 40 more lines...
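
A sketch of the per-zone multiplier that also applies the cap and smoothing rules listed below. The formula, cap, and step size are illustrative assumptions:

SURGE_CAP = 3.0   # never charge more than 3x
MAX_STEP = 0.25   # smooth: change at most 0.25x per recalculation

def surge_multiplier(requests_last_5m: int, available_drivers: int,
                     current: float) -> float:
    demand_ratio = requests_last_5m / max(available_drivers, 1)
    if demand_ratio <= 1.0:
        target = 1.0  # enough drivers, no surge
    else:
        target = min(1.0 + (demand_ratio - 1.0) * 0.5, SURGE_CAP)
    # Move gradually toward the target instead of jumping
    step = max(-MAX_STEP, min(MAX_STEP, target - current))
    return round(current + step, 2)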

Important surge pricing rules:

  1. Lock the surge at request time - If surge is 2x when you tap request, you pay 2x even if surge drops to 1x by the time the trip ends.
  2. Show surge clearly - Rider MUST see and confirm the higher price before booking. No surprises.
  3. Cap the maximum - Most companies cap surge at 2x-3x to avoid bad press ("$200 for a 10 minute ride!").
  4. Smooth the changes - Don't jump from 1x to 2x instantly. Gradually increase to avoid confusing riders.

Why lock surge at request time?

Imagine: surge is 2x, rider sees $30 and accepts. During the trip, surge drops to 1x. Without locking, rider might be charged only $15 - they would be happy, but driver expected to earn more. Or worse: surge could INCREASE and rider gets a shock. Locking makes it fair for everyone.

Payment Processing

Payment is one of the most critical parts. If we charge the wrong amount or lose payment data, we lose money and trust.

Payment flow for riders:

  1. Save payment method - When rider signs up, they add a credit card. We send card details to Stripe, get back a token. We NEVER store actual card numbers.
  2. Authorize before trip - When rider requests trip, we ask Stripe: "Can this card pay $20?" Stripe holds that amount but doesn't charge yet.
  3. Capture after trip - When trip ends, we tell Stripe: "Charge $17.25" (the actual amount). Stripe charges the card.
  4. Handle failures - If charge fails, rider still completed the trip. We retry, notify rider to update payment method.
WHEN RIDER ADDS PAYMENT METHOD:
    // Send card to Stripe, get token back
    stripe_customer = STRIPE.create_customer(rider.email)
+ 38 more lines...
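
The authorize-then-capture flow sketched with the Stripe Python SDK. Amounts are in cents; the customer and payment method IDs are illustrative:

import stripe
stripe.api_key = "sk_test_..."  # from config, never hard-coded in real code

# Authorize before the trip: hold the estimated fare without charging
intent = stripe.PaymentIntent.create(
    amount=2000,               # $20.00 estimated fare
    currency="usd",
    customer="cus_rider456",   # illustrative ID
    payment_method="pm_card_visa",
    capture_method="manual",   # hold now, capture later
    confirm=True,
)

# Capture after the trip for the ACTUAL fare (may be less than the hold)
stripe.PaymentIntent.capture(intent.id, amount_to_capture=1725)  # $17.25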

Payment flow for drivers:

Drivers get paid weekly, not per trip. This is simpler and cheaper (fewer transactions).

  1. Track earnings per trip - When trip completes, calculate driver's share (usually 75-80% of fare).
  2. Weekly payout job - Every Monday, sum up each driver's earnings for the week, as in the sketch below.
  3. Deduct fees - Subtract Uber's commission (20-25%), any other fees.
  4. Transfer to bank - Send money to driver's bank account via ACH or similar.
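
A sketch of the Monday payout job. The 25% commission and the send_bank_transfer() helper are illustrative assumptions:

from collections import defaultdict

COMMISSION = 0.25  # example rate; the real share varies (20-25%)

def weekly_payouts(completed_trips: list[dict]) -> None:
    earnings = defaultdict(float)
    for trip in completed_trips:  # this week's completed trips
        earnings[trip["driver_id"]] += trip["fare"] * (1 - COMMISSION)
    for driver_id, amount in earnings.items():
        send_bank_transfer(driver_id, round(amount, 2))  # hypothetical helper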

Why authorize before the trip?

Imagine: rider takes a $50 trip, but their card is maxed out. Without pre-authorization, we complete the trip and THEN discover we cannot charge them. We lose money. By authorizing first, we know the card is good before the driver starts driving.

What Can Go Wrong and How We Handle It

Tell the interviewer about failures

Good engineers think about what can break. Let me walk through common failures and how we protect against them.

What breaks | What happens | How we handle it
Redis (location store) goes down | Cannot find nearby drivers | Use Redis cluster with replicas. If all down, fall back to PostgreSQL (slower but works).
Driver app loses network | Location updates stop | If no update for 30 seconds, mark driver as offline. They come back online when network returns.
+ 4 more rows...

Handling driver going offline mid-trip:

What if a driver's phone dies during a trip?

  1. If no location update for 60 seconds during trip, alert our support team
  2. Try to contact driver via backup phone number
  3. If unreachable, contact rider to check if they are okay
  4. If trip seems abandoned, help rider get another ride
  5. Use last known location to calculate partial fare
BACKGROUND JOB: check_stuck_trips (runs every minute)
    
    // Find trips that seem stuck
+ 30 more lines...

The importance of idempotency

Network can fail at any moment. If rider's app sends "complete trip" but the response is lost, the app might retry. We must handle this! Use idempotency keys: if we get the same "complete trip" request twice, only process it once.
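
A sketch of the idempotency check with redis-py. The key format and 24-hour TTL are assumptions; complete_trip() is the function from the trip lifecycle section:

import redis

r = redis.Redis()

def complete_trip_idempotent(trip_id: str, idempotency_key: str):
    # SET NX succeeds only for the FIRST request with this key
    first = r.set(f"idem:{idempotency_key}", trip_id, nx=True, ex=86400)
    if not first:
        return "already processed"  # this is a retry - safe to ignore
    return complete_trip(trip_id)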

Growing the System Over Time

What to tell the interviewer

This design works well for one country with millions of users. Let me explain how we would grow to support worldwide operations.

Stage 1: Single city (starting out)
  - All services in one data center
  - Single Redis, single PostgreSQL
  - Can handle ~10,000 drivers, ~50,000 trips/day

Stage 2: Multiple cities in one country
  - Shard by city - each city has its own Redis for locations
  - Shared PostgreSQL for user accounts, payments
  - Drivers in NYC never interact with SF, so this works well

Stage 3: Multiple countries
  - Each country gets its own deployment
  - User data stays in that country (for legal reasons)
  - Global services for things like fraud detection

Multi-region architecture

Features to add later:

1. Carpooling (Pool rides)
  - Multiple riders share one car going in same direction
  - Much more complex matching (need to calculate detours)
  - Dynamic pricing based on how much detour each rider adds

2. Scheduled rides
  - Book a ride for tomorrow 6am to airport
  - Need to reserve a driver ahead of time
  - Guarantee the ride even during surge

3. Driver incentives
  - "Complete 10 trips today, get $20 bonus"
  - Track progress in real-time
  - Prevents gaming (fake short trips)

4. Fraud detection
  - Detect fake GPS locations
  - Find drivers and riders colluding for fake trips
  - Unusual patterns (same rider-driver pair repeatedly)

How real companies do it

Uber uses a custom spatial index called H3 (hexagonal grid) instead of simple geohash. They process over 1 million location updates per second globally. Their matching algorithm considers not just distance but also traffic, driver earnings goals, and predicted demand. This level of sophistication is for when you have billions of dollars and thousands of engineers - start simple!

Design Trade-offs

Alternative considered: PostgreSQL with PostGIS for driver locations (instead of Redis)

Advantages

  + Simple to set up - just add the PostGIS extension
  + Powerful geospatial queries built in
  + All data in one place

Disadvantages

  - Too slow for 125K updates per second
  - Database becomes the bottleneck
  - Disk writes are slow for real-time data

When to use

Only for small systems with under 1,000 drivers. Not for production at scale.