System Design Masterclass

58 items

System Design Masterclass

E-Commercebookingreservationinventoryavailabilitydouble-bookingintermediate

Design Hotel Booking and Reservation System

Design a hotel booking system like Booking.com or Airbnb

Millions of bookings/day, 100M+ listings|Similar to Airbnb, Booking.com, Expedia, Marriott, Hilton|45 min read

Summary

A hotel booking system handles millions of searches, availability checks, and reservations daily while preventing double-bookings. The core challenges are maintaining accurate real-time inventory across distributed systems, handling high read-to-write ratios with strong consistency for bookings, and optimizing search across millions of properties with complex filtering. This is a classic interview question at Airbnb, Booking.com, Expedia, and any travel tech company.

Key Takeaways

Core Problem

This is fundamentally an inventory management problem where the same room cannot be sold twice for the same dates - requiring strong consistency for booking operations.

The Hard Part

Preventing double-bookings in a distributed system while maintaining low latency for the millions of availability searches that do not result in bookings.

Scaling Axis

Scale reads (search/availability) and writes (bookings) separately. Reads can be eventually consistent; writes must be strongly consistent per property.

The Question: Design a hotel booking system that handles millions of daily searches and bookings while ensuring no room is ever double-booked.

A hotel booking system must support: - Property Search: Find available hotels by location, dates, price, amenities - Availability Check: Real-time room availability for specific dates - Booking Flow: Reserve rooms with payment processing - Inventory Management: Hotels update room availability and pricing - Booking Management: Users view, modify, or cancel reservations

What to say first

Before I design, let me clarify the requirements. The key challenge here is preventing double-bookings in a distributed system while maintaining fast search performance. I will focus on the booking flow and inventory management.

Hidden requirements interviewers test: - Do you understand the double-booking problem? - Can you handle the read-heavy workload efficiently? - Do you know when to use strong vs eventual consistency? - Can you design for both search optimization and booking correctness?

Summary

Key Takeaways

Core Problem

This is fundamentally an inventory management problem where the same room cannot be sold twice for the same dates - requiring strong consistency for booking operations.

The Hard Part

Preventing double-bookings in a distributed system while maintaining low latency for the millions of availability searches that do not result in bookings.

Scaling Axis

Scale reads (search/availability) and writes (bookings) separately. Reads can be eventually consistent; writes must be strongly consistent per property.

Critical Invariant

A room must never be double-booked. If two users try to book the same room for overlapping dates, exactly one must succeed and one must fail.

Performance Insight

Read-to-write ratio is 100:1 or higher. Optimize aggressively for reads (caching, denormalization) while protecting write consistency.

Key Tradeoff

We accept eventual consistency for search results (showing slightly stale availability) to achieve low latency, but use strong consistency for actual bookings.

Design Walkthrough

Problem Statement

The Question: Design a hotel booking system that handles millions of daily searches and bookings while ensuring no room is ever double-booked.

What to say first

Clarifying Questions

Ask these questions to shape your architecture. Each answer has significant design implications.

Question 1: Scale

What is the expected scale? How many properties, searches per second, and bookings per day?

Why this matters: Determines sharding strategy and caching needs. Typical answer: 1M+ properties, 10K searches/sec, 1M bookings/day Architecture impact: Need distributed search, aggressive caching, sharded booking database

Question 2: Consistency requirements

Is it acceptable to show slightly stale availability in search results? What about during the actual booking flow?

Why this matters: Determines consistency model. Typical answer: Stale search OK (within minutes), booking must be strongly consistent Architecture impact: Eventual consistency for reads, strong consistency for writes

Question 3: Booking hold time

Should we hold inventory while user completes payment? For how long?

Why this matters: Determines locking strategy. Typical answer: 10-15 minute hold during checkout Architecture impact: Need TTL-based locks or reservation state machine

Question 4: Pricing complexity

Is pricing static or dynamic? Does it vary by date, demand, or user segment?

Why this matters: Dynamic pricing adds complexity. Typical answer: Dynamic by date and demand, possibly personalized Architecture impact: Need pricing service, price caching strategy

Stating assumptions

I will assume: 1M properties, 10K searches/sec, 1M bookings/day, eventual consistency OK for search, strong consistency required for booking, 15-minute hold during checkout.

The Hard Part

Say this out loud

The hard part here is preventing double-bookings in a distributed system. Two users searching simultaneously might both see a room as available, but only one booking can succeed.

Why this is genuinely hard:

1.Race Conditions: User A and User B both see Room 101 available for Dec 25. Both click book. Both reach payment. Who gets the room?
2.Distributed State: Inventory data may be cached across multiple servers. How do you ensure all servers agree on availability?
3.Date Range Overlaps: Booking Dec 20-25 conflicts with Dec 23-28. Detecting overlaps efficiently across millions of existing bookings is non-trivial.
4.Hold and Release: User starts checkout but abandons. The held inventory must be released, but what if they come back?

Common mistake

Candidates often propose checking availability then booking as two separate steps without locking. This creates a race condition window where double-booking can occur.

The fundamental tension:

High Availability for Search -> Cache availability -> Stale data possible
Strong Consistency for Booking -> Lock inventory -> Potential contention

The solution is to separate the read path (search) from the write path (booking): - Reads: Eventually consistent, heavily cached, optimized for speed - Writes: Strongly consistent, serialized per property, optimized for correctness

Double Booking Race Condition

Scale and Access Patterns

Let me estimate the scale and understand access patterns to drive design decisions.

Dimension	Value	Impact
Properties	1,000,000	Need efficient search indexing
Rooms per property	50 average	50M total room-nights to manage

+ 5 more rows...

What to say

At 10K searches/sec but only 12 bookings/sec, this is extremely read-heavy. We should optimize aggressively for search performance while ensuring booking correctness.

Storage estimation:
- 1M properties x 10KB metadata = 10 GB
- 50M rooms x 1KB = 50 GB

+ 10 more lines...

Access Pattern Analysis:

Search: Geo-based, date-filtered, sorted by price/rating. Highly cacheable. - Availability: Property + date range lookup. Moderate cacheability (changes on booking). - Booking: Write operation, must be serialized per room-date. - Hot spots: Popular destinations, holiday dates, last-minute bookings.

High-Level Architecture

Let me design an architecture that separates read and write paths for optimal performance and correctness.

What to say

I will separate the read path (search, availability) from the write path (booking). Reads can be eventually consistent and heavily cached. Writes must be strongly consistent and serialized per property.

Hotel Booking System Architecture

Component Responsibilities:

1.Search Service: Handles property discovery queries - Uses Elasticsearch for full-text and geo search - Heavily cached, eventually consistent - Filters by location, dates, price, amenities
2.Availability Service: Real-time room availability - Reads from cache first, falls back to replica - Invalidated on booking events - Returns available rooms and pricing
3.Booking Service: Handles reservation creation - Acquires distributed lock per room-date - Validates availability in primary DB - Creates booking atomically - Publishes booking event
4.Payment Service: Processes payments - Called after inventory hold - On failure, releases hold - Supports refunds for cancellations

Real-world reference

Airbnb uses a similar architecture with separate search (Elasticsearch) and booking (MySQL) paths. They use Redis for availability caching and distributed locking.

Data Model and Storage

The data model must support efficient querying for both search and booking while preventing double-bookings.

What to say

The key insight is that availability is the intersection of what the hotel offers and what is not already booked. I will model this with separate tables for inventory and bookings.

-- Properties (hotels, vacation rentals)
CREATE TABLE properties (
    id UUID PRIMARY KEY,

+ 60 more lines...

Availability Calculation:

The key query is: "How many rooms of type X are available on dates Y to Z?"

-- Check availability for a room type and date range
WITH date_range AS (
    SELECT generate_series(

+ 24 more lines...

Important optimization

The availability query above is expensive for real-time use. In production, we pre-compute and cache available_count in the room_pricing table, updating it asynchronously when bookings change.

Booking Flow Deep Dive

The booking flow is the critical path that must prevent double-bookings. Let me walk through it step by step.

Booking Flow State Machine

Step-by-step booking with locking:

class BookingService:
    def __init__(self, db, redis, payment_service):
        self.db = db

+ 71 more lines...

Critical: Always read from primary

During the booking flow, always read availability from the PRIMARY database, not replicas or cache. Stale data here leads to double-bookings.

Alternative: Optimistic Locking

Instead of distributed locks, use database-level optimistic locking:

-- Add version column to room_pricing
ALTER TABLE room_pricing ADD COLUMN version INT DEFAULT 0;

+ 22 more lines...

Search and Availability Optimization

Search is 100x more frequent than booking. We must optimize aggressively without sacrificing booking correctness.

What to say

I will use Elasticsearch for property search and Redis for availability caching. The cache is eventually consistent but fast. Actual booking always validates against the primary database.

{
  "mappings": {
    "properties": {

+ 16 more lines...

{
  "query": {
    "bool": {

+ 21 more lines...

Availability Caching Strategy:

class AvailabilityCache:
    def __init__(self, redis, db):
        self.redis = redis

+ 33 more lines...

Cache invalidation strategy

On booking confirmation, publish event to message queue. Availability service subscribes and invalidates relevant cache entries. This decouples booking from cache updates.

Consistency and Invariants

System Invariants

The number of confirmed + pending bookings for a room type on any given date must never exceed the total inventory. Violating this means double-booking.

Consistency Model by Operation:

Operation	Consistency	Why
Property search	Eventually consistent	Stale results OK, user will recheck
Availability display	Eventually consistent	Shows approximate, validates on book
Booking creation	Strongly consistent	Must prevent double-booking
Booking confirmation	Strongly consistent	Payment tied to inventory
Cancellation	Strongly consistent	Must release inventory correctly

What to say

I accept that search results may be slightly stale - a user might click on a property that just got fully booked. But the actual booking flow validates against the primary database, so we never actually double-book.

Handling Conflicts:

When two users try to book the last room simultaneously:

Time    User A                  User B                  System
----    ------                  ------                  ------
T1      Click Book              Click Book              Both see available

+ 9 more lines...

Database-Level Protection:

As a final safety net, add a database constraint:

-- Trigger to prevent overbooking
CREATE OR REPLACE FUNCTION check_overbooking()
RETURNS TRIGGER AS $$

+ 29 more lines...

Failure Modes and Resilience

Proactively discuss failures

Let me walk through failure scenarios. This is critical for a booking system where failures can mean lost revenue or angry customers.

Failure	Impact	Mitigation	Why It Works
Primary DB down	Cannot create bookings	Failover to standby, reject new bookings until healthy	Preserve consistency over availability
Redis (cache) down	Slow availability checks	Fall back to DB replicas, increased latency	Graceful degradation

+ 4 more rows...

Handling Payment Failures:

async def confirm_booking_with_retry(self, booking_id: str, 
                                      payment_token: str,
                                      max_retries: int = 3):

+ 35 more lines...

Idempotency is critical

Always use idempotency keys for payment operations. Network failures might cause retries, and we must not charge the customer twice for the same booking.

Expired Booking Cleanup:

# Run every minute
async def cleanup_expired_bookings():
    # Find pending bookings older than 15 minutes

+ 15 more lines...

Evolution and Scaling

What to say

This design handles up to 1M bookings/day with a single primary database. Let me discuss how it evolves for 10x scale and additional features.

Evolution Path:

Stage 1: Single Region (current design) - Single PostgreSQL primary + replicas - Redis cluster for caching and locks - Elasticsearch for search - Handles ~1M bookings/day

Stage 2: Sharded by Property (10x scale) - Shard bookings table by property_id - Each shard handles bookings for subset of properties - No cross-shard transactions needed (bookings are per-property)

Stage 3: Multi-Region (global scale) - Regional deployments with local databases - Properties assigned to regions - Cross-region booking requires redirect to owning region

Sharded Architecture

Additional Features to Consider:

1.Dynamic Pricing: Add pricing service that adjusts based on demand, seasonality, competitor prices
2.Waitlist: When fully booked, allow users to join waitlist. Notify on cancellation.
3.Overbooking Strategy: Hotels sometimes overbook by 5-10% assuming some cancellations. Need policy for handling when everyone shows up.
4.Multi-room Bookings: Booking 3 rooms simultaneously. Need atomic reservation across rooms.
5.Channel Management: Sync inventory across Booking.com, Expedia, direct website. Real-time inventory sync critical.

Alternative approach

If we needed sub-second booking globally, I would consider event sourcing: every booking is an event, current availability is derived from event log. This enables geographic distribution with eventual convergence.

Scale	Architecture	Tradeoff
<1M bookings/day	Single primary DB	Simple, strongly consistent
1-10M bookings/day	Sharded by property	More complex routing, still consistent per shard
10M+ bookings/day	Multi-region with property affinity	Latency for cross-region, complex sync
Global real-time	Event sourcing + CRDT	Eventually consistent, complex implementation

Design Trade-offs

Advantages

+Prevents race conditions
+Simple mental model
+Guaranteed consistency

Disadvantages

-Lock contention on popular rooms
-Lock timeout complexity
-Single point of failure if lock service down

When to use

Default choice for booking systems. Works well up to moderate scale.

System Design Masterclass

Weather Application with Forecasting

URL Shortener

Live Comments Feature

API Rate Limiter

On-Call Escalation System

Hotel Booking and Reservation System

Parts Compatibility Validation

Real-time Stock Price Viewer

Top-K Rankings System

File Download and Sync Library

Real-time Active Viewers

Marketplace Features

Price Alert System

Netflix Screen Concurrency Limits

Live Reactions System

Top K Most Shared Articles

High-Profile Likes Counter

Authentication and User Login

Google Calendar

Web Crawler

News Feed

Video Streaming Platform

IoC / Dependency Injection Framework

Distributed Control Infrastructure

Notification Service

Distributed Tracing System

P2P File Transfer System

Large Data Migration to Cloud

Wire Transfer API

Large Data Sorting and Processing

Database Control Plane

Distributed Metrics Logging and Aggregation

Ads Management & Delivery System

Flash Sale Backend

Photo Sharing Platform

Cluster Health Monitoring System

Rider Matching System

Surge Pricing System

Collaborative Editing System

Server Metrics Collection System

User Analytics Dashboard & Event Pipeline

Dropbox / Google Drive

Distributed Message Queue

ETA and Live Location Sharing

Distributed Key-Value Store

Distributed Stream Processing System

Payment Processing System

Distributed Job Scheduler

WhatsApp / Messenger

Payment Wallet at Global Scale

Uber / Ride Sharing

Web Search Engine

Globally Distributed SQL Database

Real-Time Analytics System

Recommendation System (Netflix)

Multi-Region Disaster Recovery System

Time-Series Database

Fraud Detection System

Design Hotel Booking and Reservation System

Summary

Key Takeaways

Core Problem

The Hard Part

Scaling Axis

Problem Statement

What to say first

Premium Content