Design Walkthrough
Problem Statement
The Question: Design a photo storage and sharing platform like Google Photos that can handle billions of photos with intelligent search and organization.
Core features to support:
- Upload: Store photos from mobile/web with automatic backup
- View: Fast retrieval of thumbnails and full-resolution images
- Search: Find photos by people, places, objects, dates
- Organize: Automatic albums, memories, and suggestions
- Share: Albums, links, collaborative spaces
What to say first
Before designing, let me clarify the scale and key requirements. I want to understand upload volume, storage constraints, and which features are most critical for the MVP.
Hidden requirements interviewers test:
- Do you understand blob storage vs metadata storage?
- Can you design efficient image processing pipelines?
- Do you know how to integrate ML for search/organization?
- Can you optimize for both storage cost and retrieval speed?
Clarifying Questions
Ask questions that shape your architecture. Each answer changes your design.
Question 1: Scale
How many users and photos? What is the upload rate and storage growth?
Why this matters: Determines storage tier strategy and processing capacity.
Typical answer: 1B users, 4B photos uploaded daily, 100PB+ total storage and growing
Architecture impact: Need object storage, CDN, async processing pipelines
Question 2: Photo Types
What formats and sizes? Do we support videos? RAW photos?
Why this matters: Affects processing pipeline and storage requirements.
Typical answer: JPEG/PNG/HEIC, average 3MB, some videos up to 1GB
Architecture impact: Need transcoding, multiple resolution generation
Question 3: Search Requirements
What search capabilities? Face recognition? Object detection? Text in images?
Why this matters: Determines ML pipeline complexity.
Typical answer: Search by faces, places, objects, dates - all automatic
Architecture impact: Need ML inference pipeline, feature extraction, vector search
Question 4: Access Patterns
How often are photos accessed after upload? Recent vs old photos?
Why this matters: Determines hot/cold storage strategy.
Typical answer: 80% of views are photos from the last 30 days
Architecture impact: Tiered storage - hot (SSD/CDN), warm (HDD), cold (archive)
Stating assumptions
I will assume: 1B users, 4B daily uploads averaging 3MB each (12PB/day new storage), search by faces/places/objects, 80-20 access pattern favoring recent photos.
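To sanity-check those assumptions, here is a quick back-of-the-envelope calculation. The constants below are the stated assumptions, not measured values.

```python
# Back-of-the-envelope scale check for the stated assumptions.
DAILY_UPLOADS = 4_000_000_000   # 4B photos/day (assumed)
AVG_PHOTO_BYTES = 3 * 10**6     # 3 MB average original (assumed)

uploads_per_second = DAILY_UPLOADS / 86_400
new_storage_per_day_pb = DAILY_UPLOADS * AVG_PHOTO_BYTES / 10**15

print(f"{uploads_per_second:,.0f} uploads/sec")                  # ~46,296 uploads/sec
print(f"{new_storage_per_day_pb:.0f} PB/day of new originals")   # ~12 PB/day
```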
The Hard Part
Say this out loud
The hard part here is balancing storage costs at petabyte scale with fast retrieval while running ML pipelines on billions of photos for intelligent search.
Why this is genuinely hard:
1. Storage Cost: At 100PB, every optimization matters. Storing redundant copies, multiple resolutions, and ML features adds up quickly.
2. Processing Scale: 4B uploads/day means 46K photos/second. Each needs a deduplication check, thumbnail generation, and ML feature extraction.
3. Retrieval Speed: Users expect instant thumbnail loading. Cannot afford to fetch from cold storage for common operations.
4. ML at Scale: Running face detection, object recognition, and OCR on every photo is computationally expensive.
5. Durability vs Cost: 11 nines of durability requires replication, but replication multiplies storage costs.
Common mistake
Candidates often focus only on the upload/download path and forget about the ML pipeline, search indexing, and storage tier management. These are equally important.
The fundamental tradeoffs:
- Storage Cost vs Retrieval Speed (pre-generate thumbnails or generate on demand?)
- Upload Latency vs Search Quality (run ML synchronously or asynchronously?)
- Durability vs Cost (how many replicas?)
- Accuracy vs Speed (ML model size vs inference time)
Scale and Access Patterns
Let me estimate the scale and understand access patterns.
| Dimension | Value | Impact |
|---|---|---|
| Total Users | 1 Billion | Massive metadata scale, need sharding |
| Daily Uploads | 4 Billion photos | 46K uploads/second, need async processing |
| New Storage | ~12 PB/day | Object storage with tiered lifecycle policies |
| Read:Write Ratio | 10:1 | Read-heavy; cache and CDN for thumbnails |
What to say
This is a read-heavy system with 10:1 read-to-write ratio. Most reads are for thumbnails of recent photos. Storage is dominated by original photos, but metadata and ML features add significant overhead.
Storage Breakdown per Photo:
- Original photo: 3 MB (100%)
- Large thumbnail: 200 KB (for gallery view)
- Medium thumbnail: 50 KB (for grid view)
Access Pattern Analysis:
- Hot data (0-30 days): 80% of reads, keep on SSD/CDN
- Warm data (30-365 days): 15% of reads, HDD storage
- Cold data (1+ years): 5% of reads, archive storage
- Thumbnail vs Original: 95% of views are thumbnails only
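A lifecycle policy on the originals bucket can automate these tier transitions. A minimal sketch assuming S3-style object storage via boto3; the transition days mirror the hot/warm/cold split above, and the exact storage classes are an assumption.

```python
import boto3

s3 = boto3.client("s3")

# Lifecycle rule for the originals bucket (bucket name from the storage layout below).
s3.put_bucket_lifecycle_configuration(
    Bucket="photos-originals",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "tier-originals-by-age",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # apply to all originals
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},  # warm after 30 days
                    {"Days": 365, "StorageClass": "GLACIER"},     # cold after 1 year
                ],
            }
        ]
    },
)
```

Thumbnails stay in the hot tier regardless of age, since 95% of views only touch thumbnails.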
High-Level Architecture
Let me design the system in layers: upload, processing, storage, and retrieval.
What to say
I will separate the write path (upload and processing) from the read path (viewing and search). The write path is async and optimized for throughput. The read path is sync and optimized for latency.
Photo Platform Architecture
Component Responsibilities:
1. CDN: Serves thumbnails globally, caches hot content at edge
2. Upload Service: Receives photos, validates, stores original, queues for processing
3. Processing Pipeline:
   - Thumbnail Generator: Creates multiple resolutions
   - ML Pipeline: Extracts faces, objects, text, location
   - Deduplication: Identifies duplicate uploads (see the sketch after this list)
4. Object Storage: Stores original photos and thumbnails (S3/GCS)
5. Metadata DB: User info, photo metadata, albums, permissions
6. Search Index: ML features, labels, face embeddings for search
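For the deduplication step, a common approach is to hash the uploaded bytes and consult an index of known content hashes before storing a second copy. A minimal sketch under that assumption; the `dedup_index` store and its methods are hypothetical.

```python
import hashlib

async def deduplicate(user_id: str, file_bytes: bytes) -> str | None:
    """Return the existing photo_id if this user already uploaded identical bytes."""
    content_hash = hashlib.sha256(file_bytes).hexdigest()

    # Hypothetical KV lookup: (user_id, content_hash) -> photo_id.
    # Scoping dedup to a single user keeps it simple and avoids cross-user privacy issues.
    existing = await dedup_index.get(user_id, content_hash)
    if existing:
        return existing  # skip re-upload; just add a reference to the existing photo

    return None  # new content: caller stores the original and records the hash
```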
Real-world reference
Google Photos uses Colossus (distributed file system) for storage, Bigtable for metadata, and custom ML pipelines running on TPUs. The architecture separates hot thumbnails from cold originals.
Data Model and Storage
We need to design storage for: metadata (structured), photos (blobs), and ML features (vectors).
What to say
I will use different storage systems optimized for each data type: relational DB for metadata, object storage for photos, and vector DB for ML features.
-- Sharded by user_id
CREATE TABLE photos (
    photo_id  UUID PRIMARY KEY,
    user_id   UUID NOT NULL,
    taken_at  TIMESTAMP,        -- indexed for date-range search
    latitude  DOUBLE PRECISION, -- with longitude, backs the geo index
    longitude DOUBLE PRECISION
);

Object Storage Structure:
Bucket: photos-originals
/{user_id_prefix}/{user_id}/{photo_id}/original.{ext}
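A small helper can derive that key layout. A sketch assuming the prefix is the first two characters of the user ID; the prefix scheme is an assumption, used only to spread keys and avoid hot partitions.

```python
def original_storage_key(user_id: str, photo_id: str, ext: str) -> str:
    """Build the object key matching /{user_id_prefix}/{user_id}/{photo_id}/original.{ext}."""
    user_id_prefix = user_id[:2]  # assumed 2-character prefix
    return f"{user_id_prefix}/{user_id}/{photo_id}/original.{ext}"

# Example: original_storage_key("a1b2c3", "p-42", "jpg")
# -> "a1/a1b2c3/p-42/original.jpg"
```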
ML Features Storage:
-- Face embeddings for facial recognition
CREATE TABLE face_embeddings (
    face_id   UUID PRIMARY KEY,
    photo_id  UUID NOT NULL,
    user_id   UUID NOT NULL,    -- embeddings are always scoped to one user
    person_id UUID,             -- cluster assignment, NULL until clustered
    embedding VECTOR(512)       -- assumes a pgvector-style column; dimension illustrative
);

Important detail
Face embeddings require vector similarity search. Use a specialized vector database (Pinecone, Milvus) or PostgreSQL with pgvector for smaller scale. At Google scale, they use custom systems.
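For example, with pgvector the nearest faces to a query embedding can be fetched by ordering on the distance operator. A sketch assuming psycopg2 and the face_embeddings table above; the embedding is serialized as a pgvector text literal.

```python
import psycopg2

def find_similar_faces(conn, user_id: str, query_embedding: list[float], k: int = 10):
    """Return the k nearest stored faces for this user using pgvector's <-> distance."""
    # pgvector accepts a '[x1,x2,...]' text literal cast to the vector type.
    vec_literal = "[" + ",".join(f"{x:.6f}" for x in query_embedding) + "]"
    with conn.cursor() as cur:
        cur.execute(
            """
            SELECT face_id, photo_id, embedding <-> %s::vector AS distance
            FROM face_embeddings
            WHERE user_id = %s            -- never search across users
            ORDER BY embedding <-> %s::vector
            LIMIT %s
            """,
            (vec_literal, user_id, vec_literal, k),
        )
        return cur.fetchall()
```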
Upload and Processing Pipeline
The upload flow must be reliable and handle failures gracefully. Processing happens asynchronously.
Upload and Processing Flow
async def upload_photo(user_id: str, file: UploadFile) -> PhotoResponse:
    # 1. Validate file
    if not is_valid_image(file):
        raise InvalidPhotoError("unsupported format or corrupt file")
    # 2. Store original + metadata, queue async processing, return immediately
    #    (object_storage, db, processing_queue are illustrative helpers)
    photo_id = generate_photo_id()
    await object_storage.put_original(user_id, photo_id, file)
    await db.insert_photo(photo_id=photo_id, user_id=user_id, status="processing")
    await processing_queue.enqueue(ProcessingJob(photo_id=photo_id, user_id=user_id))
    return PhotoResponse(photo_id=photo_id, status="processing")

async def process_photo(job: ProcessingJob):
    photo_id = job.photo_id
    user_id = job.user_id
    # Generate thumbnails, then hand off to the (batched) ML pipeline
    original = await object_storage.get_original(user_id, photo_id)
    await generate_thumbnails(photo_id, original)
    await ml_queue.enqueue(photo_id)

Optimization
For efficiency, batch multiple photos together for ML inference. GPUs are far more efficient when processing batches of 32-64 images than single images.
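One way to realize this batching is a small worker that drains the ML queue into fixed-size batches before calling the model. A sketch under that assumption; the queue, model, loader, and batch size are illustrative.

```python
import asyncio

BATCH_SIZE = 32          # GPUs amortize cost well at 32-64 images per batch
MAX_WAIT_SECONDS = 0.5   # don't hold a partial batch too long

async def ml_batch_worker(ml_queue: asyncio.Queue, model):
    """Drain photo IDs from the queue and run inference in batches."""
    while True:
        batch = [await ml_queue.get()]  # block for the first item
        deadline = asyncio.get_running_loop().time() + MAX_WAIT_SECONDS
        while len(batch) < BATCH_SIZE:
            timeout = deadline - asyncio.get_running_loop().time()
            if timeout <= 0:
                break
            try:
                batch.append(await asyncio.wait_for(ml_queue.get(), timeout))
            except asyncio.TimeoutError:
                break
        images = await load_images(batch)        # hypothetical loader
        features = model.predict_batch(images)   # one GPU call for the whole batch
        await store_features(batch, features)    # hypothetical writer
```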
Search and ML Pipeline
Search is what makes Google Photos powerful. Users can search by faces, places, objects, and text without manual tagging.
What to say
The search system combines structured queries (date, location) with ML-powered semantic search (faces, objects). Different query types hit different indexes.
Search Query Types:
| Query Type | Example | Index Used | Complexity |
|---|---|---|---|
| Date range | Photos from 2023 | Metadata DB (taken_at index) | Simple range query |
| Location | Photos in Paris | Metadata DB (geo index) + reverse geocoding | Geo query |
| People | Photos of a named person | Face embedding vector index | ANN similarity search |
| Objects | Photos of dogs | Search index (ML labels) | Inverted index lookup |
async def search_photos(user_id: str, query: str, limit: int = 50) -> List[Photo]:
    # 1. Parse query to identify search type
    parsed = parse_search_query(query)
    # 2. Route to the right index: metadata DB for date/geo, ML indexes for semantic
    if parsed.type in ("date_range", "location"):
        photo_ids = await db.query_metadata(user_id, parsed, limit)
    else:  # people, objects, text in images
        photo_ids = await search_index.query(user_id, parsed.terms, limit)
    # 3. Hydrate photo metadata and return
    return await db.get_photos(photo_ids)

Face Recognition Pipeline:
Face Recognition Flow
async def cluster_face(user_id: str, photo_id: str, face_embedding: List[float]):
    """
    Assign a detected face to an existing person or create a new person cluster.
    """
    # Find the nearest existing cluster for THIS user only (never cross-user)
    nearest = await face_index.nearest_person(user_id, face_embedding)
    if nearest and nearest.distance < SIMILARITY_THRESHOLD:
        person_id = nearest.person_id
    else:
        person_id = await face_index.create_person(user_id)
    await face_index.add_face(user_id, photo_id, person_id, face_embedding)

Privacy consideration
Face recognition data is highly sensitive. Always scope face embeddings to individual users - never perform cross-user matching. Provide users with controls to disable face recognition entirely.
Consistency and Invariants
System Invariants
1. Never lose a photo - 11 nines of durability (99.999999999%)
2. Always maintain user ownership - photo access only by the owner or via explicit share
3. Original quality preserved - never modify the original, only create derivatives
Durability Strategy:
For irreplaceable user data, durability is paramount. We achieve 11 nines through:
| Layer | Strategy | Durability Contribution |
|---|---|---|
| Object Storage | S3/GCS with cross-region replication | 11 nines built-in |
| Metadata DB | Synchronous replication, daily backups | Point-in-time recovery |
| Upload Verification | Checksum validation on upload | Prevents silent corruption |
| Background Verification | Periodic integrity checks | Detects bit rot |
import hashlib

async def upload_with_durability(user_id: str, file: bytes, photo_id: str):
    # 1. Calculate checksum before upload
    expected_checksum = hashlib.md5(file).hexdigest()
    # 2. Upload and verify the checksum reported by the storage layer
    stored = await object_storage.put(original_storage_key_for(user_id, photo_id), file)
    if stored.checksum != expected_checksum:
        raise UploadCorruptionError(photo_id)  # retry rather than keep silently corrupted data
    # 3. Only acknowledge the upload to the client after verification succeeds
    await db.mark_photo_durable(photo_id, expected_checksum)

Consistency Model:
- Strong consistency for ownership: who owns a photo is always consistent
- Eventual consistency for search: ML features may take minutes to index
- Eventual consistency for thumbnails: users may briefly see the original before thumbnails are ready
Business impact mapping
Losing a photo is unrecoverable and destroys user trust. Search being 30 seconds delayed is invisible to users. We optimize consistency guarantees based on business impact.
Failure Modes and Resilience
Proactively discuss failures
Let me walk through failure scenarios. The key principle is: never lose data, gracefully degrade features.
| Failure | Impact | Mitigation | User Experience |
|---|---|---|---|
| Object storage down | Cannot upload or view | Multi-region replication, failover | Serve from secondary region |
| ML pipeline backlog | Search features delayed | Async processing, prioritize recent | Photos visible, search catches up |
Graceful Degradation Strategy:
async def get_photo_thumbnail(photo_id: str, size: str) -> bytes:
    photo = await db.get_photo(photo_id)
    # Preferred path: serve the pre-generated thumbnail (CDN / object storage)
    try:
        return await object_storage.get_thumbnail(photo_id, size)
    except ThumbnailNotReadyError:
        # Degrade gracefully: fall back to the original and resize on the fly
        original = await object_storage.get_original(photo.user_id, photo_id)
        return resize_image(original, size)

Data Recovery Procedures:
async def verify_photo_integrity(photo_id: str) -> IntegrityResult:
    """Run periodically on a random sample of photos."""
    photo = await db.get_photo(photo_id)
    # Re-read the stored bytes and compare against the checksum recorded at upload
    stored = await object_storage.get_original(photo.user_id, photo_id)
    actual_checksum = hashlib.md5(stored).hexdigest()
    if actual_checksum != photo.checksum:
        await trigger_repair_from_replica(photo_id)  # restore from a healthy replica
        return IntegrityResult(photo_id, ok=False)
    return IntegrityResult(photo_id, ok=True)

What to say
The system is designed so that data loss is nearly impossible, while feature degradation is handled gracefully. Users might see slower load times or delayed search, but they will never lose photos.
Evolution and Scaling
What to say
This design handles billions of photos. Let me discuss how it evolves for even larger scale and advanced features.
Scaling Evolution:
Stage 1: Single Region (up to 10M users)
- Single object storage bucket
- Single metadata database cluster
- ML processing on shared GPU cluster

Stage 2: Multi-Region (up to 1B users)
- Regional object storage with cross-region replication
- Sharded metadata database by user_id
- Regional ML processing, centralized model serving

Stage 3: Global (1B+ users)
- Edge caching for thumbnails worldwide
- Active-active metadata in each region
- Distributed ML inference at edge for latency-sensitive features
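Sharding the metadata database by user_id can be as simple as hashing the user ID onto a fixed set of logical shards that are then mapped to physical clusters. A minimal sketch of that routing; the shard count and mapping are illustrative.

```python
import hashlib

NUM_LOGICAL_SHARDS = 1024  # fixed logical shards, remapped to physical DBs as we grow

def metadata_shard_for(user_id: str) -> int:
    """Route all of a user's photo metadata to one logical shard."""
    digest = hashlib.sha256(user_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_LOGICAL_SHARDS

# Example: every query for this user hits the same shard, so per-user
# operations (gallery, search, albums) never fan out across the fleet.
shard_id = metadata_shard_for("user-12345")
```

Keeping logical shards decoupled from physical clusters lets Stage 2 rebalance by moving whole shards rather than resharding keys.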
| Scale | Bottleneck | Solution |
|---|---|---|
| 10M photos | Single DB | Read replicas |
| 1B photos | Storage costs | Tiered storage, deduplication |
| 100B photos | ML processing | Batched inference, edge ML |
| 1T photos | Search latency | Sharded search, approximate NN |
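At trillion-photo scale, exact vector search no longer fits in one index; the usual answer is an approximate nearest-neighbor library, sharded per user or per region. A small sketch assuming FAISS; the index type, dimensions, and random data are illustrative.

```python
import numpy as np
import faiss

DIM = 512  # assumed embedding size, matching the face_embeddings table above

# IVF index: clusters the vector space so queries scan only a few cells (approximate).
quantizer = faiss.IndexFlatL2(DIM)
index = faiss.IndexIVFFlat(quantizer, DIM, 1024)  # 1024 coarse cells (tunable)

embeddings = np.random.rand(100_000, DIM).astype("float32")  # stand-in for stored faces
index.train(embeddings)
index.add(embeddings)

query = np.random.rand(1, DIM).astype("float32")
distances, ids = index.search(query, 10)  # 10 approximate nearest neighbors
```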
Cost Optimization at Scale:
Photo Lifecycle:
- Day 0-30: Hot Storage (SSD-backed, CDN cached)
- Day 30-365: Warm Storage (HDD-backed)
- Year 1+: Cold Storage (archive tier)
Advanced Features Evolution:
| Feature | Complexity | Architecture Addition |
|---|---|---|
| Memories/Highlights | Medium | Background job analyzes patterns, creates collections |
| Shared Albums | Medium | Access control layer, cross-user queries |
| Print Products | Low | Integration with print partners API |
| Video Support | High | Transcoding pipeline, streaming infrastructure |
| Live Photos | High | Combined video+image storage, playback sync |
Alternative approach
If storage cost were the primary constraint (not Google scale), I would use more aggressive deduplication including cross-user dedup for public/stock photos, and offer quality tiers where users can choose storage vs quality tradeoff.
What I would do differently for...
Privacy-focused (like iCloud): Client-side encryption, zero-knowledge architecture. ML features run on-device only.
Social-first (like Instagram): Optimize for feed generation, focus on public sharing, real-time notifications.
Enterprise (like Dropbox): Emphasize sharing controls, audit logs, compliance features, team management.