Design Walkthrough
Problem Statement
The Question: Design a photo storage and sharing platform like Google Photos that can handle billions of photos with intelligent search and organization.
Core features to support:
- Upload: Store photos from mobile/web with automatic backup
- View: Fast retrieval of thumbnails and full-resolution images
- Search: Find photos by people, places, objects, dates
- Organize: Automatic albums, memories, and suggestions
- Share: Albums, links, collaborative spaces
What to say first
Before designing, let me clarify the scale and key requirements. I want to understand upload volume, storage constraints, and which features are most critical for the MVP.
Hidden requirements interviewers test:
- Do you understand blob storage vs metadata storage?
- Can you design efficient image processing pipelines?
- Do you know how to integrate ML for search/organization?
- Can you optimize for both storage cost and retrieval speed?
Clarifying Questions
Ask questions that shape your architecture. Each answer changes your design.
Question 1: Scale
How many users and photos? What is the upload rate and storage growth?
Why this matters: Determines storage tier strategy and processing capacity.
Typical answer: 1B users, 4B photos uploaded daily, 100PB+ total storage and growing
Architecture impact: Need object storage, CDN, async processing pipelines
Question 2: Photo Types
What formats and sizes? Do we support videos? RAW photos?
Why this matters: Affects processing pipeline and storage requirements.
Typical answer: JPEG/PNG/HEIC, average 3MB, some videos up to 1GB
Architecture impact: Need transcoding, multiple resolution generation
Question 3: Search Requirements
What search capabilities? Face recognition? Object detection? Text in images?
Why this matters: Determines ML pipeline complexity.
Typical answer: Search by faces, places, objects, dates - all automatic
Architecture impact: Need ML inference pipeline, feature extraction, vector search
Question 4: Access Patterns
How often are photos accessed after upload? Recent vs old photos?
Why this matters: Determines hot/cold storage strategy.
Typical answer: 80% of views are photos from the last 30 days
Architecture impact: Tiered storage - hot (SSD/CDN), warm (HDD), cold (archive)
Stating assumptions
I will assume: 1B users, 4B daily uploads averaging 3MB each (12PB/day new storage), search by faces/places/objects, 80-20 access pattern favoring recent photos.
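To sanity-check those assumptions, here is a quick back-of-the-envelope calculation. The constants below are the stated assumptions, not measured values.

```python
# Back-of-the-envelope scale check for the stated assumptions.
DAILY_UPLOADS = 4_000_000_000   # 4B photos/day (assumed)
AVG_PHOTO_BYTES = 3 * 10**6     # 3 MB average original (assumed)

uploads_per_second = DAILY_UPLOADS / 86_400
new_storage_per_day_pb = DAILY_UPLOADS * AVG_PHOTO_BYTES / 10**15

print(f"{uploads_per_second:,.0f} uploads/sec")                  # ~46,296 uploads/sec
print(f"{new_storage_per_day_pb:.0f} PB/day of new originals")   # ~12 PB/day
```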
The Hard Part
Say this out loud
The hard part here is balancing storage costs at petabyte scale with fast retrieval while running ML pipelines on billions of photos for intelligent search.
Why this is genuinely hard:
1. Storage Cost: At 100PB, every optimization matters. Storing redundant copies, multiple resolutions, and ML features adds up quickly.
2. Processing Scale: 4B uploads/day means 46K photos/second. Each needs a deduplication check, thumbnail generation, and ML feature extraction.
3. Retrieval Speed: Users expect instant thumbnail loading. Cannot afford to fetch from cold storage for common operations.
4. ML at Scale: Running face detection, object recognition, and OCR on every photo is computationally expensive.
5. Durability vs Cost: 11 nines of durability requires replication, but replication multiplies storage costs.
Common mistake
Candidates often focus only on the upload/download path and forget about the ML pipeline, search indexing, and storage tier management. These are equally important.
The fundamental tradeoffs:
- Storage Cost vs Retrieval Speed (pre-generate thumbnails or generate on demand?)
- Upload Latency vs Search Quality (run ML synchronously or asynchronously?)
- Durability vs Cost (how many replicas?)
- Accuracy vs Speed (ML model size vs inference time)
Scale and Access Patterns
Let me estimate the scale and understand access patterns.
| Dimension | Value | Impact |
|---|---|---|
| Total Users | 1 Billion | Massive metadata scale, need sharding |
| Daily Uploads | 4 Billion photos | 46K uploads/second, need async processing |
| New Storage | ~12 PB/day | Object storage with tiered lifecycle policies |
| Read:Write Ratio | 10:1 | Read-heavy; cache and CDN for thumbnails |
What to say
This is a read-heavy system with 10:1 read-to-write ratio. Most reads are for thumbnails of recent photos. Storage is dominated by original photos, but metadata and ML features add significant overhead.
Storage Breakdown per Photo:
- Original photo: 3 MB (100%)
- Large thumbnail: 200 KB (for gallery view)
- Medium thumbnail: 50 KB (for grid view)
Access Pattern Analysis:
- Hot data (0-30 days): 80% of reads, keep on SSD/CDN
- Warm data (30-365 days): 15% of reads, HDD storage
- Cold data (1+ years): 5% of reads, archive storage
- Thumbnail vs Original: 95% of views are thumbnails only
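A lifecycle policy on the originals bucket can automate these tier transitions. A minimal sketch assuming S3-style object storage via boto3; the transition days mirror the hot/warm/cold split above, and the exact storage classes are an assumption.

```python
import boto3

s3 = boto3.client("s3")

# Lifecycle rule for the originals bucket (bucket name from the storage layout below).
s3.put_bucket_lifecycle_configuration(
    Bucket="photos-originals",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "tier-originals-by-age",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # apply to all originals
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},  # warm after 30 days
                    {"Days": 365, "StorageClass": "GLACIER"},     # cold after 1 year
                ],
            }
        ]
    },
)
```

Thumbnails stay in the hot tier regardless of age, since 95% of views only touch thumbnails.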
High-Level Architecture
Let me design the system in layers: upload, processing, storage, and retrieval.
What to say
I will separate the write path (upload and processing) from the read path (viewing and search). The write path is async and optimized for throughput. The read path is sync and optimized for latency.
Photo Platform Architecture
Component Responsibilities:
1. CDN: Serves thumbnails globally, caches hot content at edge
2. Upload Service: Receives photos, validates, stores original, queues for processing
3. Processing Pipeline:
   - Thumbnail Generator: Creates multiple resolutions
   - ML Pipeline: Extracts faces, objects, text, location
   - Deduplication: Identifies duplicate uploads (see the sketch after this list)
4. Object Storage: Stores original photos and thumbnails (S3/GCS)
5. Metadata DB: User info, photo metadata, albums, permissions
6. Search Index: ML features, labels, face embeddings for search
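For the deduplication step, a common approach is to hash the uploaded bytes and consult an index of known content hashes before storing a second copy. A minimal sketch under that assumption; the `dedup_index` store and its methods are hypothetical.

```python
import hashlib

async def deduplicate(user_id: str, file_bytes: bytes) -> str | None:
    """Return the existing photo_id if this user already uploaded identical bytes."""
    content_hash = hashlib.sha256(file_bytes).hexdigest()

    # Hypothetical KV lookup: (user_id, content_hash) -> photo_id.
    # Scoping dedup to a single user keeps it simple and avoids cross-user privacy issues.
    existing = await dedup_index.get(user_id, content_hash)
    if existing:
        return existing  # skip re-upload; just add a reference to the existing photo

    return None  # new content: caller stores the original and records the hash
```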
Real-world reference
Google Photos uses Colossus (distributed file system) for storage, Bigtable for metadata, and custom ML pipelines running on TPUs. The architecture separates hot thumbnails from cold originals.
Data Model and Storage
We need to design storage for: metadata (structured), photos (blobs), and ML features (vectors).
What to say
I will use different storage systems optimized for each data type: relational DB for metadata, object storage for photos, and vector DB for ML features.
-- Sharded by user_id
CREATE TABLE photos (
    photo_id  UUID PRIMARY KEY,
    user_id   UUID NOT NULL,
    taken_at  TIMESTAMP,        -- indexed for date-range search
    latitude  DOUBLE PRECISION, -- with longitude, backs the geo index
    longitude DOUBLE PRECISION
);

Object Storage Structure:
Bucket: photos-originals
/{user_id_prefix}/{user_id}/{photo_id}/original.{ext}
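A small helper can derive that key layout. A sketch assuming the prefix is the first two characters of the user ID; the prefix scheme is an assumption, used only to spread keys and avoid hot partitions.

```python
def original_storage_key(user_id: str, photo_id: str, ext: str) -> str:
    """Build the object key matching /{user_id_prefix}/{user_id}/{photo_id}/original.{ext}."""
    user_id_prefix = user_id[:2]  # assumed 2-character prefix
    return f"{user_id_prefix}/{user_id}/{photo_id}/original.{ext}"

# Example: original_storage_key("a1b2c3", "p-42", "jpg")
# -> "a1/a1b2c3/p-42/original.jpg"
```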
ML Features Storage:
-- Face embeddings for facial recognition
CREATE TABLE face_embeddings (
    face_id   UUID PRIMARY KEY,
    photo_id  UUID NOT NULL,
    user_id   UUID NOT NULL,    -- embeddings are always scoped to one user
    person_id UUID,             -- cluster assignment, NULL until clustered
    embedding VECTOR(512)       -- assumes a pgvector-style column; dimension illustrative
);

Important detail
Face embeddings require vector similarity search. Use a specialized vector database (Pinecone, Milvus) or PostgreSQL with pgvector for smaller scale. At Google scale, they use custom systems.
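For example, with pgvector the nearest faces to a query embedding can be fetched by ordering on the distance operator. A sketch assuming psycopg2 and the face_embeddings table above; the embedding is serialized as a pgvector text literal.

```python
import psycopg2

def find_similar_faces(conn, user_id: str, query_embedding: list[float], k: int = 10):
    """Return the k nearest stored faces for this user using pgvector's <-> distance."""
    # pgvector accepts a '[x1,x2,...]' text literal cast to the vector type.
    vec_literal = "[" + ",".join(f"{x:.6f}" for x in query_embedding) + "]"
    with conn.cursor() as cur:
        cur.execute(
            """
            SELECT face_id, photo_id, embedding <-> %s::vector AS distance
            FROM face_embeddings
            WHERE user_id = %s            -- never search across users
            ORDER BY embedding <-> %s::vector
            LIMIT %s
            """,
            (vec_literal, user_id, vec_literal, k),
        )
        return cur.fetchall()
```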
Upload and Processing Pipeline
The upload flow must be reliable and handle failures gracefully. Processing happens asynchronously.
Upload and Processing Flow
async def upload_photo(user_id: str, file: UploadFile) -> PhotoResponse:
    # 1. Validate file
    if not is_valid_image(file):
        raise InvalidPhotoError("unsupported format or corrupt file")
    # 2. Store original + metadata, queue async processing, return immediately
    #    (object_storage, db, processing_queue are illustrative helpers)
    photo_id = generate_photo_id()
    await object_storage.put_original(user_id, photo_id, file)
    await db.insert_photo(photo_id=photo_id, user_id=user_id, status="processing")
    await processing_queue.enqueue(ProcessingJob(photo_id=photo_id, user_id=user_id))
    return PhotoResponse(photo_id=photo_id, status="processing")

async def process_photo(job: ProcessingJob):
    photo_id = job.photo_id
    user_id = job.user_id
    # Generate thumbnails, then hand off to the (batched) ML pipeline
    original = await object_storage.get_original(user_id, photo_id)
    await generate_thumbnails(photo_id, original)
    await ml_queue.enqueue(photo_id)

Optimization
For efficiency, batch multiple photos together for ML inference. GPUs are far more efficient when processing batches of 32-64 images than single images.
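One way to realize this batching is a small worker that drains the ML queue into fixed-size batches before calling the model. A sketch under that assumption; the queue, model, loader, and batch size are illustrative.

```python
import asyncio

BATCH_SIZE = 32          # GPUs amortize cost well at 32-64 images per batch
MAX_WAIT_SECONDS = 0.5   # don't hold a partial batch too long

async def ml_batch_worker(ml_queue: asyncio.Queue, model):
    """Drain photo IDs from the queue and run inference in batches."""
    while True:
        batch = [await ml_queue.get()]  # block for the first item
        deadline = asyncio.get_running_loop().time() + MAX_WAIT_SECONDS
        while len(batch) < BATCH_SIZE:
            timeout = deadline - asyncio.get_running_loop().time()
            if timeout <= 0:
                break
            try:
                batch.append(await asyncio.wait_for(ml_queue.get(), timeout))
            except asyncio.TimeoutError:
                break
        images = await load_images(batch)        # hypothetical loader
        features = model.predict_batch(images)   # one GPU call for the whole batch
        await store_features(batch, features)    # hypothetical writer
```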
Search and ML Pipeline
Search is what makes Google Photos powerful. Users can search by faces, places, objects, and text without manual tagging.
What to say
The search system combines structured queries (date, location) with ML-powered semantic search (faces, objects). Different query types hit different indexes.
Search Query Types:
| Query Type | Example | Index Used | Complexity |
|---|---|---|---|
| Date range | Photos from 2023 | Metadata DB (taken_at index) | Simple range query |
| Location | Photos in Paris | Metadata DB (geo index) + reverse geocoding | Geo query |
| People | Photos of a named person | Face embedding vector index | ANN similarity search |
| Objects | Photos of dogs | Search index (ML labels) | Inverted index lookup |
async def search_photos(user_id: str, query: str, limit: int = 50) -> List[Photo]:
    # 1. Parse query to identify search type
    parsed = parse_search_query(query)
    # 2. Route to the right index: metadata DB for date/geo, ML indexes for semantic
    if parsed.type in ("date_range", "location"):
        photo_ids = await db.query_metadata(user_id, parsed, limit)
    else:  # people, objects, text in images
        photo_ids = await search_index.query(user_id, parsed.terms, limit)
    # 3. Hydrate photo metadata and return
    return await db.get_photos(photo_ids)

Face Recognition Pipeline:
Face Recognition Flow
async def cluster_face(user_id: str, photo_id: str, face_embedding: List[float]):
    """
    Assign a detected face to an existing person or create a new person cluster.
    """
    # Find the nearest existing cluster for THIS user only (never cross-user)
    nearest = await face_index.nearest_person(user_id, face_embedding)
    if nearest and nearest.distance < SIMILARITY_THRESHOLD:
        person_id = nearest.person_id
    else:
        person_id = await face_index.create_person(user_id)
    await face_index.add_face(user_id, photo_id, person_id, face_embedding)

Privacy consideration
Face recognition data is highly sensitive. Always scope face embeddings to individual users - never perform cross-user matching. Provide users with controls to disable face recognition entirely.
Consistency and Invariants
System Invariants
1. Never lose a photo - 11 nines of durability (99.999999999%)
2. Always maintain user ownership - photo access only by the owner or via explicit share
3. Original quality preserved - never modify the original, only create derivatives
Durability Strategy:
For irreplaceable user data, durability is paramount. We achieve 11 nines through:
| Layer | Strategy | Durability Contribution |
|---|---|---|
| Object Storage | S3/GCS with cross-region replication | 11 nines built-in |
| Metadata DB | Synchronous replication, daily backups | Point-in-time recovery |
| Upload Verification | Checksum validation on upload | Prevents silent corruption |
| Background Verification | Periodic integrity checks | Detects bit rot |
import hashlib

async def upload_with_durability(user_id: str, file: bytes, photo_id: str):
    # 1. Calculate checksum before upload
    expected_checksum = hashlib.md5(file).hexdigest()
    # 2. Upload and verify the checksum reported by the storage layer
    stored = await object_storage.put(original_storage_key_for(user_id, photo_id), file)
    if stored.checksum != expected_checksum:
        raise UploadCorruptionError(photo_id)  # retry rather than keep silently corrupted data
    # 3. Only acknowledge the upload to the client after verification succeeds
    await db.mark_photo_durable(photo_id, expected_checksum)

Consistency Model:
- Strong consistency for ownership: who owns a photo is always consistent
- Eventual consistency for search: ML features may take minutes to index
- Eventual consistency for thumbnails: users may briefly see the original before thumbnails are ready
Business impact mapping
Losing a photo is unrecoverable and destroys user trust. Search being 30 seconds delayed is invisible to users. We optimize consistency guarantees based on business impact.
Failure Modes and Resilience
Proactively discuss failures
Let me walk through failure scenarios. The key principle is: never lose data, gracefully degrade features.
| Failure | Impact | Mitigation | User Experience |
|---|---|---|---|
| Object storage down | Cannot upload or view | Multi-region replication, failover | Serve from secondary region |
| ML pipeline backlog | Search features delayed | Async processing, prioritize recent | Photos visible, search catches up |
Graceful Degradation Strategy:
async def get_photo_thumbnail(photo_id: str, size: str) -> bytes:
    photo = await db.get_photo(photo_id)
    # Preferred path: serve the pre-generated thumbnail (CDN / object storage)
    try:
        return await object_storage.get_thumbnail(photo_id, size)
    except ThumbnailNotReadyError:
        # Degrade gracefully: fall back to the original and resize on the fly
        original = await object_storage.get_original(photo.user_id, photo_id)
        return resize_image(original, size)

Data Recovery Procedures:
async def verify_photo_integrity(photo_id: str) -> IntegrityResult:
    """Run periodically on a random sample of photos."""
    photo = await db.get_photo(photo_id)
    # Re-read the stored bytes and compare against the checksum recorded at upload
    stored = await object_storage.get_original(photo.user_id, photo_id)
    actual_checksum = hashlib.md5(stored).hexdigest()
    if actual_checksum != photo.checksum:
        await trigger_repair_from_replica(photo_id)  # restore from a healthy replica
        return IntegrityResult(photo_id, ok=False)
    return IntegrityResult(photo_id, ok=True)

What to say
The system is designed so that data loss is nearly impossible, while feature degradation is handled gracefully. Users might see slower load times or delayed search, but they will never lose photos.
Evolution and Scaling
What to say
This design handles billions of photos. Let me discuss how it evolves for even larger scale and advanced features.
Scaling Evolution:
Stage 1: Single Region (up to 10M users)
- Single object storage bucket
- Single metadata database cluster
- ML processing on shared GPU cluster

Stage 2: Multi-Region (up to 1B users)
- Regional object storage with cross-region replication
- Sharded metadata database by user_id
- Regional ML processing, centralized model serving

Stage 3: Global (1B+ users)
- Edge caching for thumbnails worldwide
- Active-active metadata in each region
- Distributed ML inference at edge for latency-sensitive features
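Sharding the metadata database by user_id can be as simple as hashing the user ID onto a fixed set of logical shards that are then mapped to physical clusters. A minimal sketch of that routing; the shard count and mapping are illustrative.

```python
import hashlib

NUM_LOGICAL_SHARDS = 1024  # fixed logical shards, remapped to physical DBs as we grow

def metadata_shard_for(user_id: str) -> int:
    """Route all of a user's photo metadata to one logical shard."""
    digest = hashlib.sha256(user_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_LOGICAL_SHARDS

# Example: every query for this user hits the same shard, so per-user
# operations (gallery, search, albums) never fan out across the fleet.
shard_id = metadata_shard_for("user-12345")
```

Keeping logical shards decoupled from physical clusters lets Stage 2 rebalance by moving whole shards rather than resharding keys.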
| Scale | Bottleneck | Solution |
|---|---|---|
| 10M photos | Single DB | Read replicas |
| 1B photos | Storage costs | Tiered storage, deduplication |
| 100B photos | ML processing | Batched inference, edge ML |
| 1T photos | Search latency | Sharded search, approximate NN |
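At trillion-photo scale, exact vector search no longer fits in one index; the usual answer is an approximate nearest-neighbor library, sharded per user or per region. A small sketch assuming FAISS; the index type, dimensions, and random data are illustrative.

```python
import numpy as np
import faiss

DIM = 512  # assumed embedding size, matching the face_embeddings table above

# IVF index: clusters the vector space so queries scan only a few cells (approximate).
quantizer = faiss.IndexFlatL2(DIM)
index = faiss.IndexIVFFlat(quantizer, DIM, 1024)  # 1024 coarse cells (tunable)

embeddings = np.random.rand(100_000, DIM).astype("float32")  # stand-in for stored faces
index.train(embeddings)
index.add(embeddings)

query = np.random.rand(1, DIM).astype("float32")
distances, ids = index.search(query, 10)  # 10 approximate nearest neighbors
```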
Cost Optimization at Scale:
Photo Lifecycle:
- Day 0-30: Hot Storage (SSD-backed, CDN cached)
- Day 30-365: Warm Storage (HDD-backed)
- Year 1+: Cold Storage (archive tier)
Advanced Features Evolution:
| Feature | Complexity | Architecture Addition |
|---|---|---|
| Memories/Highlights | Medium | Background job analyzes patterns, creates collections |
| Shared Albums | Medium | Access control layer, cross-user queries |
| Print Products | Low | Integration with print partners API |
| Video Support | High | Transcoding pipeline, streaming infrastructure |
| Live Photos | High | Combined video+image storage, playback sync |
Alternative approach
If storage cost were the primary constraint (not Google scale), I would use more aggressive deduplication including cross-user dedup for public/stock photos, and offer quality tiers where users can choose storage vs quality tradeoff.
What I would do differently for...
Privacy-focused (like iCloud): Client-side encryption, zero-knowledge architecture. ML features run on-device only.
Social-first (like Instagram): Optimize for feed generation, focus on public sharing, real-time notifications.
Enterprise (like Dropbox): Emphasize sharing controls, audit logs, compliance features, team management.