System Design Masterclass
Tags: video-streaming · cdn · transcoding · adaptive-bitrate · blob-storage · intermediate

Design Video Streaming Platform

Design a platform for uploading, storing, and streaming videos globally with adaptive quality

Billions of videos, petabytes of storage · Similar to YouTube, Netflix, TikTok, Twitch, Amazon Prime Video, Disney+ · 45 min read

Summary

A video streaming platform lets people upload videos, watch them from anywhere in the world, and automatically adjusts video quality based on internet speed. The hard parts are: processing videos into many quality levels (480p, 720p, 1080p, 4K), storing huge files cheaply, delivering videos fast to users worldwide using CDNs, and switching quality smoothly when internet speed changes. Companies like YouTube, Netflix, TikTok, and Twitch ask this question in interviews.

Key Takeaways

Core Problem

The main job is to let users upload videos, convert them into multiple quality levels, store them cheaply, and deliver them fast to viewers anywhere in the world.

The Hard Part

A 10-minute 4K video is about 3GB. We cannot send 3GB files to everyone - we need to break videos into small pieces and send only what the user needs based on their internet speed.

Scaling Axis

Videos are watched much more than uploaded (1000:1 ratio). We optimize for reading by using CDNs to cache popular videos close to users. Storage grows forever - old videos never go away.

Critical Invariant

Once a video is uploaded, it must never be lost. We keep multiple copies in different locations. Also, videos must not buffer - smooth playback is more important than highest quality.

Performance Requirement

Video must start playing within 2 seconds. Quality switches must happen without buffering. 99.9% of videos must be available at all times.

Key Tradeoff

We trade storage cost (keeping many quality versions) for user experience (fast loading, smooth playback). Popular videos are cached everywhere, rare videos are fetched from origin.

Design Walkthrough

Problem Statement

The Question: Design a video streaming platform like YouTube where people can upload videos and others can watch them from anywhere in the world with good quality.

What the platform needs to do (most important first):

  1. Upload videos - Users can upload video files of any size. The system accepts the file and processes it.
  2. Process videos - Convert uploaded videos into multiple quality levels (360p, 480p, 720p, 1080p, 4K) so users with slow internet can still watch.
  3. Store videos - Keep all video files safe and organized. Videos should never be lost.
  4. Stream videos - Send video to users in a way that plays smoothly without buffering. Adjust quality based on internet speed.
  5. Deliver globally - Users in Japan, Brazil, and USA should all get fast video loading, not just users near our servers.
  6. Show video details - Display title, description, view count, likes, and comments for each video.
  7. Search and discover - Help users find videos by searching titles or browsing categories.

What to say first

Let me first understand what we are building. Are we designing for short videos like TikTok (under 1 minute) or long videos like YouTube (hours long)? Do we need live streaming or just pre-recorded videos? Once I know this, I will ask about scale - how many uploads per day and how many viewers.

What the interviewer really wants to see:

  • Do you understand why we need multiple video quality levels?
  • Can you explain how videos get from one server to users worldwide (CDN)?
  • Do you know how adaptive bitrate streaming works (quality changes based on internet speed)?
  • Can you handle the massive storage needs (petabytes of video data)?

Clarifying Questions

Before you start designing, ask questions to understand what you are building. Good questions show the interviewer you think before you code.

Question 1: What kind of videos?

Are these short videos (TikTok style, under 1 minute) or long videos (YouTube style, up to several hours)? Do we need to support live streaming too?

Why ask this: Short videos are easier - they fit in memory and process fast. Long videos need chunking (breaking into pieces) and more complex processing.

What interviewers usually say: Focus on pre-recorded videos up to 2 hours long. Live streaming can be discussed later.

How this changes your design: For long videos, we must break them into small chunks (usually 2-10 seconds each) so users can start watching before the whole file downloads.

Question 2: How big is this?

How many videos are uploaded per day? How many people watch videos at the same time? What is the average video length?

Why ask this: This tells us how much storage and processing power we need.

What interviewers usually say: 100,000 new videos per day, 10 million viewers watching at once, average video is 5 minutes.

How this changes your design: At this scale, we need distributed processing (many servers working together) and a global CDN (copies of videos in many countries).

Question 3: What quality levels?

What video quality levels do we need? Just 720p and 1080p, or also 4K? Do we need to support old phones with small screens?

Why ask this: Each quality level means more storage and processing. 4K files are 4x bigger than 1080p.

What interviewers usually say: Support 360p, 480p, 720p, 1080p, and 4K. Users should get the best quality their internet and device can handle.

How this changes your design: We need to transcode (convert) each video into 5+ versions. This multiplies storage needs by 5x or more.

Question 4: How fast should uploads be ready?

After someone uploads a video, how quickly should it be available to watch? Instantly, or is a few minutes okay?

Why ask this: Instant means we need very fast processing or we show a lower quality version first while processing the rest.

What interviewers usually say: Within 5-10 minutes for most videos. Popular creators might get priority processing.

How this changes your design: We can use a queue to process videos. No need for super expensive instant processing.

Summarize your assumptions

Let me summarize: Pre-recorded videos up to 2 hours, 100K uploads daily, 10M concurrent viewers, 5 quality levels (360p to 4K), videos ready within 10 minutes. I will focus on the upload, processing, storage, and streaming parts.

The Hard Part

Say this to the interviewer

The hardest part of video streaming is handling the massive data. A single 4K video that is 10 minutes long is about 3 gigabytes. If 100,000 videos are uploaded daily, that is 300 terabytes of new data every single day. We cannot just store these huge files and send them directly to users.

Why video streaming is tricky (explained simply):

  1. Files are huge - One hour of 4K video is about 20GB. We cannot send 20GB to someone who just wants to watch the first minute.
  2. Internet speeds vary - Someone on fast WiFi can handle 4K. Someone on a phone in a subway needs 360p. We need to give each person the right quality.
  3. Global delivery - A viewer in Tokyo should not wait for data to travel from servers in New York. We need copies of videos closer to users.
  4. Processing takes time - Converting a video to 5 quality levels is slow. A 1-hour video might take 2-3 hours to fully process.
  5. Storage costs money - Keeping 5 versions of every video forever costs a lot. We need smart ways to save money.
  6. Smooth playback - If quality suddenly drops, users get annoyed. Quality changes must be smooth and invisible.

Common mistake candidates make

Many people say: just store the original video and send it to users. This is wrong because: (1) a 4K file is way too big for most internet connections, (2) old phones cannot play 4K video at all, (3) sending from one location to the whole world is very slow.

The solution has three main parts:

Part 1: Chunking - Break videos into small pieces (usually 2-10 seconds each). This way:

  • Users can start watching immediately (no need to download the whole file)
  • We can change quality piece by piece
  • If one piece fails to load, only that small part is missing
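
The chunking idea can be sketched in a few lines. This is an illustrative helper (the function name and default chunk length are assumptions, not part of the design above): it splits a video timeline into (start, end) spans, with a shorter final chunk when the duration is not an exact multiple.

```python
def chunk_boundaries(duration_s: float, chunk_s: float = 6.0):
    """Split a video timeline into (start, end) chunk spans.

    chunk_s is the target chunk length; the final chunk is shorter
    when the duration is not a multiple of it.
    """
    bounds = []
    start = 0.0
    while start < duration_s:
        end = min(start + chunk_s, duration_s)
        bounds.append((start, end))
        start = end
    return bounds

# A 20-second clip with 6-second chunks -> 4 chunks, last one 2s long
print(chunk_boundaries(20))
```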

Part 2: Transcoding - Convert each video into multiple quality levels:

  • 360p for very slow internet (0.5 Mbps)
  • 480p for slow internet (1 Mbps)
  • 720p for normal internet (3 Mbps)
  • 1080p for fast internet (6 Mbps)
  • 4K for very fast internet (25 Mbps)
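
The player's job is to map measured internet speed onto this ladder. A minimal sketch, using the bitrates above (the 0.8 safety margin and function name are assumptions for illustration):

```python
# Bitrate ladder from the text (Mbps per rendition)
LADDER = [("360p", 0.5), ("480p", 1.0), ("720p", 3.0), ("1080p", 6.0), ("4k", 25.0)]

def pick_rendition(measured_mbps: float, safety: float = 0.8):
    """Highest rendition whose bitrate fits within a safety margin
    of the measured throughput; fall back to the lowest rendition."""
    budget = measured_mbps * safety
    best = LADDER[0][0]
    for name, mbps in LADDER:
        if mbps <= budget:
            best = name
    return best

print(pick_rendition(8.0))   # budget 6.4 Mbps -> "1080p"
print(pick_rendition(0.4))   # below everything -> "360p"
```

The safety margin matters: streaming at exactly the measured throughput leaves no headroom, so any dip causes buffering.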

Part 3: CDN (Content Delivery Network) - Copy popular videos to servers all over the world:

  • User in Tokyo gets video from a server in Tokyo
  • User in London gets video from a server in London
  • Only rare videos need to be fetched from the main storage

How video goes from upload to watching

Scale and Access Patterns

Before designing, let me figure out how big this system needs to be. This helps us choose the right tools.

What we are measuring | Number | What this means for our design
Videos uploaded per day | 100,000 | About 1.2 uploads per second - manageable with a queue
Viewers watching at once | 10 million | Need a big CDN to handle this traffic
+ 6 more rows...

What to tell the interviewer

This is extremely read-heavy - 1000 watches for every 1 upload. Our design should make watching fast and cheap. Uploads can be slower since users expect processing time. The biggest cost will be CDN bandwidth and storage, not database operations.

Common interview mistake: Underestimating storage

Many candidates forget that video storage grows forever. Unlike messages or logs that can be deleted, videos stay forever. At this upload rate, after 5 years you have roughly 45 petabytes. This is why companies like YouTube use cold storage for old unpopular videos and hot storage only for popular recent videos.

How people use the platform (from most common to least common):

  1. Watch a video - Someone clicks play and watches. This is 99% of all activity. Must be fast and smooth.
  2. Browse and search - Looking for videos to watch. Needs fast search and good recommendations.
  3. Upload a video - A creator uploads new content. Can take a few minutes, users expect this.
  4. Interact - Like, comment, subscribe. These are quick database operations.

How much bandwidth do we need?

Viewers watching at once: 10 million
+ 17 more lines...
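
A back-of-envelope version of the bandwidth math, assuming the average viewer streams around 720p (3 Mbps per the quality ladder; that average is an assumption, not stated in the scale table):

```python
viewers = 10_000_000          # concurrent viewers, from the scale table
avg_bitrate_mbps = 3          # assume most viewers watch around 720p (3 Mbps)

total_mbps = viewers * avg_bitrate_mbps
total_tbps = total_mbps / 1_000_000   # 1 Tbps = 1,000,000 Mbps
print(f"Peak egress: {total_tbps:.0f} Tbps")
```

Tens of terabits per second of egress is why the CDN, not the database, dominates both the architecture and the bill.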

High-Level Architecture

Now let me draw the big picture of how all the pieces fit together. I will keep it simple and explain what each part does.

What to tell the interviewer

I will break this into separate services: one for handling uploads, one for video processing, one for metadata (titles, descriptions), and one for streaming. Video files go to blob storage and get distributed through a CDN. Each service does one job well.

Video Streaming Platform - The Big Picture

What each service does and WHY it is separate:

Service | What it does | Why it is separate (what to tell interviewer)
Upload Service | Receives video files from users. Validates format and size. Saves original to blob storage. Puts job in queue for processing. | Why separate? Uploads can be slow (large files) and we do not want them blocking other operations. Also, we can scale upload servers separately during busy times.
Video Service | Stores video metadata (title, description, views). Handles likes, comments, subscriptions. Powers search and recommendations. | Why separate? Metadata operations are fast database queries. Video file operations are slow blob storage operations. Different performance needs.
Processing Service | Takes videos from queue. Converts to multiple qualities. Breaks into chunks. Creates thumbnails. Updates metadata when done. | Why separate? Processing is CPU-heavy and slow. We need many worker machines. If processing is slow, it should not affect watching or uploading.
Streaming Service | Figures out which video chunks to send. Handles adaptive bitrate logic. Talks to CDN for delivery. | Why separate? Streaming needs to be super fast and handle millions of requests per second. It has very different scaling needs than other services.

Common interview question: Why not one big service?

Interviewers often ask: Why so many services? Your answer: Each part has very different needs. Processing needs lots of CPU but is not time-sensitive. Streaming needs to be instant and handle massive traffic. Upload handles large files but is infrequent. Separating them lets us scale and optimize each part independently.

Technology Choices - Why we picked these tools:

Blob Storage: S3 or equivalent (Required) - Why we chose it: Built for storing huge files, very cheap for rarely accessed data, handles petabytes easily - Other options: Google Cloud Storage, Azure Blob - all work similarly - Key feature: Different storage classes (hot/warm/cold) to save money

Database: PostgreSQL (Recommended) - Why we chose it: Stores video metadata (title, views, likes), handles relationships well - Note: We only store metadata here, NOT the video files - Other options: MySQL works fine too

Cache: Redis (Recommended) - Why we chose it: Caches popular video metadata, reduces database load - What we cache: Video info for trending videos, view counts

Message Queue: Kafka or RabbitMQ - Why we need it: Processing jobs need to be queued and processed reliably - Kafka: Better for high volume, keeps message history - RabbitMQ: Simpler, good for smaller scale

CDN: CloudFront, Akamai, or Cloudflare - Why we need it: Delivers videos from servers close to users - This is NOT optional - without CDN, video streaming does not work at scale

Search: Elasticsearch - Why we chose it: Fast text search for video titles and descriptions

How real companies do it

YouTube uses a custom-built system with Bigtable for metadata and their own CDN. Netflix uses AWS with their own CDN called Open Connect. TikTok uses a mix of cloud providers. But for most companies, S3 + CloudFront + PostgreSQL works perfectly fine.

Data Model and Storage

Now let me show how we organize the data. We have two types of storage: a database for metadata (small, structured data) and blob storage for video files (huge, unstructured data).

What to tell the interviewer

I will use PostgreSQL for video metadata - titles, view counts, user info. Video files go to S3-style blob storage. This separation is important: database queries are fast for small data, blob storage is cheap for huge files.

Table 1: Videos - Information about each video (NOT the actual video file)

This stores metadata that we need to search and display. The actual video files are in blob storage.

Column | What it stores | Example
id | Unique ID for this video | vid_abc123
user_id | Who uploaded it | user_456
+ 11 more rows...

Table 2: Video Files - Tracks all the processed versions of each video

One video becomes many files (different qualities). This table tracks all of them.

Column | What it stores | Example
id | Unique ID for this file | vf_789
video_id | Which video this belongs to | vid_abc123
+ 7 more rows...

Database Index

We add an INDEX on (user_id, created_at) to quickly find all videos by a user. We also index (status) to find videos that are still processing.

Table 3: Users - Information about creators and viewers

Column | What it stores | Example
id | Unique user ID | user_456
username | Display name | PizzaChef
+ 5 more rows...

Blob Storage Structure - How video files are organized in S3

We organize files in folders by video ID and quality. This makes it easy to find all versions of a video.

s3://video-platform/
├── uploads/                    # Original uploaded files
│   └── vid_abc123/
+ 25 more lines...

Important: Storage classes to save money

Not all videos are watched equally. Use HOT storage (fast, expensive) for videos watched in the last 30 days. Use COLD storage (slow, cheap) for old videos nobody watches. This can save up to 80% on storage costs. Moving videos between storage classes happens automatically based on access patterns.
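
The tiering rule is simple enough to sketch. A minimal version, keyed off days since the last view (the 30-day hot threshold comes from the text; the 180-day warm threshold is an assumption for illustration):

```python
def storage_tier(days_since_last_view: int) -> str:
    """Pick a storage class from recency of access.

    Thresholds: hot for anything watched in the last 30 days (per the
    text), warm up to an assumed 180 days, cold after that.
    """
    if days_since_last_view <= 30:
        return "hot"
    if days_since_last_view <= 180:
        return "warm"
    return "cold"

print(storage_tier(7))    # recently watched -> "hot"
print(storage_tier(400))  # nobody watches  -> "cold"
```

In practice a lifecycle policy on the blob store (e.g. S3 lifecycle rules) applies this kind of rule automatically, so no application code has to move the files.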

Video Upload and Processing Deep Dive

Let me explain step by step what happens when someone uploads a video.

Step 1: Upload the original file

When a user uploads a video:

  1. Client asks server for a place to upload (pre-signed URL)
  2. Client uploads directly to blob storage (not through our servers)
  3. Server is notified when upload is done
  4. Video goes into the processing queue

Upload flow

Why upload directly to blob storage?

If users uploaded through our servers, we would need huge servers with lots of bandwidth. By giving users a direct upload URL to S3, we skip the middleman. This is called a pre-signed URL - it is a temporary permission to upload one file.
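
The idea behind a pre-signed URL can be shown with a toy signer. This is NOT the real S3 signing scheme (production code would call the provider's SDK, which implements Signature V4); it only illustrates that the URL embeds an expiry plus a signature that only the server's secret can produce, so the storage layer can verify the upload permission without a round trip:

```python
import hashlib
import hmac
import time

SECRET = b"server-side-secret"   # stays on the server, never sent to clients

def presign(bucket: str, key: str, ttl_s: int = 600, now=None) -> str:
    """Toy pre-signed upload URL: expiry + HMAC over method/bucket/key."""
    expires = int(now if now is not None else time.time()) + ttl_s
    msg = f"PUT:{bucket}:{key}:{expires}".encode()
    sig = hmac.new(SECRET, msg, hashlib.sha256).hexdigest()
    return f"https://{bucket}.example.com/{key}?expires={expires}&sig={sig}"

def verify(bucket: str, key: str, expires, sig: str, now=None) -> bool:
    """Storage side: reject if expired or if the signature does not match."""
    if (now if now is not None else time.time()) > int(expires):
        return False
    msg = f"PUT:{bucket}:{key}:{int(expires)}".encode()
    good = hmac.new(SECRET, msg, hashlib.sha256).hexdigest()
    return hmac.compare_digest(good, sig)
```

The bucket name, domain, and parameter names here are all illustrative; the real contract is defined by your storage provider.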

Step 2: Process the video (Transcoding)

A worker picks up the job from the queue and:

  1. Downloads the original video
  2. Creates multiple quality versions
  3. Breaks each version into small chunks
  4. Uploads all chunks to blob storage
  5. Creates a manifest file (playlist)
  6. Updates the database to mark video as ready

FUNCTION process_video(video_id):
    
    STEP 1: Get the original file
+ 40 more lines...

What is a manifest file?

The manifest file (usually called master.m3u8 for HLS format) is like a menu. It tells the video player:

  • What quality levels are available
  • Where to find the chunks for each quality
  • How long each chunk is

#EXTM3U
#EXT-X-VERSION:3
+ 30 more lines...

HLS vs DASH

There are two main formats for streaming: HLS (by Apple, uses .m3u8 files) and DASH (open standard, uses .mpd files). Both work similarly - they break video into chunks and have manifest files. HLS is more widely supported, so most companies use HLS. Some support both.
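
Generating an HLS master playlist is mostly string assembly. A minimal sketch (the HLS tags follow RFC 8216; the rendition list and relative paths are illustrative assumptions):

```python
def master_manifest(renditions) -> str:
    """Build a minimal HLS master playlist.

    renditions: list of (name, bandwidth_bits_per_s, "WxH") tuples.
    Each entry points at that rendition's own media playlist.
    """
    lines = ["#EXTM3U", "#EXT-X-VERSION:3"]
    for name, bandwidth, resolution in renditions:
        lines.append(f"#EXT-X-STREAM-INF:BANDWIDTH={bandwidth},RESOLUTION={resolution}")
        lines.append(f"{name}/playlist.m3u8")
    return "\n".join(lines) + "\n"

print(master_manifest([
    ("360p", 500_000, "640x360"),
    ("720p", 3_000_000, "1280x720"),
]))
```

A real transcoding pipeline (ffmpeg, AWS MediaConvert, etc.) emits these playlists for you; writing one by hand is mainly useful for understanding what the player reads.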

Adaptive Bitrate Streaming Deep Dive

What to tell the interviewer

The magic of video streaming is adaptive bitrate. The player measures internet speed and automatically switches quality. If your WiFi gets slow, it drops to 480p. When it gets fast again, it goes back to 1080p. This happens smoothly without you noticing.

How the video player works:

  1. Player downloads the manifest file first
  2. Player sees all available quality levels
  3. Player starts with a low quality (fast to load)
  4. While playing, player measures download speed
  5. If internet is fast, switch to higher quality on next chunk
  6. If internet is slow, switch to lower quality immediately
  7. Keep monitoring and adjusting throughout playback

Adaptive bitrate in action

The buffer - why videos do not freeze

The player always stays ahead of what you are watching. If you are watching second 30, the player has already downloaded up to second 60. This 30-second buffer means:

  • Small internet hiccups do not cause freezing
  • There is time to switch quality if speed changes
  • Seeking (jumping to a new time) might need new downloads

FUNCTION choose_quality_for_next_chunk():
    
    // Measure how fast the last few downloads were
+ 42 more lines...

Why chunk size matters

Chunk size is a tradeoff. Short chunks (2 seconds) mean faster quality switches but more HTTP requests. Long chunks (10 seconds) mean fewer requests but slower adaptation. Most platforms use 6-10 second chunks. Netflix uses 4 seconds. YouTube uses 5 seconds.

CDN and Global Delivery

This is the most important part

Without a CDN, video streaming does not work at scale. A CDN is a network of servers all over the world that cache copies of your videos. When someone in Tokyo watches a video, they get it from a server in Tokyo, not from your main servers in Virginia.

Why we need a CDN:

  1. Speed - Data travels at the speed of light, but that is still slow over long distances. Tokyo to Virginia is 100+ milliseconds. Tokyo to Tokyo is 10 milliseconds.
  2. Bandwidth - Sending video to 10 million users from one location would need impossible amounts of bandwidth. CDN spreads the load.
  3. Reliability - If one server goes down, users are routed to another. No single point of failure.
  4. Cost - CDN bandwidth is cheaper than origin bandwidth because CDN companies buy in bulk.

How CDN delivers videos

How CDN caching works:

Not every video is on every CDN server. That would need too much storage. Instead:

  1. First request - User in Tokyo asks for a video. Tokyo server does not have it. Tokyo server fetches from origin, stores a copy, sends to user.
  2. Second request - Another user in Tokyo asks for same video. Tokyo server already has it. Sends immediately. No origin fetch needed.
  3. Cache expiry - After some time (hours or days), the cache copy is deleted to make room for newer popular videos.
  4. Cache warming - For videos we KNOW will be popular (like a new movie release), we push copies to all CDN servers BEFORE users ask.

FUNCTION cdn_server_handle_request(video_chunk_url):
    
    // Check if we have this chunk in our local cache
+ 25 more lines...

CDN hit rate

A good CDN setup has 95%+ hit rate - meaning 95% of requests are served from cache without touching origin. The remaining 5% are either first-time requests or unpopular videos. Higher hit rate = lower costs and faster delivery.

Popular CDN providers:

  • CloudFront (AWS) - Good integration with S3, pay per use
  • Akamai - Oldest and largest CDN, used by many big companies
  • Cloudflare - Good free tier, easy to set up
  • Fastly - Very fast cache invalidation
  • Netflix Open Connect - Netflix built their own CDN and even puts servers inside ISPs!

For most companies, CloudFront or Cloudflare works great.

What Can Go Wrong and How We Handle It

Tell the interviewer about failures

Good engineers think about what can break. Let me walk through the things that can go wrong with video streaming and how we protect against them.

Common failures and how we handle them:

What breaks | What happens to users | How we fix it | Why this works
CDN server goes down | Some users cannot load videos | CDN automatically routes to next closest server | CDN has hundreds of servers, losing one is fine
Origin storage goes down | New videos cannot load, cached videos still work | Keep 3 copies of every file in different locations | Multi-region replication means no single point of failure
+ 4 more rows...

Making uploads reliable

Large file uploads often fail. A 2GB upload over slow internet might timeout. We handle this with resumable uploads:

FUNCTION upload_large_video(file):
    
    // Break file into small pieces (5MB each)
+ 30 more lines...

Making sure videos are never lost

Videos are the most important data. If someone deletes their only copy of a home video and we lose it, that is terrible. We protect against this:

VIDEO STORAGE SAFETY:

1. THREE COPIES MINIMUM
+ 19 more lines...

What is idempotent processing?

If a processing job fails halfway and restarts, it should not create duplicate chunks or corrupt the video. We make processing idempotent - running it twice gives the same result as running it once. We do this by using deterministic file names (chunk_001.ts, not chunk_random123.ts) so re-running overwrites the same files.

Optimizing Costs

What to tell the interviewer

Video streaming is expensive. The main costs are storage (keeping videos forever) and bandwidth (sending videos to users). Let me explain strategies to reduce costs without hurting user experience.

Where the money goes:

  1. CDN Bandwidth (50-70% of cost) - Every byte sent to users costs money
  2. Storage (20-30% of cost) - Videos take up space forever
  3. Processing (5-10% of cost) - Converting videos uses CPU
  4. Servers (5-10% of cost) - Running the API and services

Cost saving strategies:

1. Use efficient video codecs

  • H.265 (HEVC) is 50% smaller than H.264 at same quality
  • VP9 is similar to H.265 and free (no license fees)
  • AV1 is newest, 30% smaller than H.265, but slow to encode
  • Tradeoff: Newer codecs need more CPU to play, old devices cannot play them

Codec | Size for 1 hour 1080p | Device support | Best for
H.264 | 4 GB | Everything | Maximum compatibility
H.265 | 2 GB | Most devices since 2015 | Good balance
VP9 | 2 GB | Chrome, Android, Smart TVs | Free alternative to H.265
AV1 | 1.4 GB | Newest devices only | Future standard

2. Smart storage tiers

  • Hot storage: Recently uploaded and popular videos (fast, expensive)
  • Warm storage: Videos accessed occasionally (medium speed, medium cost)
  • Cold storage: Old videos nobody watches (slow, very cheap)

Automatically move videos between tiers based on access patterns.

STORAGE TIER RULES:

HOT STORAGE (S3 Standard): $0.023 per GB/month
+ 20 more lines...

3. Do not transcode what is not needed

  • If original is 720p, do not create 1080p or 4K versions
  • If video is 10 seconds long, maybe only create 2 quality levels
  • If video has very few views, consider deleting some quality versions

4. Per-title encoding

A cartoon and a sports game need different bitrates. Cartoons have simple colors and can use lower bitrate. Sports have fast motion and need higher bitrate. Smart encoding analyzes each video and picks optimal settings.

How Netflix does it

Netflix analyzes each video and creates a custom encoding profile. A simple animated show might look great at 720p with 2 Mbps. An action movie might need 720p with 4 Mbps. This approach (called per-title encoding) saved Netflix 20% on bandwidth.

Growing the System Over Time

What to tell the interviewer

This design handles millions of users. Let me explain how we would grow it for different scales and what features we could add later.

How we grow step by step:

Stage 1: Starting out (up to 1 million users)

  • Single region deployment
  • One CDN provider
  • Basic transcoding (3 quality levels)
  • This handles 10,000 concurrent viewers easily

Stage 2: Growing (1-10 million users)

  • Multi-region for metadata (database replicas)
  • Multiple CDN providers for redundancy
  • More quality levels and codec options
  • Video recommendations based on watch history

Stage 3: Large scale (10-100 million users)

  • CDN edge computing for manifest generation
  • Real-time analytics for trending videos
  • Machine learning for per-title encoding
  • Multiple origin storage locations

Stage 4: Massive scale (100M+ users, like YouTube)

  • Custom CDN with servers inside ISPs
  • AI-powered content moderation
  • Live streaming support
  • Advanced DRM for premium content

Features we can add later:

1. Live streaming

Live streaming is different from video-on-demand:

  • No time to pre-process - must encode in real-time
  • Chunks are created as the stream happens
  • Latency matters - viewers want to see events as they happen
  • Need to handle sudden viewer spikes (everyone watches at once)

Live streaming flow

2. DRM (Digital Rights Management)

For paid content (movies, TV shows), we need to prevent piracy:

  • Encrypt video chunks
  • Only give decryption keys to authorized users
  • Keys expire after playback session ends
  • Different DRM systems: Widevine (Android/Chrome), FairPlay (Apple), PlayReady (Microsoft)

3. Content moderation

At scale, we need to automatically detect:

  • Copyright violations (someone uploading a movie)
  • Inappropriate content
  • Spam and scams

This uses machine learning to scan videos and flag problems before humans review.

4. Analytics and recommendations

Track what users watch to:

  • Recommend similar videos
  • Show trending content
  • Help creators understand their audience
  • Optimize ad placement

What about video chat (like Zoom)?

Video chat is very different from video streaming. Chat needs ultra-low latency (under 200ms) so people can have conversations. It uses WebRTC with peer-to-peer connections when possible. The architecture is completely different - worth mentioning that it is a separate system design problem.

Design Trade-offs

Advantages

  • +Uses least storage
  • +Always have original quality
  • +Can add new formats later

Disadvantages

  • -Slow first play for each quality level
  • -High CPU cost during playback
  • -Cannot handle many concurrent viewers

When to use

Only for very small platforms with few viewers. Not recommended for any real scale.