Design Walkthrough
Problem Statement
The Question: Design a URL shortening service like bit.ly where users can paste a long link and get a short one back. When someone visits the short link, they get sent to the original long link.
What the service needs to do (most important first):
- 1. Shorten a URL - User gives us a long link like https://amazon.com/very/long/product/page/12345. We give back something short like bit.ly/x7Kp2m.
- 2. Redirect quickly - When someone clicks bit.ly/x7Kp2m, we send them to the original Amazon link in under 100 milliseconds.
- 3. Handle lots of traffic - Popular links might get clicked millions of times. The service must not slow down.
- 4. Track clicks - Count how many people clicked each link, when they clicked, and where they clicked from.
- 5. Custom short links - Let users pick their own short code like bit.ly/my-sale instead of a random code.
- 6. Link expiration - Some links should stop working after a set time (like a 24-hour sale link).
What to say first
Let me understand what we are building. Do we need custom short codes or just random ones? Do links expire or last forever? Do we need click tracking? Once I know the features, I will ask about scale - how many URLs and how many clicks per day.
What the interviewer really wants to see:
- Can you generate unique short codes without duplicates, even with multiple servers?
- Do you understand that reads (redirects) happen way more than writes (creating links)?
- Can you design a system that responds in under 100 milliseconds?
- How do you handle a viral link that suddenly gets millions of clicks?
Clarifying Questions
Before you start designing, ask questions to understand what you are building. Good questions show the interviewer you think before you code.
Question 1: How big is this?
How many new short URLs do we create per day? How many redirects happen per day? This tells me if we need one server or thousands.
Why ask this: The design for 1,000 URLs per day is very different from 100 million per day.
What interviewers usually say: 100 million new URLs per day, 10 billion redirects per day. Redirects happen 100 times more than URL creation.
How this changes your design: Since redirects happen 100x more, we must make redirects super fast using caching. Creating URLs can be a bit slower.
Question 2: How short should the URL be?
Should the short code be 6 characters, 7 characters, or longer? Shorter is easier to type but we run out of combinations faster.
Why ask this: A 6-character code using letters and numbers (a-z, A-Z, 0-9) gives us 62^6 = 56 billion combinations. A 7-character code gives us 62^7 = 3.5 trillion combinations.
What interviewers usually say: Start with 7 characters. This gives us enough combinations for many years.
How this changes your design: With 100 million new URLs per day, 7 characters will last us about 95 years. We are safe.
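Quick sanity check on that 95-year figure - it is just the code space divided by the yearly volume (the 100 million URLs per day rate is the assumption from Question 1):

```python
codes = 62 ** 7                      # 7-character Base62 codes: about 3.52 trillion combinations
urls_per_year = 100_000_000 * 365    # 100 million new URLs per day
print(f"{codes:,}")                  # 3,521,614,606,208
print(round(codes / urls_per_year))  # ~96, so roughly 95+ years before the code space runs out
```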
Question 3: Do links expire?
Should short links work forever, or should they expire after some time? Can users set their own expiration?
Why ask this: If links never expire, our database grows forever. If they expire, we can reuse old short codes and delete old data.
What interviewers usually say: Links last forever by default, but users can set an expiration if they want.
How this changes your design: We need to store a created_at and expires_at time for each link. A background job can clean up expired links.
Question 4: Do we need analytics?
Should we track how many times each link was clicked? Do we need to know when and where people clicked?
Why ask this: Analytics adds complexity. We need to store every single click, which is way more data than just the URLs.
What interviewers usually say: Yes, track total clicks. Nice to have: clicks by day, by country, by device.
How this changes your design: We cannot update the database on every click (too slow). We need to batch the updates or use a separate analytics system.
Summarize your assumptions
Let me summarize: 100 million new URLs per day, 10 billion redirects per day, 7-character codes, links last forever by default, and we need basic click tracking. Redirects must be under 100 milliseconds.
The Hard Part
Say this to the interviewer
The hardest part of a URL shortener is generating unique short codes. If we have 10 servers all creating URLs at the same time, how do we make sure they never create the same short code? Even one duplicate would break the system.
Why unique IDs are tricky (explained simply):
- 1. Many servers at once - If 10 servers are creating URLs at the same time, they might accidentally pick the same short code.
- 2. Must be fast - We cannot check the database every time to see if a code is taken. That would be too slow.
- 3. Must never repeat - If bit.ly/abc123 goes to Site A, it can never go to Site B. Ever. People have shared that link.
- 4. Codes should look random - We do not want bit.ly/1, bit.ly/2, bit.ly/3. People could guess URLs and find private links.
- 5. Need billions of them - At 100 million URLs per day, we need 36 billion codes per year. We cannot run out.
Common mistake candidates make
Many people say: just use a random string and check if it exists in the database. This is wrong because: (1) checking the database every time is slow, (2) random collisions become more likely as the database fills up, (3) with multiple servers, two might check at the same time and both think the code is free.
Three ways to generate unique short codes:
Option 1: Counter with Base62 Encoding (Recommended)
- Keep a counter that goes up: 1, 2, 3, 4...
- Convert each number to letters and numbers (Base62)
- Number 1 becomes "1", number 62 becomes "10", number 1000000 becomes "4c92"
- Why this is good: Guaranteed unique, very fast, no database check needed
- How to share the counter: Give each server a range (Server 1 gets 1-1000000, Server 2 gets 1000001-2000000)
Option 2: Hash the Long URL
- Use a hash function (like MD5) on the long URL
- Take the first 7 characters of the hash
- Why this is tricky: Different URLs might have the same first 7 hash characters (collision). Need to handle this.
Option 3: Pre-generate Codes
- Create millions of random codes ahead of time and store them in a table
- When someone needs a short URL, grab one from the table and mark it as used
- Why this works: No collision possible because each code is used only once
How we generate unique codes
What is Base62?
- We use 62 characters: a-z (26) + A-Z (26) + 0-9 (10) = 62
- Just like Base10 uses 0-9, Base62 uses more characters, so a big counter value becomes a short code (see the sketch below)
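A minimal sketch of that conversion in Python. The digit-first alphabet order is a choice that matches the examples in the text (1 becomes "1", 62 becomes "10"); any fixed order works as long as everyone uses the same one:

```python
BASE62 = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def base62_encode(n: int) -> str:
    """Convert a non-negative counter value into a short Base62 code."""
    if n == 0:
        return BASE62[0]
    digits = []
    while n > 0:
        n, remainder = divmod(n, 62)
        digits.append(BASE62[remainder])
    return "".join(reversed(digits))

print(base62_encode(1))          # "1"
print(base62_encode(62))         # "10"
print(base62_encode(1_000_000))  # "4c92"
```

Decoding is the same idea in reverse, but the redirect path never needs it - we store the encoded short code directly as the lookup key.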
Scale and Access Patterns
Before designing, let me figure out how big this system needs to be. This helps us choose the right tools.
| What we are measuring | Number | What this means for our design |
|---|---|---|
| New URLs per day | 100 million | About 1,160 writes per second - one database handles this easily |
| Redirects per day | 10 billion | About 115,000 reads per second - need heavy caching |
What to tell the interviewer
This is a read-heavy system with 100:1 read to write ratio. Our main focus should be making redirects super fast using caching. At 115,000 redirects per second, we need Redis cache in front of our database. A cache hit should happen 99% of the time.
Common interview mistake: Ignoring the read-heavy pattern
Many candidates focus on making URL creation fast. But redirects happen 100x more! A slow redirect (even 500ms) would make users angry. A slow URL creation (even 2 seconds) is fine - users only do it once.
How people use the service (from most common to least common):
- 1. Click a short link (redirect) - This is 99% of all traffic. Someone clicks bit.ly/abc123 and goes to the original site. Must be super fast.
- 2. Create a short link - User pastes a long URL and gets a short one back. Happens 100x less than redirects.
- 3. View analytics - User checks how many clicks their link got. Happens rarely.
- 4. Delete a link - User removes a link they created. Very rare.
How much space does one short URL need?
- Short code: 7 bytes
- Long URL: 200 bytes average (some are longer)
- Add some metadata (user, timestamps, click count) and it is roughly 250 bytes per URL - about 25 GB of new data per day, or around 9 TB per year (rough math below)
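Back-of-the-envelope storage math under those numbers (the ~50 bytes of per-row metadata is an assumption, not a measured figure):

```python
bytes_per_url = 7 + 200 + 50         # short code + average long URL + assumed metadata overhead
urls_per_day = 100_000_000

per_day = bytes_per_url * urls_per_day
per_year = per_day * 365
print(per_day / 1e9, "GB per day")     # ~25.7 GB of new URL data per day
print(per_year / 1e12, "TB per year")  # ~9.4 TB per year - manageable for one PostgreSQL cluster for years
```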
High-Level Architecture
Now let me draw the big picture of how all the pieces fit together. I will keep it simple and explain what each part does.
What to tell the interviewer
I will separate the system into two main paths: creating URLs and redirecting. Redirects need to be super fast, so they go through a cache first. URL creation can be a bit slower since it happens 100x less often.
URL Shortener - The Big Picture
What each part does and WHY it is separate:
| Part | What it does | Why it is separate (what to tell interviewer) |
|---|---|---|
| Redirect Service | Looks up short code, sends user to long URL | This is the hot path - 99% of traffic. Must be super fast. Talks to cache first, database only if cache misses. |
| URL Service | Creates new short URLs, handles custom codes | Separate from redirects because creating URLs is slower and less frequent. We do not want slow URL creation to affect fast redirects. |
| Analytics Service | Counts clicks, stores time and location data | Why separate? We cannot update the database on every click - too slow. Analytics collects clicks in batches and writes them later. |
| Redis Cache | Stores recently used short code to long URL mappings | Why needed? Database lookups take 5-10 milliseconds. Cache lookups take 0.5 milliseconds. At 115K redirects per second, we need 99% to hit cache. |
| Zookeeper | Gives each server a range of IDs to use | Why needed? If Server 1 and Server 2 both try to use ID 1000, we get a duplicate. Zookeeper gives Server 1 IDs 1-1M and Server 2 IDs 1M-2M. |
Common interview question: Why not hash the URL?
Interviewers often ask: Why not just hash the long URL to get the short code? Your answer: Hashing works but has problems: (1) Two different URLs might have the same hash prefix (collision), (2) We need extra logic to handle collisions, (3) Same URL from different users would get the same short code - what if one user wants to delete it? Counter-based approach avoids all these problems.
Technology Choices - Why we picked these tools:
Database: PostgreSQL (Recommended)
- Why we chose it: Great at storing key-value data (short code to long URL), handles our scale easily, supports good indexes
- Other options we considered:
  - MySQL: Also works great - pick what your team knows
  - DynamoDB: Good for key-value lookups, but harder to do analytics queries
  - Cassandra: Good if we need to scale beyond 100TB, but adds complexity
Cache: Redis (Recommended)
- Why we chose it: Super fast (0.5ms lookups), handles 100K+ operations per second, perfect for our read-heavy workload
- Other options we considered:
  - Memcached: Also works, but Redis has more features
  - Local in-memory cache: Only works for small systems
ID Generation: Zookeeper or Database Sequence
- Why we need it: Multiple servers need unique IDs without talking to each other
- Zookeeper: Gives each server a range of IDs to use
- Database Sequence: PostgreSQL can auto-increment IDs (simpler but slower)
Important interview tip
Pick technologies YOU know! If you have used MySQL at your job, use MySQL. If you know MongoDB, explain how it would work here. Interviewers care more about your reasoning than the specific tool.
Data Model and Storage
Now let me show how we organize the data in the database. We need two main tables: one for URLs and one for click tracking.
What to tell the interviewer
I will use a SQL database with two main tables: urls (stores the short code and long URL) and clicks (stores analytics data). The short code is the primary key for fast lookups.
Table 1: URLs - Stores the mapping from short code to long URL
This is the main table. When someone clicks a short link, we look up the short code here and find the long URL.
| Column | What it stores | Example |
|---|---|---|
| short_code | The 7-character code (PRIMARY KEY) | x7Kp2mQ |
| long_url | The original long URL | https://amazon.com/very/long/path |
| user_id | Who created the link | user_42 |
| click_count | Total clicks so far (updated in batches) | 1024 |
| created_at | When the link was created | 2024-01-15 10:30 |
| expires_at | When the link stops working (optional, NULL means never) | NULL |
Database Index
The short_code is the PRIMARY KEY, so lookups are super fast. We also add an INDEX on user_id so users can see all their links quickly.
Table 2: Clicks - Stores every click for analytics
Every time someone clicks a short link, we record it here. This table gets huge (billions of rows), so we store it separately.
| Column | What it stores | Example |
|---|---|---|
| id | Unique click ID | click_789 |
| short_code | Which link was clicked | x7Kp2mQ |
| clicked_at | When the click happened | 2024-01-15 10:31:02 |
| country | Where the click came from | US |
| device | Mobile, desktop, or tablet | mobile |
| referrer | Where the visitor found the link | twitter.com |
Important: Do NOT write clicks directly
We get 115,000 clicks per second. Writing each click to the database one by one would kill the database. Instead, we batch them: collect 1000 clicks in memory, then write them all at once. Or use a time-series database like TimescaleDB that is built for this.
How we handle the click count:
The urls table has a click_count column. We do NOT update it on every click (too slow). Instead:
- 1. Every click goes to a counter in Redis (super fast)
- 2. Every 5 minutes, a background job reads the Redis counters
- 3. The job updates the click_count in PostgreSQL in batches (sketch below)
- 4. This way, the count might be 5 minutes behind, but that is okay for analytics
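Here is a minimal sketch of that counter-and-flush pattern, assuming the redis-py and psycopg2 clients and a clicks:<short_code> key scheme (library choices, key format, and connection string are illustrative, not part of the design itself):

```python
import time
import redis     # assumption: redis-py client
import psycopg2  # assumption: psycopg2 PostgreSQL driver

r = redis.Redis()
db = psycopg2.connect("dbname=shortener")  # hypothetical connection string

def record_click(short_code: str) -> None:
    # Hot path: one in-memory increment per click, no database write.
    r.incr(f"clicks:{short_code}")

def flush_click_counts() -> None:
    # Background job: move the Redis counters into PostgreSQL in batches.
    with db.cursor() as cur:
        for key in r.scan_iter("clicks:*"):
            short_code = key.decode().split(":", 1)[1]
            raw = r.getset(key, 0)              # read the counter and reset it to zero
            count = int(raw) if raw else 0
            if count:
                cur.execute(
                    "UPDATE urls SET click_count = click_count + %s WHERE short_code = %s",
                    (count, short_code),
                )
    db.commit()

if __name__ == "__main__":
    while True:
        flush_click_counts()
        time.sleep(300)  # run the flush roughly every 5 minutes
```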
CREATE TABLE urls (
    short_code  VARCHAR(10) PRIMARY KEY,
    long_url    TEXT NOT NULL,
    user_id     BIGINT,
    click_count BIGINT DEFAULT 0,
    created_at  TIMESTAMP DEFAULT NOW(),
    expires_at  TIMESTAMP
);
CREATE INDEX idx_urls_user_id ON urls (user_id);
The Redirect Flow
This is the most important part of the system. When someone clicks bit.ly/x7Kp2mQ, we need to send them to the right place in under 100 milliseconds.
What to tell the interviewer
The redirect flow is our hot path - 99% of all traffic. It must be blazing fast. We check the cache first (0.5ms), only go to the database if cache misses (5-10ms). We also record the click for analytics but do not wait for it.
What happens when someone clicks a short link
FUNCTION handle_redirect(short_code):
    STEP 1: Check the cache first (super fast - 0.5ms)
    STEP 2: On a cache miss, look up the short code in the database (5-10ms) and save the result to the cache
    STEP 3: If the code does not exist or has expired, return HTTP 404
    STEP 4: Send the click data to the analytics queue - do NOT wait for it
    STEP 5: Return an HTTP 301 or 302 redirect to the long URL
HTTP 301 vs HTTP 302 - Which redirect to use?
HTTP 301 (Moved Permanently)
- Tells the browser: "Remember this! Next time, go directly to the long URL."
- Good for: SEO (search engines pass link value to the original site)
- Bad for: Analytics (browser might skip us next time)
HTTP 302 (Temporary Redirect)
- Tells the browser: "Go there this time, but ask me again next time."
- Good for: Analytics (we see every click)
- Bad for: SEO (search engines do not pass full link value)
Our choice: Use 301 by default (better for users), but let users pick 302 if they need accurate click counts.
Why we do NOT wait for analytics
In Step 4, we send click data to a queue and do NOT wait. Why? Writing to the database takes time. If we waited, redirects would be slower. Instead, we put the click data in a fast queue (like Kafka) and let the analytics service process it later. Users do not notice a few seconds delay in analytics.
Making it even faster with a caching strategy:
- 1. Cache everything that gets clicked - After a database lookup, always save to Redis
- 2. Pre-warm popular links - When a link goes viral, make sure all Redis servers have it
- 3. Use local cache too - Each server can keep the top 10,000 links in memory (even faster than Redis)
- 4. Set smart expiration - Popular links stay in cache longer (24 hours), rarely clicked links expire faster (1 hour). A small cache-aside sketch follows.
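Putting the hot path together, here is a minimal cache-aside sketch. It assumes redis-py, psycopg2, and Flask for the HTTP layer, with a url:<short_code> cache key and a one-hour TTL - all illustrative choices rather than the only way to build it:

```python
import redis                               # assumption: redis-py client
import psycopg2                            # assumption: psycopg2 PostgreSQL driver
from flask import Flask, abort, redirect   # assumption: Flask for the HTTP layer

app = Flask(__name__)
cache = redis.Redis()
db = psycopg2.connect("dbname=shortener")  # hypothetical connection string

def lookup_long_url(short_code: str):
    # 1. Cache first: ~0.5 ms, and it absorbs ~99% of redirect traffic.
    cached = cache.get(f"url:{short_code}")
    if cached:
        return cached.decode()
    # 2. Cache miss: fall back to the database (~5-10 ms).
    with db.cursor() as cur:
        cur.execute("SELECT long_url FROM urls WHERE short_code = %s", (short_code,))
        row = cur.fetchone()
    if row is None:
        return None
    # 3. Warm the cache so the next click is fast; popular codes could get a longer TTL.
    cache.set(f"url:{short_code}", row[0], ex=3600)
    return row[0]

@app.route("/<short_code>")
def handle_redirect(short_code):
    long_url = lookup_long_url(short_code)
    if long_url is None:
        abort(404)
    # Click tracking would be pushed to a queue here, fire-and-forget,
    # so it never slows the redirect down (omitted for brevity).
    return redirect(long_url, code=301)  # or 302 when accurate click counts matter more
```

A small per-process dictionary holding the hottest few thousand codes (point 3 in the list above) would sit in front of cache.get and make viral links even cheaper to serve.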
Creating Short URLs
When a user gives us a long URL, we need to create a unique short code and save the mapping. This happens 100x less than redirects, so it can be a bit slower.
What to tell the interviewer
For creating URLs, the key challenge is generating unique short codes across multiple servers. I will use a counter-based approach where each server gets a range of IDs from Zookeeper. This guarantees uniqueness without checking the database.
How we create a new short URL
FUNCTION create_short_url(long_url, user_id, custom_code = null):
    STEP 1: Validate the long URL (well-formed, not on the blocklist)
    STEP 2: If custom_code is given, check it is free and claim it
    STEP 3: Otherwise, take the next ID from this server's range and Base62-encode it
    STEP 4: Save the short_code to long_url mapping in the database
    STEP 5: Return the short URL
How the counter works across multiple servers:
Problem: We have 10 servers creating URLs. How do we make sure they never create the same short code?
Solution: Give each server its own range of numbers.
- 1. Server 1 starts up and asks Zookeeper: "Give me some IDs"
- 2. Zookeeper says: "You get IDs 1 to 1,000,000"
- 3. Server 1 uses ID 1, then 2, then 3... up to 1,000,000
- 4. Server 2 asks Zookeeper and gets: "You get IDs 1,000,001 to 2,000,000"
- 5. When Server 1 runs out, it asks for another range
This way, no two servers ever use the same ID.
When a server starts up:
    range_start, range_end = ZOOKEEPER.get_id_range(size = 1000000)
    current_id = range_start
When the server creates a URL:
    short_code = base62_encode(current_id)
    current_id = current_id + 1
Why not use database auto-increment?
PostgreSQL can auto-increment IDs, but it becomes a bottleneck. Every URL creation would need to talk to the database to get the next ID. With Zookeeper ranges, each server can create 1 million URLs without talking to anyone. Much faster.
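Below is a sketch of the range-based allocator each URL server could run. The reserve_range callback stands in for the coordination call (Zookeeper, or a database sequence bumped by 1,000,000 at a time); its name and signature are hypothetical:

```python
import threading

class RangeIdAllocator:
    """Hands out unique integer IDs from a pre-reserved block of numbers."""

    def __init__(self, reserve_range, block_size: int = 1_000_000):
        # reserve_range(size) -> (start, end) is the coordination call; end is exclusive.
        self._reserve_range = reserve_range
        self._block_size = block_size
        self._lock = threading.Lock()
        self._next, self._end = reserve_range(block_size)

    def next_id(self) -> int:
        with self._lock:
            if self._next >= self._end:
                # This block is used up: reserve a fresh one before continuing.
                self._next, self._end = self._reserve_range(self._block_size)
            current = self._next
            self._next += 1
            return current

# Stand-in coordinator so the sketch runs on its own.
_counter = 0
def fake_reserve_range(size: int):
    global _counter
    start, _counter = _counter, _counter + size
    return start, start + size

allocator = RangeIdAllocator(fake_reserve_range)
print(allocator.next_id())  # 0 - this integer would then be Base62-encoded into the short code
print(allocator.next_id())  # 1
```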
Analytics and Click Tracking
Users want to know: How many people clicked my link? When did they click? Where are they from? But with 115,000 clicks per second, we cannot write each click to the database immediately.
The problem with real-time analytics
If we tried to INSERT INTO clicks for every click, the database would die. 115,000 inserts per second is way too many. Instead, we collect clicks in batches and write them together.
How we track clicks without killing the database
Two-part analytics system:
Part 1: Fast Counter (for total clicks)
- Every click increments a Redis counter
- Redis handles 100,000+ increments per second easily
- Every 5 minutes, we copy Redis counts to PostgreSQL
- Users see total clicks that are at most 5 minutes old
Part 2: Detailed Logging (for who, when, where)
- Every click goes to a Kafka queue
- Analytics workers pull from Kafka in batches of 1000
- Workers write batches to TimescaleDB (time-series database)
- Users can see detailed analytics a few minutes later
FUNCTION record_click(short_code, request_info):
    // This runs ASYNC - we do not wait for it
    REDIS.INCR("clicks:" + short_code)
    KAFKA.send("clicks", {short_code, timestamp, country, device, referrer})
FUNCTION analytics_worker():
    // This runs continuously in the background
    batch = KAFKA.poll(max = 1000)
    TIMESCALEDB.insert_batch("clicks", batch)
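The same worker written against real clients, assuming kafka-python for the consumer and psycopg2 with a TimescaleDB-backed clicks table for storage (topic name, batch size, and column names are illustrative):

```python
import json
import psycopg2                  # assumption: psycopg2 PostgreSQL/TimescaleDB driver
from kafka import KafkaConsumer  # assumption: kafka-python client

consumer = KafkaConsumer(
    "clicks",                                  # hypothetical topic name
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
)
db = psycopg2.connect("dbname=analytics")      # hypothetical connection string

while True:
    # Pull up to 1000 click events at a time instead of inserting them one by one.
    records = consumer.poll(timeout_ms=1000, max_records=1000)
    batch = [rec.value for partition in records.values() for rec in partition]
    if not batch:
        continue
    with db.cursor() as cur:
        cur.executemany(
            "INSERT INTO clicks (short_code, clicked_at, country, device)"
            " VALUES (%(short_code)s, %(clicked_at)s, %(country)s, %(device)s)",
            batch,   # each event is assumed to be a dict with these keys
        )
    db.commit()
```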
What is TimescaleDB?
TimescaleDB is PostgreSQL with special features for time-series data (data that has timestamps). It automatically splits data by time (last hour, last day, last week) so queries like "show me clicks for the last 7 days" are super fast. Perfect for click analytics.
What analytics we show to users:
- 1. Total clicks - From the click_count in the urls table (5 min delay max)
- 2. Clicks over time - Chart showing clicks per hour/day from TimescaleDB
- 3. Top countries - Group clicks by country
- 4. Top referrers - Where did clicks come from (Twitter, Facebook, etc.)
- 5. Device breakdown - Mobile vs Desktop vs Tablet
What Can Go Wrong and How We Handle It
Tell the interviewer about failures
Good engineers think about what can break. Let me walk through the things that can go wrong and how we protect against them.
| What breaks | What happens to users | How we fix it | Why this works |
|---|---|---|---|
| Redis cache goes down | Redirects become slow (hit database) | Keep database read replicas + auto-restart Redis | Slow is better than broken. Database can handle some load. |
| Database goes down | Cannot create new URLs, redirects fail on cache miss | Use database replicas + failover | Read replica becomes primary. Recent URLs are in cache. |
Handling a viral link:
Imagine a celebrity tweets a short link and suddenly 10 million people click it in 1 minute. How do we survive?
- 1. Cache is king - The link is already in the Redis cache. All 10 million requests hit cache.
- 2. No database pressure - The database only saw 1 request (when we first loaded the link into cache).
- 3. Analytics handles it - Clicks go to the Kafka queue. We process them as fast as we can. If we fall behind, that is okay - Kafka stores them.
- 4. Serve from the edge if needed - If the cache cannot keep up, we can return the cached result from CDN edge servers.
FUNCTION create_url_with_rate_limit(long_url, user_id, ip_address):
    STEP 1: Check the rate limit for this IP (for example, 100 new URLs per hour)
    STEP 2: If the limit is exceeded, return HTTP 429 Too Many Requests
    STEP 3: Otherwise, run the safety checks and call create_short_url(long_url, user_id)
Checking for malware and spam
Before shortening any URL, we check it against Google Safe Browsing API. This tells us if the URL is known malware, phishing, or spam. If it is bad, we refuse to shorten it. We also block certain domains entirely (like known spam sites).
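For the rate limit in the pseudocode above, a minimal fixed-window version with Redis could look like this (redis-py, the 100-per-hour limit, and the key format are all illustrative assumptions):

```python
import redis  # assumption: redis-py client

r = redis.Redis()
LIMIT = 100              # assumed policy: max new URLs per IP per hour
WINDOW_SECONDS = 3600

def allow_create(ip_address: str) -> bool:
    """Fixed-window rate limit: count URL creations per IP in the current hour."""
    key = f"rate:{ip_address}"
    count = r.incr(key)
    if count == 1:
        # First request in this window: start the one-hour countdown.
        r.expire(key, WINDOW_SECONDS)
    return count <= LIMIT

# Inside create_url_with_rate_limit: if not allow_create(ip_address), return HTTP 429.
```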
Growing the System Over Time
What to tell the interviewer
This design works great for up to 100 million URLs per day. Let me explain how we would grow it if we need to support even more traffic or users around the world.
How we grow step by step:
Stage 1: Starting out (up to 10 million URLs per day)
- One PostgreSQL database
- One Redis cluster
- A few application servers
- This handles most companies' needs
Stage 2: Growing fast (10-100 million URLs per day)
- Add PostgreSQL read replicas
- Add more Redis nodes (cluster mode)
- Add more application servers behind a load balancer
- Add a CDN for static content
Stage 3: Global scale (100 million+ URLs per day)
- Multiple data centers (US, Europe, Asia)
- Database replication across regions
- Route users to the nearest data center
- This is what bit.ly and TinyURL do
Multi-region setup for global users
Cool features we can add later:
1. Link previews - When someone shares a short link on Twitter or Slack, show a preview of where it goes
- Fetch the title and image from the original page
- Store this metadata with the URL
2. QR codes - Generate a QR code for each short link
- People can scan it with their phone instead of typing
- Good for printed materials like posters and business cards
3. Password-protected links - Let users set a password on a link
- Visitors must enter the password to see the destination
- Good for private content
4. A/B testing - One short link can go to different pages for different users
- 50% go to Page A, 50% go to Page B
- Useful for marketing tests
5. Link editing - Let users change where a short link goes
- Useful if the original page moves
- But be careful - this could be abused
Security consideration for link editing
If users can change where a link goes, bad actors could: (1) Share a safe link, (2) Wait for people to trust it, (3) Change it to a malware site. Solution: Only allow editing within first 24 hours, or require re-verification when changing to a different domain.
Different types of URL shorteners need different focus:
Public shortener (like bit.ly): Focus on speed, analytics, and preventing abuse. Anyone can create links.
Enterprise shortener (internal company links): Focus on access control, integration with company systems, and audit logs.
Marketing shortener (like Rebrandly): Focus on custom domains, detailed analytics, and campaign tracking.
Social media shortener (like Twitter t.co): Focus on safety scanning, preview generation, and extremely high traffic.