Design Walkthrough
Problem Statement
The Question: Design a peer-to-peer file sharing system that can distribute large files to millions of users without relying on expensive centralized infrastructure.
Why P2P matters:
- Cost efficiency: Bandwidth cost is shared across all peers, not borne by a single server
- Scalability: More downloaders = more uploaders = faster downloads for everyone
- Resilience: No single point of failure; the system works even if the original seeder goes offline
- Geographic distribution: Peers download from nearby peers, reducing latency
What to say first
Before I design, let me clarify: are we optimizing for maximum throughput, minimal infrastructure cost, or resilience? Also, what is the typical file size and how many concurrent peers do we expect?
Real-world applications:
- Software distribution: Windows updates, game patches (Blizzard uses P2P)
- Video streaming: Some live streaming platforms use P2P for edge delivery
- Content delivery: Spotify used P2P for music caching
- Decentralized storage: IPFS and Filecoin build on these concepts
Scale context
BitTorrent at peak handled 150+ million monthly active users, transferring petabytes of data daily. A single popular torrent can have millions of peers in its swarm.
Clarifying Questions
Ask these questions to demonstrate understanding of P2P system tradeoffs.
Question 1: Trust Model
Can we trust any of the peers? Or must we assume all peers are potentially malicious?
- Why this matters: Determines verification requirements.
- Typical answer: Assume zero trust - any peer could send corrupted data.
- Architecture impact: Must verify every piece with cryptographic hashes.
Question 2: Peer Discovery
Is a centralized tracker acceptable, or must the system be fully decentralized?
- Why this matters: Affects complexity and resilience.
- Typical answer: Start with a tracker, evolve to DHT for resilience.
- Architecture impact: A tracker is simpler; DHT handles tracker failures.
Question 3: NAT Handling
How many peers are behind NAT/firewalls? Do we need to support peer-to-peer connections between NATted hosts?
- Why this matters: Most home users are behind NAT.
- Typical answer: 70%+ of peers are behind NAT.
- Architecture impact: Need hole punching, relay servers, or accept reduced connectivity.
Question 4: Incentive Model
How do we prevent free-riding (downloading without uploading)?
- Why this matters: Without incentives, rational users only download.
- Typical answer: Use tit-for-tat - prioritize peers who upload to you.
- Architecture impact: Need upload tracking and a choking algorithm.
Stating assumptions
I will assume: zero-trust peers, tracker plus DHT for discovery, most peers behind NAT, tit-for-tat incentives. Files are large (100MB to 10GB), millions of concurrent peers globally.
The Hard Part
Say this out loud
The hard part here is coordinating millions of untrusted peers to cooperate efficiently while handling NAT traversal, incentivizing uploads, and ensuring every byte is verified.
Why this is genuinely hard:
1. Byzantine Environment: Any peer could be malicious - sending wrong data, lying about what pieces they have, or trying to poison the swarm
2. NAT Traversal: Most peers cannot accept incoming connections. Two peers both behind NAT cannot directly connect without help
3. Free-rider Problem: Rational users want to download without uploading. Without proper incentives, the system collapses
4. Piece Selection: Which pieces to download first matters enormously for swarm health
5. Peer Discovery at Scale: Finding peers efficiently among millions without overloading any central system
Common mistake
Candidates often forget about NAT traversal or assume all peers can connect to each other. In reality, two peers both behind NAT cannot connect without relay or hole punching.
The elegant insight of BitTorrent:
The system turns the scalability problem on its head:
Traditional: More users = more server load = slower
P2P: More users = more upload capacity = faster

Each downloader becomes an uploader. The most popular content becomes the fastest to download, not the slowest.
Scale and Access Patterns
Let me estimate the scale and understand access patterns.
| Dimension | Value | Impact |
|---|---|---|
| Concurrent peers | 1-10 million per popular file | Need efficient peer discovery |
| File size | 100MB to 50GB typical | Must chunk into pieces |
What to say
The interesting property of P2P is inverse scaling - a file with 1 million downloaders has potentially 1 million uploaders. Total bandwidth scales with demand.
Popular file distribution:
- File size: 4 GB
- Piece size: 4 MB -> ~1,000 pieces

Access Pattern Analysis:
- Write once, read many: File is created once, downloaded millions of times
- Chunked access: Peers request individual pieces, not the whole file
- Random access pattern: Rarest-first selection makes piece requests effectively random
- Locality matters: Peers prefer nearby peers for lower latency
- Time-varying: New releases spike, then taper off
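A quick back-of-envelope sketch of those numbers (the per-peer upload rate is an assumption):

```python
file_size    = 4 * 1024**3              # 4 GB
piece_size   = 4 * 1024**2              # 4 MB
num_pieces   = file_size // piece_size  # 1,024 pieces
peers        = 1_000_000
peer_upload  = 1 * 1024**2              # assume ~1 MB/s of upload per peer
swarm_upload = peers * peer_upload      # ~0.95 TiB/s aggregate capacity
print(num_pieces, round(swarm_upload / 1024**4, 2), "TiB/s")
```

Aggregate upload capacity grows with the number of downloaders, which is the inverse-scaling property called out above.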
High-Level Architecture
Let me walk through the core components of a P2P file sharing system.
What to say
The architecture has three main layers: content identification (what), peer discovery (who has it), and data transfer (getting it). I will start with a tracker-based design and show how DHT provides resilience.
P2P File Sharing Architecture
Component Responsibilities:
1. Torrent File / Magnet Link
   - Contains metadata: file name, size, piece hashes, tracker URLs
   - The info hash uniquely identifies the content (SHA-1 of the bencoded info metadata)
   - A magnet link is just the info hash; the metadata is fetched from peers (see the info-hash sketch after this list)
2. Tracker
   - Maintains the list of peers for each torrent (keyed by info hash)
   - Peers announce themselves periodically
   - Returns a random subset of peers to the requesting client
   - Simple HTTP or UDP protocol (a minimal announce example appears below)
3. DHT (Distributed Hash Table)
   - Decentralized peer discovery
   - Peers store and look up info_hash -> peer_list mappings
   - No single point of failure
   - Uses the Kademlia protocol in BitTorrent
4. Peer Exchange (PEX)
   - Connected peers share their peer lists with each other
   - Reduces load on the tracker/DHT
   - Helps discover peers organically
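To make the content-identification layer concrete, here is a minimal sketch of deriving an info hash and magnet link; the bencoded bytes below are a stand-in for a real info dictionary:

```python
import hashlib

# Stand-in for the bencoded "info" dictionary of a real torrent
bencoded_info = b"d4:name16:ubuntu-22.04.isoe"

info_hash = hashlib.sha1(bencoded_info).hexdigest()

# A magnet link is essentially just the info hash plus an optional display name
magnet_link = f"magnet:?xt=urn:btih:{info_hash}&dn=ubuntu-22.04.iso"
print(magnet_link)
```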
Real-world reference
BitTorrent uses all three discovery methods simultaneously. Tracker for initial peers, DHT for resilience, PEX for efficiency. Modern clients work even if tracker is down.
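As a concrete example of the simplest discovery path, a minimal sketch of an HTTP tracker announce (the URL and peer details are placeholders, and a real client would bencode-decode the response):

```python
from urllib.parse import urlencode
from urllib.request import urlopen

announce_url = "https://tracker.example.com/announce"   # placeholder tracker
query = urlencode({
    "info_hash": bytes.fromhex("aa" * 20),  # 20-byte SHA-1 of the info dictionary
    "peer_id": b"-XX0001-012345678901",     # 20-byte client-chosen ID (placeholder)
    "port": 6881,
    "uploaded": 0,
    "downloaded": 0,
    "left": 4 * 1024**3,                    # bytes still needed
    "compact": 1,
    "event": "started",
})
with urlopen(f"{announce_url}?{query}", timeout=10) as response:
    body = response.read()   # bencoded dict: re-announce interval + peer list
```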
Data Model and Storage
The data model centers around content addressing and piece verification.
What to say
The torrent file is the source of truth for content integrity. It contains hashes of every piece, allowing verification without trusting any peer.
```python
# Decoded .torrent structure (simplified)
torrent_metadata = {
    "info": {
        "name": "ubuntu-22.04.iso",
        "piece length": 4 * 1024 * 1024,              # 4 MB per piece
        "pieces": b"<20-byte SHA-1 hash per piece>",  # concatenated piece hashes
        "length": 4 * 1024**3,                        # total file size in bytes
    },
    "announce": "https://tracker.example.com/announce",
}
```

Piece Hashing for Integrity:
```python
import hashlib

class PieceManager:
    """Verifies every downloaded piece against the expected hash from the torrent metadata."""

    def __init__(self, piece_hashes: list[bytes]):
        self.piece_hashes = piece_hashes   # one 20-byte SHA-1 digest per piece

    def verify_piece(self, index: int, data: bytes) -> bool:
        # Pieces that fail verification are discarded and never re-shared
        return hashlib.sha1(data).digest() == self.piece_hashes[index]
```

Tracker Database Schema:
```sql
-- Torrents being tracked
CREATE TABLE torrents (
    info_hash   BYTEA PRIMARY KEY,                  -- 20-byte SHA-1 of the info dictionary
    created_at  TIMESTAMPTZ NOT NULL DEFAULT now()
);
-- A companion peers table keyed by (info_hash, peer_id) stores each peer's
-- IP, port, and last announce time.
```

Important detail
Tracker only stores peer locations, not file content. Even if tracker is compromised, file integrity is guaranteed by piece hashes in the torrent file.
Algorithm Deep Dive
Three algorithms are critical to P2P performance: piece selection, peer selection (choking), and DHT routing.
1. Piece Selection: Rarest First
Which piece should a peer download next? This choice dramatically affects swarm health.
```python
import random

class PieceSelector:
    def __init__(self, num_pieces: int):
        self.num_pieces = num_pieces
        # availability[i] = number of connected peers advertising piece i
        self.availability = [0] * num_pieces

    def next_piece(self, needed: set[int], peer_has: set[int]) -> int | None:
        """Pick the rarest piece this peer can supply; break ties randomly."""
        candidates = needed & peer_has
        if not candidates:
            return None
        rarest = min(self.availability[i] for i in candidates)
        return random.choice([i for i in candidates if self.availability[i] == rarest])
```

Why rarest first?
If everyone downloads common pieces first, rare pieces become bottlenecks. Rarest-first ensures every piece has multiple sources, maximizing swarm resilience and speed.
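Continuing the sketch above, a quick usage example (the availability counts are made up):

```python
selector = PieceSelector(num_pieces=4)
selector.availability = [5, 1, 3, 2]   # piece 1 is the rarest in the swarm
# Among pieces this peer can supply, piece 1 is rarest, so it is chosen
print(selector.next_piece(needed={0, 1, 2, 3}, peer_has={0, 1, 2}))   # -> 1
```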
2. Choking Algorithm: Tit-for-Tat
How do we decide which peers to upload to? This is the incentive mechanism.
```python
import random

class ChokingManager:
    """
    Implements the BitTorrent choking algorithm: roughly every 10 seconds,
    unchoke the peers currently uploading to us fastest (tit-for-tat), plus one
    randomly chosen "optimistic unchoke", rotated about every 30 seconds.
    """

    def select_unchoked(self, download_rates: dict[str, float], slots: int = 4) -> set[str]:
        best = sorted(download_rates, key=download_rates.get, reverse=True)[:slots]
        rest = [peer for peer in download_rates if peer not in best]
        # Optimistic unchoke: one random extra peer so newcomers can bootstrap
        return set(best) | ({random.choice(rest)} if rest else set())
```

Why optimistic unchoke?
New peers have no upload history yet. Optimistic unchoke gives them a chance to prove themselves. Without it, new peers could never bootstrap into the swarm.
3. DHT (Kademlia) for Decentralized Discovery
Kademlia DHT Routing
```python
class KademliaDHT:
    """
    Simplified Kademlia DHT sketch: node IDs and info hashes live in the same
    160-bit space, distance is XOR, and each lookup step contacts nodes closer
    to the target, converging in O(log n) hops.
    """

    @staticmethod
    def distance(node_id: int, target: int) -> int:
        # XOR metric: a smaller value means "closer" in the ID space
        return node_id ^ target

    def closest_nodes(self, target: int, known_nodes: list[int], k: int = 8) -> list[int]:
        # Each routing step queries the k closest nodes currently known
        return sorted(known_nodes, key=lambda n: self.distance(n, target))[:k]
```

Consistency and Invariants
System Invariants
Every piece MUST be verified against its SHA-1 hash before acceptance. Never write unverified data to disk or share it with other peers.
Trust Model - Zero Trust:
P2P systems operate in a Byzantine environment. Any peer could:
- Send corrupted data (intentionally or due to bugs)
- Lie about which pieces they have
- Try to waste your bandwidth
- Attempt to poison the swarm with bad data
| Threat | Mitigation | Why It Works |
|---|---|---|
| Corrupted piece | SHA-1 hash verification | Cryptographically secure - cannot fake |
| Fake bitfield | Verify on request failure | Disconnect peers who lie about having pieces |
| Bandwidth waste | Request pipelining limits | Cap outstanding requests per peer |
| Sybil attack | Proof of work or reputation | Expensive to create many fake identities |
| Eclipse attack | Multiple discovery methods | Tracker + DHT + PEX makes isolation hard |
Important detail
SHA-1 is weak against collision attacks, but piece verification only needs second-preimage resistance, which still holds because the attacker does not control the original content. Modern protocols such as BitTorrent v2 use SHA-256.
Consistency Guarantees:
- File integrity: Guaranteed by piece hashes - mathematically certain
- Piece availability: Eventually consistent - it takes time for pieces to spread
- Peer lists: Weakly consistent - stale data is common and acceptable
- Upload/download stats: Best effort - used for incentives, not billing
```python
class PeerManager:
    def __init__(self, max_strikes: int = 3):
        self.peer_strikes: dict[str, int] = {}   # peer_id -> strike count
        self.max_strikes = max_strikes

    def record_bad_piece(self, peer_id: str) -> bool:
        """Record a failed hash check; return True if the peer should be banned."""
        self.peer_strikes[peer_id] = self.peer_strikes.get(peer_id, 0) + 1
        return self.peer_strikes[peer_id] >= self.max_strikes
```

Failure Modes and Resilience
Proactively discuss failures
Let me walk through failure scenarios. P2P systems are inherently fault-tolerant but have unique failure modes.
| Failure | Impact | Mitigation | Why It Works |
|---|---|---|---|
| Tracker down | No new peer discovery via tracker | DHT + PEX continue working | Multiple discovery methods |
| All seeders leave | Download cannot complete unless the swarm collectively holds every piece | Super-seeding, seed incentives | Spreads every piece into the swarm before seeders depart |
NAT Traversal Strategies:
```python
class NATTraversal:
    """
    Multiple strategies for connecting through NAT, tried cheapest first:
    direct connection, UDP hole punching coordinated through an already
    connected third party, and finally relaying through a reachable peer.
    """

    STRATEGIES = ("direct", "hole_punch", "relay")
```

Handling the Last Piece Problem:
End game mode
When download is almost complete, request remaining pieces from ALL peers that have them. First response wins, cancel duplicate requests. This prevents stalling on the last few pieces.
```python
class DownloadManager:
    def __init__(self, num_pieces: int, end_game_threshold: int = 20):
        self.end_game_mode = False
        self.num_pieces = num_pieces
        self.end_game_threshold = end_game_threshold

    def update_mode(self, pieces_remaining: int) -> None:
        # Near completion, request outstanding pieces from every peer that has
        # them and cancel duplicate requests when the first copy arrives
        self.end_game_mode = pieces_remaining <= self.end_game_threshold
```

Evolution and Scaling
What to say
This design handles millions of peers well. Let me discuss how it evolved historically and modern enhancements for specific use cases.
Evolution of P2P File Sharing:
Generation 1: Centralized Index (Napster)
- Central server tracked who had what
- Single point of failure - shut down by lawsuit

Generation 2: Tracker-based (Original BitTorrent)
- Tracker only stores peer lists, not content
- More resilient, but a tracker is still required

Generation 3: DHT (Mainline DHT)
- Fully decentralized peer discovery
- Works even without any tracker
- Millions of nodes in the global DHT

Generation 4: Content-addressed (IPFS)
- Files identified by content hash
- Automatic deduplication
- "Permanent web" concept
Modern P2P Architecture - WebRTC Based
Modern Enhancements:
| Enhancement | Purpose | Example |
|---|---|---|
| WebRTC P2P | P2P in browsers | WebTorrent, PeerJS |
| IPFS/Filecoin | Content-addressed storage | Decentralized web hosting |
| BitTorrent v2 | SHA-256, Merkle trees | Better integrity, per-file hashes |
| WebSeeds | HTTP servers as fallback seeders | CDN integration |
| Protocol encryption | Avoid ISP throttling | Message Stream Encryption |
Alternative design
For private/enterprise P2P (e.g., software distribution), add identity layer with signed peer certificates. This enables accountability while keeping bandwidth benefits of P2P.
Scaling Considerations:
For Tracker:
- A single tracker handles 100K+ announces/minute easily
- Scale horizontally by sharding on info_hash
- Use UDP for lower overhead (BEP 15)
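A sketch of sharding announces across tracker instances by info_hash (the shard count is arbitrary):

```python
def tracker_shard(info_hash: bytes, num_shards: int = 16) -> int:
    # SHA-1 digests are uniformly distributed, so a simple modulo on a prefix
    # spreads swarms evenly across tracker instances
    return int.from_bytes(info_hash[:8], "big") % num_shards
```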
For DHT:
- Already distributed - scales automatically
- Each node only stores a fraction of the data
- Lookup is O(log n) hops
For Individual Swarms:
- Limit connections per peer (typically 50-200)
- Use super-seeding for initial distribution
- Consider geographic clustering for latency
```python
import math

class GeoPeerSelector:
    """
    Prefer peers that are geographically close for lower latency, while still
    keeping some distant peers for swarm diversity.
    """

    def rank_peers(self, my_location: tuple[float, float],
                   peer_locations: dict[str, tuple[float, float]]) -> list[str]:
        # Rough ranking by (lat, lon) distance - good enough to bias selection
        return sorted(peer_locations,
                      key=lambda p: math.dist(my_location, peer_locations[p]))
```