System Design Masterclass
Storage · p2p · bittorrent · distributed-systems · file-sharing · dht · advanced

Design P2P File Transfer System

Design a peer-to-peer file sharing system like BitTorrent

Millions of concurrent peers, petabytes transferred · Similar to BitTorrent Inc, Protocol Labs, Spotify, Blizzard, Microsoft · 45 min read

Summary

A P2P file transfer system enables millions of peers to share files without centralized servers bearing the bandwidth cost. The core challenge is coordinating distributed peers, ensuring file integrity through chunked verification, incentivizing fair sharing through tit-for-tat algorithms, and discovering peers efficiently. This design is asked at companies building CDNs, video streaming, software distribution, and decentralized applications.

Key Takeaways

Core Problem

This is fundamentally a distributed coordination problem where untrusted peers must cooperate to transfer data efficiently without central authority.

The Hard Part

Incentivizing peers to upload (not just download) while handling malicious peers, NAT traversal, and ensuring data integrity without trusting anyone.

Scaling Axis

Scale by adding more peers - each new peer adds both download demand AND upload capacity. The system gets faster with more participants.

Critical Invariant

Every piece of the file must be verified against its hash before being accepted. Never trust data from peers without cryptographic verification.

Performance Insight

Rarest-first piece selection maximizes swarm health. Download rare pieces first to become a useful uploader quickly.

Key Tradeoff

Centralized tracker is simple but creates single point of failure. DHT is resilient but adds complexity and latency for peer discovery.

Design Walkthrough

Problem Statement

The Question: Design a peer-to-peer file sharing system that can distribute large files to millions of users without relying on expensive centralized infrastructure.

Why P2P matters:

- Cost efficiency: bandwidth cost is shared across all peers, not borne by a single server
- Scalability: more downloaders = more uploaders = faster downloads for everyone
- Resilience: no single point of failure; works even if the original seeder goes offline
- Geographic distribution: peers download from nearby peers, reducing latency

What to say first

Before I design, let me clarify: are we optimizing for maximum throughput, minimal infrastructure cost, or resilience? Also, what is the typical file size and how many concurrent peers do we expect?

Real-world applications:

- Software distribution: Windows updates, game patches (Blizzard uses P2P)
- Video streaming: some live streaming platforms use P2P for edge delivery
- Content delivery: Spotify used P2P for music caching
- Decentralized storage: IPFS and Filecoin build on these concepts

Scale context

BitTorrent at peak handled 150+ million monthly active users, transferring petabytes of data daily. A single popular torrent can have millions of peers in its swarm.

Clarifying Questions

Ask these questions to demonstrate understanding of P2P system tradeoffs.

Question 1: Trust Model

Can we trust any of the peers? Or must we assume all peers are potentially malicious?

Why this matters: Determines verification requirements.
Typical answer: Assume zero trust - any peer could send corrupted data.
Architecture impact: Must verify every piece with cryptographic hashes.

Question 2: Peer Discovery

Is a centralized tracker acceptable, or must the system be fully decentralized?

Why this matters: Affects complexity and resilience.
Typical answer: Start with a tracker, evolve to DHT for resilience.
Architecture impact: Tracker is simpler; DHT handles tracker failures.

Question 3: NAT Handling

How many peers are behind NAT/firewalls? Do we need to support peer-to-peer connections between NATted hosts?

Why this matters: Most home users are behind NAT.
Typical answer: 70%+ of peers are behind NAT.
Architecture impact: Need hole punching, relay servers, or accept reduced connectivity.

Question 4: Incentive Model

How do we prevent free-riding (downloading without uploading)?

Why this matters: Without incentives, rational users only download.
Typical answer: Use tit-for-tat - prioritize peers who upload to you.
Architecture impact: Need upload tracking and a choking algorithm.

Stating assumptions

I will assume: zero-trust peers, tracker plus DHT for discovery, most peers behind NAT, tit-for-tat incentives. Files are large (100MB to 10GB), millions of concurrent peers globally.

The Hard Part

Say this out loud

The hard part here is coordinating millions of untrusted peers to cooperate efficiently while handling NAT traversal, incentivizing uploads, and ensuring every byte is verified.

Why this is genuinely hard:

  1. Byzantine environment: any peer could be malicious - sending wrong data, lying about which pieces it has, or trying to poison the swarm
  2. NAT traversal: most peers cannot accept incoming connections. Two peers both behind NAT cannot connect directly without help
  3. Free-rider problem: rational users want to download without uploading. Without proper incentives, the system collapses
  4. Piece selection: which pieces to download first matters enormously for swarm health
  5. Peer discovery at scale: finding peers efficiently among millions without overloading any central system

Common mistake

Candidates often forget about NAT traversal or assume all peers can connect to each other. In reality, two peers both behind NAT cannot connect without relay or hole punching.

The elegant insight of BitTorrent:

The system turns the scalability problem on its head:

Traditional: More users = more server load = slower
P2P:         More users = more upload capacity = faster

Each downloader becomes an uploader. The most popular content becomes the fastest to download, not the slowest.
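The inverse-scaling claim can be made concrete with a back-of-the-envelope model. This sketch uses illustrative numbers (a 100 Mbps seeder and 5 Mbps of upload per peer - assumptions, not measured values) and ignores real-world utilization losses:

```python
def swarm_throughput_mbps(num_peers: int,
                          seed_upload_mbps: float = 100.0,
                          peer_upload_mbps: float = 5.0) -> float:
    """Total upload capacity available to the swarm (toy model).

    Assumes one original seeder plus num_peers downloaders, each
    contributing its full upload capacity. Real swarms see lower
    utilization, but the trend is the same: capacity grows with demand.
    """
    return seed_upload_mbps + num_peers * peer_upload_mbps

# A lone server serves 1,000 clients from 100 Mbps total.
# A swarm of 1,000 peers has 100 + 1000 * 5 = 5,100 Mbps of capacity.
```

The client/server curve is flat while the P2P curve grows linearly with the number of participants - that is the entire economic argument for P2P distribution.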

Scale and Access Patterns

Let me estimate the scale and understand access patterns.

Dimension        | Value                         | Impact
Concurrent peers | 1-10 million per popular file | Need efficient peer discovery
File size        | 100MB to 50GB typical         | Must chunk into pieces
+ 5 more rows...

What to say

The interesting property of P2P is inverse scaling - a file with 1 million downloaders has potentially 1 million uploaders. Total bandwidth scales with demand.

Popular file distribution:
- File size: 4 GB
- Piece size: 4 MB -> 1,000 pieces
+ 10 more lines...

Access Pattern Analysis:

  • Write once, read many: the file is created once, downloaded millions of times
  • Chunked access: peers request individual pieces, not the whole file
  • Random access pattern: rarest-first means random piece requests
  • Locality matters: peers prefer nearby peers for lower latency
  • Time-varying: new releases spike, then taper off

High-Level Architecture

Let me walk through the core components of a P2P file sharing system.

What to say

The architecture has three main layers: content identification (what), peer discovery (who has it), and data transfer (getting it). I will start with a tracker-based design and show how DHT provides resilience.

P2P File Sharing Architecture

Component Responsibilities:

1. Torrent File / Magnet Link
- Contains metadata: file name, size, piece hashes, tracker URLs
- The info hash uniquely identifies the content (SHA-1 of the metadata)
- A magnet link is just the info hash - metadata is fetched from peers

2. Tracker
- Maintains the list of peers for each torrent (keyed by info hash)
- Peers announce themselves periodically
- Returns a random subset of peers to the requesting client
- Simple HTTP or UDP protocol

3. DHT (Distributed Hash Table)
- Decentralized peer discovery
- Peers store and look up info_hash -> peer_list mappings
- No single point of failure
- Uses the Kademlia protocol in BitTorrent

4. Peer Exchange (PEX)
- Connected peers share their peer lists with each other
- Reduces load on the tracker/DHT
- Helps discover peers organically
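To make "info hash = SHA-1 of the metadata" concrete, here is a minimal sketch of bencoding (the encoding torrent metadata actually uses) and the info-hash computation. This handles only the basic types and omits real-world details like multi-file torrents; the `info_hash` helper name is ours, not a library API:

```python
import hashlib

def bencode(value) -> bytes:
    """Minimal bencoding: ints, strings/bytes, lists, and dicts."""
    if isinstance(value, bool):
        raise TypeError("bencoding has no boolean type")
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Canonical form requires keys sorted as raw byte strings
        items = sorted((k.encode("utf-8") if isinstance(k, str) else k, v)
                       for k, v in value.items())
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value)!r}")

def info_hash(info_dict: dict) -> bytes:
    """20-byte SHA-1 of the bencoded 'info' dict - the torrent's identity."""
    return hashlib.sha1(bencode(info_dict)).digest()
```

Because the encoding is canonical (sorted keys, exact byte lengths), every client derives the same 20-byte identifier from the same metadata - which is what lets a magnet link stand in for the whole torrent file.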

Real-world reference

BitTorrent uses all three discovery methods simultaneously. Tracker for initial peers, DHT for resilience, PEX for efficiency. Modern clients work even if tracker is down.

Data Model and Storage

The data model centers around content addressing and piece verification.

What to say

The torrent file is the source of truth for content integrity. It contains hashes of every piece, allowing verification without trusting any peer.

torrent_metadata = {
    "info": {
        "name": "ubuntu-22.04.iso",
+ 20 more lines...

Piece Hashing for Integrity:

import hashlib

class PieceManager:
+ 29 more lines...

Tracker Database Schema:

-- Torrents being tracked
CREATE TABLE torrents (
    info_hash BYTEA PRIMARY KEY,  -- 20-byte SHA-1
+ 29 more lines...

Important detail

Tracker only stores peer locations, not file content. Even if tracker is compromised, file integrity is guaranteed by piece hashes in the torrent file.

Algorithm Deep Dive

Three algorithms are critical to P2P performance: piece selection, peer selection (choking), and DHT routing.

1. Piece Selection: Rarest First

Which piece should a peer download next? This choice dramatically affects swarm health.

class PieceSelector:
    def __init__(self, num_pieces: int):
        self.num_pieces = num_pieces
+ 38 more lines...

Why rarest first?

If everyone downloads common pieces first, rare pieces become bottlenecks. Rarest-first ensures every piece has multiple sources, maximizing swarm resilience and speed.
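A minimal sketch of rarest-first selection, assuming we track each connected peer's advertised bitfield as a set of piece indices (real clients use packed bitfields and also handle "strict priority" within a piece's blocks):

```python
import random
from collections import Counter

def pick_rarest(needed, peer_bitfields):
    """Pick the needed piece held by the fewest connected peers.

    needed: set of piece indices we still lack.
    peer_bitfields: list of sets, each the pieces one peer advertises.
    Ties are broken randomly so peers in the same swarm don't all
    converge on the same piece at once.
    """
    availability = Counter()
    for bitfield in peer_bitfields:
        availability.update(bitfield & needed)
    if not availability:
        return None  # no connected peer has anything we need
    rarest_count = min(availability.values())
    candidates = [p for p, c in availability.items() if c == rarest_count]
    return random.choice(candidates)
```

The random tie-break matters: without it, every peer would race for the same rare piece and the second-rarest piece would become the new bottleneck.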

2. Choking Algorithm: Tit-for-Tat

How do we decide which peers to upload to? This is the incentive mechanism.

class ChokingManager:
    """
    Implements BitTorrent choking algorithm:
+ 53 more lines...

Why optimistic unchoke?

New peers have no upload history yet. Optimistic unchoke gives them a chance to prove themselves. Without it, new peers could never bootstrap into the swarm.
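A simplified sketch of the unchoke decision, assuming we track bytes each peer has recently uploaded to us. Real BitTorrent rotates the optimistic slot on a ~30-second timer and uses rate windows rather than raw totals; this captures only the core tit-for-tat idea:

```python
import random

def choose_unchoked(upload_by_peer, slots=4):
    """Tit-for-tat: reward the best reciprocators, plus one optimistic slot.

    upload_by_peer: dict peer_id -> bytes that peer uploaded to us recently.
    slots-1 go to the top uploaders; the last goes to a random remaining
    peer (the 'optimistic unchoke') so newcomers with no history can
    bootstrap into the swarm.
    """
    by_rate = sorted(upload_by_peer, key=upload_by_peer.get, reverse=True)
    unchoked = set(by_rate[:slots - 1])
    remaining = [p for p in upload_by_peer if p not in unchoked]
    if remaining:
        unchoked.add(random.choice(remaining))  # optimistic unchoke
    return unchoked
```

Everyone not in the returned set is choked: we still keep the connection open, we just stop uploading to them until they reciprocate or win the optimistic slot.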

3. DHT (Kademlia) for Decentralized Discovery

Kademlia DHT Routing

class KademliaDHT:
    """
    Simplified Kademlia DHT implementation.
+ 60 more lines...
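The essence of Kademlia routing is its distance metric: the distance between two IDs is their bitwise XOR, compared as an integer. Each lookup step queries the closest known nodes and learns about nodes even closer to the target. A minimal sketch of just that metric (toy-sized IDs; real node IDs are 160 bits):

```python
def xor_distance(a: bytes, b: bytes) -> int:
    """Kademlia distance: XOR of two IDs, interpreted as an integer."""
    return int.from_bytes(a, "big") ^ int.from_bytes(b, "big")

def closest_nodes(target: bytes, node_ids, k: int = 8):
    """One routing step: the k known nodes closest to the target ID."""
    return sorted(node_ids, key=lambda n: xor_distance(n, target))[:k]
```

Because XOR distance is symmetric and satisfies the triangle inequality, each hop at least halves the remaining distance, giving the O(log n) lookups that make a million-node DHT practical.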

Consistency and Invariants

System Invariants

Every piece MUST be verified against its SHA-1 hash before acceptance. Never write unverified data to disk or share it with other peers.
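The invariant reduces to one check on every received piece, using the per-piece hashes carried in the torrent metadata. A minimal sketch:

```python
import hashlib

def verify_piece(data: bytes, expected_sha1: bytes) -> bool:
    """Accept a piece only if its SHA-1 matches the torrent metadata.

    On failure the piece is discarded, never written to disk, and never
    advertised to other peers; the sending peer earns a strike.
    """
    return hashlib.sha1(data).digest() == expected_sha1
```

Note that the check compares against hashes from the torrent file, not anything the sending peer claims - the peer that delivered the data has no say in whether it is accepted.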

Trust Model - Zero Trust:

P2P systems operate in a Byzantine environment. Any peer could:
- Send corrupted data (intentionally or due to bugs)
- Lie about which pieces it has
- Try to waste your bandwidth
- Attempt to poison the swarm with bad data

Threat          | Mitigation                   | Why It Works
Corrupted piece | SHA-1 hash verification      | Cryptographically secure - cannot fake
Fake bitfield   | Verify on request failure    | Disconnect peers who lie about having pieces
Bandwidth waste | Request pipelining limits    | Cap outstanding requests per peer
Sybil attack    | Proof of work or reputation  | Expensive to create many fake identities
Eclipse attack  | Multiple discovery methods   | Tracker + DHT + PEX makes isolation hard

Important detail

SHA-1 has known collision weaknesses, but it remains adequate for piece verification because the attacker does not control the original content - breaking it would require a second-preimage attack, which is still infeasible. Modern protocols (BitTorrent v2) use SHA-256 anyway.

Consistency Guarantees:

  • File integrity: guaranteed by piece hashes - mathematically certain
  • Piece availability: eventually consistent - takes time for pieces to spread
  • Peer lists: weakly consistent - stale data is common and acceptable
  • Upload/download stats: best effort - used for incentives, not billing

class PeerManager:
    def __init__(self):
        self.peer_strikes = {}  # peer_id -> strike count
+ 21 more lines...

Failure Modes and Resilience

Proactively discuss failures

Let me walk through failure scenarios. P2P systems are inherently fault-tolerant but have unique failure modes.

Failure           | Impact                             | Mitigation                      | Why It Works
Tracker down      | No new peer discovery via tracker  | DHT + PEX continue working      | Multiple discovery methods
All seeders leave | Cannot complete download           | Super-seeding, seed incentives  | At least one copy must exist
+ 4 more rows...

NAT Traversal Strategies:

class NATTraversal:
    """
    Multiple strategies for connecting through NAT.
+ 50 more lines...

Handling the Last Piece Problem:

End game mode

When download is almost complete, request remaining pieces from ALL peers that have them. First response wins, cancel duplicate requests. This prevents stalling on the last few pieces.

class DownloadManager:
    def __init__(self):
        self.end_game_mode = False
+ 32 more lines...
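As a self-contained sketch of the end-game logic described above (class and method names here are illustrative, not BitTorrent's actual API):

```python
class EndGame:
    """End-game mode: when only a few pieces remain, request each one
    from every peer that has it; the first complete copy wins and
    duplicate in-flight requests are cancelled."""

    def __init__(self, threshold: int = 5):
        self.threshold = threshold
        self.in_flight = {}  # piece index -> set of peers asked

    def should_enter(self, remaining) -> bool:
        """Enter end-game once the remaining-piece count is small."""
        return 0 < len(remaining) <= self.threshold

    def broadcast_requests(self, remaining, peer_bitfields):
        """Request every remaining piece from every peer that has it.

        peer_bitfields: dict peer_id -> set of pieces that peer holds.
        Returns {piece: peers asked} so the caller can send the requests.
        """
        for piece in remaining:
            self.in_flight[piece] = {
                peer for peer, bf in peer_bitfields.items() if piece in bf
            }
        return self.in_flight

    def on_piece_complete(self, piece):
        """First copy arrived: return the peers whose duplicates to cancel."""
        return self.in_flight.pop(piece, set())
```

The deliberate redundancy is cheap here because so little data remains, and it eliminates the classic failure mode where the last piece sits queued behind one slow or stalled peer.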

Evolution and Scaling

What to say

This design handles millions of peers well. Let me discuss how it evolved historically and modern enhancements for specific use cases.

Evolution of P2P File Sharing:

Generation 1: Centralized Index (Napster)
- Central server tracked who had what
- Single point of failure - shut down by lawsuit

Generation 2: Tracker-based (Original BitTorrent)
- Tracker stores only peer lists, not content
- More resilient, but a tracker is still required

Generation 3: DHT (Mainline DHT)
- Fully decentralized peer discovery
- Works even without any tracker
- Millions of nodes in the global DHT

Generation 4: Content-addressed (IPFS)
- Files identified by content hash
- Automatic deduplication
- "Permanent web" concept

Modern P2P Architecture - WebRTC Based

Modern Enhancements:

Enhancement         | Purpose                          | Example
WebRTC P2P          | P2P in browsers                  | WebTorrent, PeerJS
IPFS/Filecoin       | Content-addressed storage        | Decentralized web hosting
BitTorrent v2       | SHA-256, Merkle trees            | Better integrity, per-file hashes
WebSeeds            | HTTP servers as fallback seeders | CDN integration
Protocol encryption | Avoid ISP throttling             | Message Stream Encryption

Alternative design

For private/enterprise P2P (e.g., software distribution), add identity layer with signed peer certificates. This enables accountability while keeping bandwidth benefits of P2P.

Scaling Considerations:

For Tracker:
- A single tracker handles 100K+ announces/minute easily
- Scale horizontally by sharding on info_hash
- Use UDP for lower overhead (BEP 15)

For DHT:
- Already distributed - scales automatically
- Each node stores only a fraction of the data
- Lookup is O(log n) hops
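The O(log n) claim is worth quantifying. A rough upper bound, assuming each lookup step resolves at least one bit of the target ID (real Kademlia implementations resolve several bits per step via k-buckets, so observed hop counts are lower):

```python
import math

def expected_lookup_hops(num_nodes: int, bits_per_step: int = 1) -> int:
    """Rough upper bound on DHT lookup hops.

    Each step at least halves the XOR distance to the target, so a
    lookup takes about log2(n) / bits_per_step steps.
    """
    return math.ceil(math.log2(num_nodes) / bits_per_step)

# Even a DHT with a million nodes needs only about 20 hops.
```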

For Individual Swarms:
- Limit connections per peer (typically 50-200)
- Use super-seeding for initial distribution
- Consider geographic clustering for latency

class GeoPeerSelector:
    """
    Prefer peers that are geographically close.
+ 33 more lines...

Design Trade-offs

Advantages

  • Simple implementation
  • Fast peer discovery
  • Easy to monitor

Disadvantages

  • Single point of failure
  • Legal target
  • Scaling bottleneck

When to use

Private trackers, controlled environments, initial bootstrapping