Design Walkthrough
Problem Statement
The Question: Design a peer-to-peer file sharing system that can distribute large files to millions of users without relying on expensive centralized infrastructure.
Why P2P matters:
- Cost efficiency: Bandwidth cost is shared across all peers, not borne by a single server
- Scalability: More downloaders = more uploaders = faster downloads for everyone
- Resilience: No single point of failure; the system works even if the original seeder goes offline
- Geographic distribution: Peers download from nearby peers, reducing latency
What to say first
Before I design, let me clarify: are we optimizing for maximum throughput, minimal infrastructure cost, or resilience? Also, what is the typical file size and how many concurrent peers do we expect?
Real-world applications:
- Software distribution: Windows updates, game patches (Blizzard uses P2P)
- Video streaming: Some live streaming platforms use P2P for edge delivery
- Content delivery: Spotify used P2P for music caching
- Decentralized storage: IPFS and Filecoin build on these concepts
Scale context
BitTorrent at peak handled 150+ million monthly active users, transferring petabytes of data daily. A single popular torrent can have millions of peers in its swarm.
Clarifying Questions
Ask these questions to demonstrate understanding of P2P system tradeoffs.
Question 1: Trust Model
Can we trust any of the peers? Or must we assume all peers are potentially malicious?
- Why this matters: Determines verification requirements.
- Typical answer: Assume zero trust - any peer could send corrupted data.
- Architecture impact: Must verify every piece with cryptographic hashes.
Question 2: Peer Discovery
Is a centralized tracker acceptable, or must the system be fully decentralized?
- Why this matters: Affects complexity and resilience.
- Typical answer: Start with a tracker, evolve to DHT for resilience.
- Architecture impact: A tracker is simpler; DHT handles tracker failures.
Question 3: NAT Handling
How many peers are behind NAT/firewalls? Do we need to support peer-to-peer connections between NATted hosts?
- Why this matters: Most home users are behind NAT.
- Typical answer: 70%+ of peers are behind NAT.
- Architecture impact: Need hole punching, relay servers, or accept reduced connectivity.
Question 4: Incentive Model
How do we prevent free-riding (downloading without uploading)?
- Why this matters: Without incentives, rational users only download.
- Typical answer: Use tit-for-tat - prioritize peers who upload to you.
- Architecture impact: Need upload tracking and a choking algorithm.
Stating assumptions
I will assume: zero-trust peers, tracker plus DHT for discovery, most peers behind NAT, tit-for-tat incentives. Files are large (100MB to 10GB), millions of concurrent peers globally.
The Hard Part
Say this out loud
The hard part here is coordinating millions of untrusted peers to cooperate efficiently while handling NAT traversal, incentivizing uploads, and ensuring every byte is verified.
Why this is genuinely hard:
1. Byzantine Environment: Any peer could be malicious - sending wrong data, lying about what pieces they have, or trying to poison the swarm
2. NAT Traversal: Most peers cannot accept incoming connections. Two peers both behind NAT cannot directly connect without help
3. Free-rider Problem: Rational users want to download without uploading. Without proper incentives, the system collapses
4. Piece Selection: Which pieces to download first matters enormously for swarm health
5. Peer Discovery at Scale: Finding peers efficiently among millions without overloading any central system
Common mistake
Candidates often forget about NAT traversal or assume all peers can connect to each other. In reality, two peers both behind NAT cannot connect without relay or hole punching.
The elegant insight of BitTorrent:
The system turns the scalability problem on its head:
Traditional: More users = more server load = slower
P2P: More users = more upload capacity = faster

Each downloader becomes an uploader. The most popular content becomes the fastest to download, not the slowest.
Scale and Access Patterns
Let me estimate the scale and understand access patterns.
| Dimension | Value | Impact |
|---|---|---|
| Concurrent peers | 1-10 million per popular file | Need efficient peer discovery |
| File size | 100MB to 50GB typical | Must chunk into pieces |
What to say
The interesting property of P2P is inverse scaling - a file with 1 million downloaders has potentially 1 million uploaders. Total bandwidth scales with demand.
Popular file distribution:
- File size: 4 GB
- Piece size: 4 MB -> ~1,000 pieces

Access Pattern Analysis:
- Write once, read many: File is created once, downloaded millions of times
- Chunked access: Peers request individual pieces, not the whole file
- Random access pattern: Rarest-first selection makes piece requests effectively random
- Locality matters: Peers prefer nearby peers for lower latency
- Time-varying: New releases spike, then taper off
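A quick back-of-envelope sketch of those numbers (the per-peer upload rate is an assumption):

```python
file_size    = 4 * 1024**3              # 4 GB
piece_size   = 4 * 1024**2              # 4 MB
num_pieces   = file_size // piece_size  # 1,024 pieces
peers        = 1_000_000
peer_upload  = 1 * 1024**2              # assume ~1 MB/s of upload per peer
swarm_upload = peers * peer_upload      # ~0.95 TiB/s aggregate capacity
print(num_pieces, round(swarm_upload / 1024**4, 2), "TiB/s")
```

Aggregate upload capacity grows with the number of downloaders, which is the inverse-scaling property called out above.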
High-Level Architecture
Let me walk through the core components of a P2P file sharing system.
What to say
The architecture has three main layers: content identification (what), peer discovery (who has it), and data transfer (getting it). I will start with a tracker-based design and show how DHT provides resilience.
P2P File Sharing Architecture
Component Responsibilities:
1. Torrent File / Magnet Link
   - Contains metadata: file name, size, piece hashes, tracker URLs
   - The info hash uniquely identifies the content (SHA-1 of the bencoded info metadata)
   - A magnet link is just the info hash; the metadata is fetched from peers (see the info-hash sketch after this list)
2. Tracker
   - Maintains the list of peers for each torrent (keyed by info hash)
   - Peers announce themselves periodically
   - Returns a random subset of peers to the requesting client
   - Simple HTTP or UDP protocol (a minimal announce example appears below)
3. DHT (Distributed Hash Table)
   - Decentralized peer discovery
   - Peers store and look up info_hash -> peer_list mappings
   - No single point of failure
   - Uses the Kademlia protocol in BitTorrent
4. Peer Exchange (PEX)
   - Connected peers share their peer lists with each other
   - Reduces load on the tracker/DHT
   - Helps discover peers organically
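To make the content-identification layer concrete, here is a minimal sketch of deriving an info hash and magnet link; the bencoded bytes below are a stand-in for a real info dictionary:

```python
import hashlib

# Stand-in for the bencoded "info" dictionary of a real torrent
bencoded_info = b"d4:name16:ubuntu-22.04.isoe"

info_hash = hashlib.sha1(bencoded_info).hexdigest()

# A magnet link is essentially just the info hash plus an optional display name
magnet_link = f"magnet:?xt=urn:btih:{info_hash}&dn=ubuntu-22.04.iso"
print(magnet_link)
```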
Real-world reference
BitTorrent uses all three discovery methods simultaneously. Tracker for initial peers, DHT for resilience, PEX for efficiency. Modern clients work even if tracker is down.
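As a concrete example of the simplest discovery path, a minimal sketch of an HTTP tracker announce (the URL and peer details are placeholders, and a real client would bencode-decode the response):

```python
from urllib.parse import urlencode
from urllib.request import urlopen

announce_url = "https://tracker.example.com/announce"   # placeholder tracker
query = urlencode({
    "info_hash": bytes.fromhex("aa" * 20),  # 20-byte SHA-1 of the info dictionary
    "peer_id": b"-XX0001-012345678901",     # 20-byte client-chosen ID (placeholder)
    "port": 6881,
    "uploaded": 0,
    "downloaded": 0,
    "left": 4 * 1024**3,                    # bytes still needed
    "compact": 1,
    "event": "started",
})
with urlopen(f"{announce_url}?{query}", timeout=10) as response:
    body = response.read()   # bencoded dict: re-announce interval + peer list
```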
Data Model and Storage
The data model centers around content addressing and piece verification.
What to say
The torrent file is the source of truth for content integrity. It contains hashes of every piece, allowing verification without trusting any peer.
```python
# Decoded .torrent structure (simplified)
torrent_metadata = {
    "info": {
        "name": "ubuntu-22.04.iso",
        "piece length": 4 * 1024 * 1024,              # 4 MB per piece
        "pieces": b"<20-byte SHA-1 hash per piece>",  # concatenated piece hashes
        "length": 4 * 1024**3,                        # total file size in bytes
    },
    "announce": "https://tracker.example.com/announce",
}
```

Piece Hashing for Integrity:
```python
import hashlib

class PieceManager:
    """Verifies every downloaded piece against the expected hash from the torrent metadata."""

    def __init__(self, piece_hashes: list[bytes]):
        self.piece_hashes = piece_hashes   # one 20-byte SHA-1 digest per piece

    def verify_piece(self, index: int, data: bytes) -> bool:
        # Pieces that fail verification are discarded and never re-shared
        return hashlib.sha1(data).digest() == self.piece_hashes[index]
```

Tracker Database Schema:
```sql
-- Torrents being tracked
CREATE TABLE torrents (
    info_hash   BYTEA PRIMARY KEY,                  -- 20-byte SHA-1 of the info dictionary
    created_at  TIMESTAMPTZ NOT NULL DEFAULT now()
);
-- A companion peers table keyed by (info_hash, peer_id) stores each peer's
-- IP, port, and last announce time.
```

Important detail
Tracker only stores peer locations, not file content. Even if tracker is compromised, file integrity is guaranteed by piece hashes in the torrent file.
Algorithm Deep Dive
Three algorithms are critical to P2P performance: piece selection, peer selection (choking), and DHT routing.
1. Piece Selection: Rarest First
Which piece should a peer download next? This choice dramatically affects swarm health.
```python
import random

class PieceSelector:
    def __init__(self, num_pieces: int):
        self.num_pieces = num_pieces
        # availability[i] = number of connected peers advertising piece i
        self.availability = [0] * num_pieces

    def next_piece(self, needed: set[int], peer_has: set[int]) -> int | None:
        """Pick the rarest piece this peer can supply; break ties randomly."""
        candidates = needed & peer_has
        if not candidates:
            return None
        rarest = min(self.availability[i] for i in candidates)
        return random.choice([i for i in candidates if self.availability[i] == rarest])
```

Why rarest first?
If everyone downloads common pieces first, rare pieces become bottlenecks. Rarest-first ensures every piece has multiple sources, maximizing swarm resilience and speed.
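Continuing the sketch above, a quick usage example (the availability counts are made up):

```python
selector = PieceSelector(num_pieces=4)
selector.availability = [5, 1, 3, 2]   # piece 1 is the rarest in the swarm
# Among pieces this peer can supply, piece 1 is rarest, so it is chosen
print(selector.next_piece(needed={0, 1, 2, 3}, peer_has={0, 1, 2}))   # -> 1
```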
2. Choking Algorithm: Tit-for-Tat
How do we decide which peers to upload to? This is the incentive mechanism.
```python
import random

class ChokingManager:
    """
    Implements the BitTorrent choking algorithm: roughly every 10 seconds,
    unchoke the peers currently uploading to us fastest (tit-for-tat), plus one
    randomly chosen "optimistic unchoke", rotated about every 30 seconds.
    """

    def select_unchoked(self, download_rates: dict[str, float], slots: int = 4) -> set[str]:
        best = sorted(download_rates, key=download_rates.get, reverse=True)[:slots]
        rest = [peer for peer in download_rates if peer not in best]
        # Optimistic unchoke: one random extra peer so newcomers can bootstrap
        return set(best) | ({random.choice(rest)} if rest else set())
```

Why optimistic unchoke?
New peers have no upload history yet. Optimistic unchoke gives them a chance to prove themselves. Without it, new peers could never bootstrap into the swarm.
3. DHT (Kademlia) for Decentralized Discovery
Kademlia DHT Routing
```python
class KademliaDHT:
    """
    Simplified Kademlia DHT sketch: node IDs and info hashes live in the same
    160-bit space, distance is XOR, and each lookup step contacts nodes closer
    to the target, converging in O(log n) hops.
    """

    @staticmethod
    def distance(node_id: int, target: int) -> int:
        # XOR metric: a smaller value means "closer" in the ID space
        return node_id ^ target

    def closest_nodes(self, target: int, known_nodes: list[int], k: int = 8) -> list[int]:
        # Each routing step queries the k closest nodes currently known
        return sorted(known_nodes, key=lambda n: self.distance(n, target))[:k]
```

Consistency and Invariants
System Invariants
Every piece MUST be verified against its SHA-1 hash before acceptance. Never write unverified data to disk or share it with other peers.
Trust Model - Zero Trust:
P2P systems operate in a Byzantine environment. Any peer could:
- Send corrupted data (intentionally or due to bugs)
- Lie about which pieces they have
- Try to waste your bandwidth
- Attempt to poison the swarm with bad data
| Threat | Mitigation | Why It Works |
|---|---|---|
| Corrupted piece | SHA-1 hash verification | Cryptographically secure - cannot fake |
| Fake bitfield | Verify on request failure | Disconnect peers who lie about having pieces |
| Bandwidth waste | Request pipelining limits | Cap outstanding requests per peer |
| Sybil attack | Proof of work or reputation | Expensive to create many fake identities |
| Eclipse attack | Multiple discovery methods | Tracker + DHT + PEX makes isolation hard |
Important detail
SHA-1 is weak against collision attacks, but piece verification only needs second-preimage resistance, which still holds because the attacker does not control the original content. Modern protocols such as BitTorrent v2 use SHA-256.
Consistency Guarantees:
- File integrity: Guaranteed by piece hashes - mathematically certain
- Piece availability: Eventually consistent - it takes time for pieces to spread
- Peer lists: Weakly consistent - stale data is common and acceptable
- Upload/download stats: Best effort - used for incentives, not billing
```python
class PeerManager:
    def __init__(self, max_strikes: int = 3):
        self.peer_strikes: dict[str, int] = {}   # peer_id -> strike count
        self.max_strikes = max_strikes

    def record_bad_piece(self, peer_id: str) -> bool:
        """Record a failed hash check; return True if the peer should be banned."""
        self.peer_strikes[peer_id] = self.peer_strikes.get(peer_id, 0) + 1
        return self.peer_strikes[peer_id] >= self.max_strikes
```

Failure Modes and Resilience
Proactively discuss failures
Let me walk through failure scenarios. P2P systems are inherently fault-tolerant but have unique failure modes.
| Failure | Impact | Mitigation | Why It Works |
|---|---|---|---|
| Tracker down | No new peer discovery via tracker | DHT + PEX continue working | Multiple discovery methods |
| All seeders leave | Download cannot complete unless the swarm collectively holds every piece | Super-seeding, seed incentives | Spreads every piece into the swarm before seeders depart |
NAT Traversal Strategies:
```python
class NATTraversal:
    """
    Multiple strategies for connecting through NAT, tried cheapest first:
    direct connection, UDP hole punching coordinated through an already
    connected third party, and finally relaying through a reachable peer.
    """

    STRATEGIES = ("direct", "hole_punch", "relay")
```

Handling the Last Piece Problem:
End game mode
When download is almost complete, request remaining pieces from ALL peers that have them. First response wins, cancel duplicate requests. This prevents stalling on the last few pieces.
```python
class DownloadManager:
    def __init__(self, num_pieces: int, end_game_threshold: int = 20):
        self.end_game_mode = False
        self.num_pieces = num_pieces
        self.end_game_threshold = end_game_threshold

    def update_mode(self, pieces_remaining: int) -> None:
        # Near completion, request outstanding pieces from every peer that has
        # them and cancel duplicate requests when the first copy arrives
        self.end_game_mode = pieces_remaining <= self.end_game_threshold
```

Evolution and Scaling
What to say
This design handles millions of peers well. Let me discuss how it evolved historically and modern enhancements for specific use cases.
Evolution of P2P File Sharing:
Generation 1: Centralized Index (Napster)
- Central server tracked who had what
- Single point of failure - shut down by lawsuit

Generation 2: Tracker-based (Original BitTorrent)
- Tracker only stores peer lists, not content
- More resilient, but a tracker is still required

Generation 3: DHT (Mainline DHT)
- Fully decentralized peer discovery
- Works even without any tracker
- Millions of nodes in the global DHT

Generation 4: Content-addressed (IPFS)
- Files identified by content hash
- Automatic deduplication
- "Permanent web" concept
Modern P2P Architecture - WebRTC Based
Modern Enhancements:
| Enhancement | Purpose | Example |
|---|---|---|
| WebRTC P2P | P2P in browsers | WebTorrent, PeerJS |
| IPFS/Filecoin | Content-addressed storage | Decentralized web hosting |
| BitTorrent v2 | SHA-256, Merkle trees | Better integrity, per-file hashes |
| WebSeeds | HTTP servers as fallback seeders | CDN integration |
| Protocol encryption | Avoid ISP throttling | Message Stream Encryption |
Alternative design
For private/enterprise P2P (e.g., software distribution), add identity layer with signed peer certificates. This enables accountability while keeping bandwidth benefits of P2P.
Scaling Considerations:
For Tracker:
- A single tracker handles 100K+ announces/minute easily
- Scale horizontally by sharding on info_hash
- Use UDP for lower overhead (BEP 15)
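A sketch of sharding announces across tracker instances by info_hash (the shard count is arbitrary):

```python
def tracker_shard(info_hash: bytes, num_shards: int = 16) -> int:
    # SHA-1 digests are uniformly distributed, so a simple modulo on a prefix
    # spreads swarms evenly across tracker instances
    return int.from_bytes(info_hash[:8], "big") % num_shards
```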
For DHT:
- Already distributed - scales automatically
- Each node only stores a fraction of the data
- Lookup is O(log n) hops
For Individual Swarms:
- Limit connections per peer (typically 50-200)
- Use super-seeding for initial distribution
- Consider geographic clustering for latency
```python
import math

class GeoPeerSelector:
    """
    Prefer peers that are geographically close for lower latency, while still
    keeping some distant peers for swarm diversity.
    """

    def rank_peers(self, my_location: tuple[float, float],
                   peer_locations: dict[str, tuple[float, float]]) -> list[str]:
        # Rough ranking by (lat, lon) distance - good enough to bias selection
        return sorted(peer_locations,
                      key=lambda p: math.dist(my_location, peer_locations[p]))
```