
Scalability Fundamentals

The foundation of system design - understanding how systems grow from 1 to 1 million users

Foundation knowledge | 25 min read


Summary

Scalability is a system's ability to handle increased load by adding resources. There are two approaches: vertical scaling (bigger machines) and horizontal scaling (more machines). The key insight is that scalability is NOT the same as performance—a system can be fast but not scalable, or scalable but slow. Understanding this distinction, along with stateless design principles, is the foundation for all system design decisions.

Key Takeaways

Scalability ≠ Performance

Performance is how fast a single request completes. Scalability is how well the system handles increasing load. A single-threaded service might respond in 1ms (great performance) but fall over at 1000 QPS (poor scalability). Always clarify which problem you're solving.
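The 1ms-vs-1000-QPS point can be made concrete with a back-of-envelope model. This is a minimal sketch assuming a simple queueing bound (max throughput = workers / service time); the function name is illustrative, not from any library.

```python
# Why a fast service can still have poor scalability: latency sets a
# per-worker ceiling, and only more workers raise the aggregate ceiling.

def max_throughput(workers: int, service_time_s: float) -> float:
    """Upper bound on requests/second for `workers` parallel handlers."""
    return workers / service_time_s

# A single-threaded service with 1 ms responses: great latency...
single = max_throughput(workers=1, service_time_s=0.001)   # 1000 req/s ceiling

# ...but the only way past 1000 QPS is more workers, not lower latency.
pool = max_throughput(workers=16, service_time_s=0.001)    # 16000 req/s ceiling

print(single, pool)
```

Note the asymmetry: halving latency doubles the ceiling of one worker, but adding workers scales it without touching latency at all. That is the performance/scalability split in one formula.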

Horizontal Scales Further Than Vertical

Vertical scaling hits hard limits (biggest EC2 is 24TB RAM, 448 vCPUs). Horizontal scaling is theoretically unlimited. But horizontal introduces complexity: distributed state, network partitions, consensus. Choose based on your actual scale needs, not hypothetical future scale.

Stateless Services Scale Linearly

If a service holds no state between requests, you can add/remove instances freely. Load balancers distribute traffic evenly. This is why REST APIs scale better than WebSocket servers holding connection state.

State Must Live Somewhere

Pushing state out of application servers doesn't eliminate it—it moves to databases, caches, or message queues. These become your scaling bottlenecks. The art is choosing the right stateful system for your access patterns.
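The two takeaways above can be sketched together: the handler below is stateless, and the session state it needs lives in a shared external store. All names here (`SessionStore`, `handle_request`) are illustrative; the in-memory dict stands in for something like Redis.

```python
# Stateless request handling: the server keeps no per-user state between
# requests, so a load balancer may route each request to any instance.

class SessionStore:
    """External shared store; any app server can read any session."""
    def __init__(self):
        self._data = {}

    def get(self, session_id: str) -> dict:
        return self._data.get(session_id, {})

    def put(self, session_id: str, session: dict) -> None:
        self._data[session_id] = session

def handle_request(store: SessionStore, session_id: str, item: str) -> int:
    # Everything the handler needs comes from the request and the shared
    # store -- nothing is remembered locally between calls.
    session = store.get(session_id)
    cart = session.get("cart", [])
    cart.append(item)
    store.put(session_id, {"cart": cart})
    return len(cart)

store = SessionStore()
handle_request(store, "u1", "book")         # served by "server A"
count = handle_request(store, "u1", "pen")  # "server B" sees the same cart
print(count)  # 2
```

The trade-off from the takeaway is visible here: the app tier now scales freely, but `SessionStore` is the new shared dependency whose capacity and access pattern you must design for.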

Scaling is About Bottlenecks

Every system has a bottleneck: CPU, memory, disk I/O, network, or database connections. Scaling only helps if you scale the bottleneck. Adding app servers won't help if your database is saturated.
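A quick way to see this: the system's ceiling is the capacity of its slowest shared stage. The capacities below are illustrative numbers, not measurements.

```python
# System throughput is bounded by the weakest stage in the request path.

def system_throughput(capacities: dict[str, float]) -> tuple[str, float]:
    """Return the bottleneck stage and the throughput it imposes."""
    stage = min(capacities, key=capacities.get)
    return stage, capacities[stage]

before = {"load_balancer": 50_000, "app_servers": 8_000, "database": 3_000}
print(system_throughput(before))   # ('database', 3000.0-ish: DB is the cap)

# Doubling the app tier changes nothing: the database is still saturated.
after = dict(before, app_servers=16_000)
print(system_throughput(after))
```

Scaling effort spent anywhere except the `min()` stage is wasted, which is why measuring before scaling matters.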

Scale When You Need To, Not Before

Premature optimization for scale adds complexity without benefit. A well-written monolith on a single server handles more traffic than most startups will ever see. Scale in response to actual bottlenecks, measured with real data.

Deep Dive

Scalability is a system's ability to handle growing amounts of work by adding resources. But this definition hides crucial nuance.

Consider two systems:

- System A: Handles 100 requests/second. Adding a second server doubles capacity to 200 req/s.
- System B: Handles 100 requests/second. Adding a second server increases capacity to 120 req/s.


System A scales linearly—double the resources, double the capacity. System B has diminishing returns—shared state or coordination overhead limits scaling efficiency.

The goal isn't just to scale, but to scale efficiently. This is measured by scalability efficiency:

Efficiency = Actual Throughput / (Theoretical Throughput × Number of Nodes)

If 10 servers each capable of 1000 req/s together handle only 7000 req/s, efficiency is 70%. The missing 30% is scaling overhead.
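The worked figure above translates directly into code. A minimal sketch of the efficiency formula as defined here (the function name is ours):

```python
# Scaling efficiency, as defined above:
#   efficiency = actual throughput / (per-node throughput * nodes)

def scaling_efficiency(actual: float, per_node: float, nodes: int) -> float:
    return actual / (per_node * nodes)

# 10 servers, each capable of 1000 req/s, jointly serving 7000 req/s:
eff = scaling_efficiency(actual=7000, per_node=1000, nodes=10)
print(f"{eff:.0%}")  # the missing 30% is coordination overhead
```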

Trade-offs

| Aspect | Advantage | Disadvantage |
|---|---|---|
| Vertical vs Horizontal | Vertical is simpler: no distributed-systems complexity, no network partitions to handle | Vertical has hard limits (a biggest machine exists); horizontal scales indefinitely but adds significant complexity |
| Stateless Services | Scale linearly by adding servers; any server can handle any request; simple load balancing | State must live somewhere (the database/cache becomes the bottleneck); requires external session management |
| Early Scaling | Prepared for growth; architecture supports future scale; no emergency rewrites | Premature complexity, slower development, higher costs, solving problems you may never have |
| Distributed Database | No single point of failure; theoretically unlimited scale; geographic distribution possible | Consistency challenges, operational complexity, more failure modes, harder debugging |
