System Design Pattern · Rate Management
Tags: rate-limiting, token-bucket, sliding-window, throttling, api-protection · Level: intermediate

Rate Limiting Pattern

Controlling request rates to protect resources

Used in: API Gateway, Redis, Load Balancer · 20 min read

Summary

Rate limiting controls how many requests a client can make in a given time window, protecting systems from abuse, ensuring fair usage, and preventing cascading failures. Common algorithms include Token Bucket (allows bursts), Leaky Bucket (smooth rate), Fixed Window (simple), and Sliding Window (accurate). Rate limits can be applied per user, per API key, per IP address, or globally. Virtually every public API enforces rate limits: Twitter caps tweets at 300 per 3 hours, and GitHub allows 5,000 API requests per hour. Rate limiting is essential both for protecting your system and for providing fair access.

Key Takeaways

Multiple Dimensions

Rate limit by user, API key, IP, endpoint, or a combination. Apply different limits to different tiers (free vs. paid), and protect both per user and globally.

Token Bucket for Bursts

The bucket fills with tokens at a steady rate; each request consumes a token. Allows short bursts up to the bucket size. A good fit for API rate limiting.

Sliding Window for Accuracy

Fixed windows have a boundary issue: a client can achieve up to double the rate around a window boundary. Sliding windows smooth this out but require more state.

Response Headers Convention

Return X-RateLimit-Limit, X-RateLimit-Remaining, and X-RateLimit-Reset so clients can adjust their behavior. This is standard practice.
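A handler can attach these headers whether or not the request was allowed. The helper below is a hypothetical sketch of the convention (header names are de facto, not standardized, and vary by provider):

```python
def rate_limit_headers(limit, remaining, reset_epoch):
    """Build the conventional rate-limit response headers."""
    return {
        "X-RateLimit-Limit": str(limit),
        # Clamp to zero so rejected requests never advertise a negative budget.
        "X-RateLimit-Remaining": str(max(0, remaining)),
        "X-RateLimit-Reset": str(int(reset_epoch)),  # commonly a Unix timestamp
    }

print(rate_limit_headers(limit=5000, remaining=4987, reset_epoch=1700000000))
```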

Distributed Rate Limiting

Multiple servers need shared state (e.g., Redis); local-only limits can be bypassed by spreading requests across servers, and synchronization adds latency.
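A common approach, sketched below, is a fixed-window counter keyed per client and window in Redis via INCR and EXPIRE. A minimal in-memory stand-in replaces the real Redis client so the sketch runs standalone; the key format and function names are illustrative:

```python
import time

class FakeRedis:
    """In-memory stand-in for the two Redis commands used below (INCR, EXPIRE),
    so the sketch runs without a server; real code would use a Redis client."""
    def __init__(self):
        self.counters = {}

    def incr(self, key):
        self.counters[key] = self.counters.get(key, 0) + 1
        return self.counters[key]

    def expire(self, key, seconds):
        pass  # real Redis would evict the key; harmless to skip in the stand-in

def allow(store, client_id, limit, window, clock=time.time):
    """Fixed-window limit on shared state: every app server increments the
    same per-window key, so the limit holds across the whole fleet."""
    key = f"ratelimit:{client_id}:{int(clock() // window)}"
    count = store.incr(key)
    if count == 1:
        store.expire(key, window)  # let stale window keys clean themselves up
    return count <= limit

store = FakeRedis()
clock = [0.0]
results = [allow(store, "user42", limit=2, window=60, clock=lambda: clock[0])
           for _ in range(3)]
print(results)  # [True, True, False]
```

Because INCR is atomic in Redis, concurrent servers cannot undercount; the cost is one round trip to the shared store per request.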

Graceful Degradation

When the limit is exceeded, return 429 Too Many Requests with a Retry-After header. Don't just drop requests silently.
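On the client side, a well-behaved caller honors Retry-After instead of hammering the server. In this sketch, `request_fn` returning a `(status, headers, body)` tuple is a hypothetical interface, not a real HTTP library's API:

```python
import time

def call_with_retry(request_fn, max_attempts=3, sleep=time.sleep):
    """Retry a request that may be rate limited: on 429, wait the number of
    seconds the server asked for in Retry-After, then try again."""
    for attempt in range(max_attempts):
        status, headers, body = request_fn()
        if status != 429:
            return status, body
        if attempt < max_attempts - 1:
            # Honor the server's hint; fall back to 1 second if absent.
            sleep(int(headers.get("Retry-After", 1)))
    return status, body

# Simulated server: rate limited twice, then succeeds.
responses = iter([
    (429, {"Retry-After": "2"}, ""),
    (429, {"Retry-After": "1"}, ""),
    (200, {}, "ok"),
])
waits = []
print(call_with_retry(lambda: next(responses), sleep=waits.append))  # (200, 'ok')
print(waits)  # [2, 1]
```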

Pattern Details

Token Bucket:
- Bucket holds N tokens (burst capacity)
- Tokens are added at rate R per second
- Each request takes 1 token; no token available = rejected
- Allows bursts up to N
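The token bucket above fits in a few lines; this is a sketch, and the class and parameter names are illustrative rather than any standard API:

```python
import time

class TokenBucket:
    """Token bucket limiter: holds up to `capacity` tokens, refilled at
    `refill_rate` tokens per second; each request spends one token."""

    def __init__(self, capacity, refill_rate, clock=time.monotonic):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.tokens = float(capacity)  # start full, so an initial burst is allowed
        self.clock = clock
        self.last = clock()

    def allow(self):
        now = self.clock()
        # Refill in proportion to elapsed time, capped at the burst capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# A burst of 5 is allowed immediately; the 6th call finds the bucket empty.
clock = [0.0]
bucket = TokenBucket(capacity=5, refill_rate=1, clock=lambda: clock[0])
print([bucket.allow() for _ in range(6)])  # [True, True, True, True, True, False]
clock[0] = 2.0  # two seconds later, two tokens have refilled
print([bucket.allow() for _ in range(3)])  # [True, True, False]
```

The injectable clock is just for deterministic testing; production code would use `time.monotonic` directly.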

Leaky Bucket:
- Requests enter the bucket
- Processed at a fixed rate
- Overflow is rejected
- Produces a smooth output rate
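A leaky bucket can be sketched as a meter: the "water level" stands for pending requests and drains at the fixed leak rate (names illustrative):

```python
import time

class LeakyBucket:
    """Leaky bucket as a meter: the level rises by 1 per accepted request and
    drains at `leak_rate` per second; overflow beyond `capacity` is rejected."""

    def __init__(self, capacity, leak_rate, clock=time.monotonic):
        self.capacity = capacity
        self.leak_rate = leak_rate
        self.level = 0.0
        self.clock = clock
        self.last = clock()

    def allow(self):
        now = self.clock()
        # Drain the bucket for the time elapsed since the last call.
        self.level = max(0.0, self.level - (now - self.last) * self.leak_rate)
        self.last = now
        if self.level + 1 <= self.capacity:
            self.level += 1
            return True
        return False

clock = [0.0]
bucket = LeakyBucket(capacity=2, leak_rate=1, clock=lambda: clock[0])
print([bucket.allow() for _ in range(3)])  # [True, True, False]
clock[0] = 1.0  # one request's worth has drained out
print(bucket.allow())  # True
```

Note the contrast with the token bucket: here output is smoothed to the leak rate, so bursts are never passed through.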

Fixed Window:
- Count requests in fixed intervals (e.g., per minute)
- Reset the count at the interval boundary
- Simple, but has a boundary spike issue
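A minimal fixed-window counter looks like this (a sketch; names illustrative). The boundary spike is visible in the design: a client can spend the full limit just before a boundary and again just after it:

```python
import time

class FixedWindowCounter:
    """Fixed window: count requests per interval and reset at the boundary."""

    def __init__(self, limit, window_seconds, clock=time.time):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        self.counts = {}  # window start timestamp -> request count

    def allow(self):
        # Bucket the current time into its window's start timestamp.
        window_start = int(self.clock() // self.window) * self.window
        count = self.counts.get(window_start, 0)
        if count >= self.limit:
            return False
        self.counts[window_start] = count + 1
        return True

clock = [0.0]
limiter = FixedWindowCounter(limit=2, window_seconds=60, clock=lambda: clock[0])
print([limiter.allow() for _ in range(3)])  # [True, True, False]
clock[0] = 60.0  # new window: the count resets
print(limiter.allow())  # True
```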

Sliding Window:
- Weighted count across adjacent windows
- Smoother than fixed window
- More accurate, but requires more state
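One common variant, the sliding window counter, approximates a true sliding log by weighting the previous window's count by how much of it still overlaps the sliding window (a sketch under that assumption; field names are illustrative):

```python
import time

class SlidingWindowCounter:
    """Sliding window counter: estimate the rate as
    prev_window_count * overlap_fraction + current_window_count."""

    def __init__(self, limit, window_seconds, clock=time.time):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        self.prev_count = 0
        self.curr_count = 0
        self.curr_start = 0.0

    def allow(self):
        now = self.clock()
        start = int(now // self.window) * self.window
        if start > self.curr_start:
            # Roll forward; a gap larger than one window zeroes the history.
            self.prev_count = (self.curr_count
                               if start - self.curr_start == self.window else 0)
            self.curr_count = 0
            self.curr_start = start
        # Fraction of the previous window still inside the sliding window.
        overlap = 1.0 - (now - start) / self.window
        if self.prev_count * overlap + self.curr_count >= self.limit:
            return False
        self.curr_count += 1
        return True

clock = [0.0]
limiter = SlidingWindowCounter(limit=10, window_seconds=60, clock=lambda: clock[0])
print(sum(limiter.allow() for _ in range(12)))  # 10 allowed, 2 rejected
clock[0] = 90.0  # halfway into the next window: half the old count still weighs in
print(sum(limiter.allow() for _ in range(12)))  # 5
```

Only two counters per key are kept, which is why this variant is popular for distributed limiters: the extra state over a fixed window is constant, not per-request.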
