System Design Fundamentals
Understanding the nines - from 99% to 99.999% and everything in between
Availability is the percentage of time a system is operational and serving requests correctly. Reliability is the probability that a system will perform its intended function without failure. These related but distinct concepts are measured in "nines" (99.9%, 99.99%), and each additional nine is exponentially harder to achieve. Understanding SLAs, SLOs, error budgets, failure modes, and redundancy patterns is essential for designing systems that meet business requirements without over-engineering.
Going from 99% to 99.9% isn't 0.9% harder. It's an order of magnitude harder, because the allowed downtime shrinks tenfold: 99% allows 3.65 days of downtime per year, while 99.9% allows only about 8.76 hours. Each additional nine requires fundamentally different architecture and operational practices.
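The downtime figures above come from multiplying the unavailable fraction by the length of a year. Here is a minimal sketch that assumes a 365-day year with no leap days. The function name `allowed_downtime_hours` is illustrative.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours; assumes a non-leap year


def allowed_downtime_hours(availability_pct: float) -> float:
    """Return the yearly downtime budget, in hours, for an availability percentage."""
    unavailable_fraction = 1 - availability_pct / 100
    return unavailable_fraction * HOURS_PER_YEAR


for pct in (99.0, 99.9, 99.99, 99.999):
    hours = allowed_downtime_hours(pct)
    print(f"{pct}%: {hours:.2f} h/year ({hours * 60:.1f} min/year)")
```

For 99% this gives 87.6 hours (3.65 days). For 99.9% it gives 8.76 hours (about 526 minutes). Each extra nine divides the budget by ten.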
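Suppose a request depends on three services, each with 99.9% availability, and it fails if any one of them is down. The combined availability is then 0.999³ ≈ 99.7%, not 99.9% (assuming the services fail independently). Long dependency chains dramatically reduce overall availability. This is one reason why microservice architectures with deep synchronous call chains can end up less available than a monolith.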
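For a chain of hard dependencies, the combined availability is the product of the individual availabilities. The sketch below assumes that failures are independent, which real systems often violate because of shared infrastructure. The helper name `serial_availability` is hypothetical.

```python
import math


def serial_availability(availabilities: list[float]) -> float:
    """Availability of a request path that needs every dependency to be up.

    Assumes independent failures, so the per-service probabilities multiply.
    """
    return math.prod(availabilities)


combined = serial_availability([0.999, 0.999, 0.999])
print(f"3 services: {combined:.2%}")  # 0.999**3 = 0.997003, printed as 99.70%

# Longer chains erode availability further.
for n in (10, 30):
    print(f"{n} services: {serial_availability([0.999] * n):.2%}")
```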
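Reducing Mean Time To Recovery (MTTR) from 1 hour to 10 minutes improves availability more than doubling Mean Time Between Failures (MTBF) does. When recovery is short compared with the time between failures, the unavailable fraction is roughly MTTR / MTBF. A sixfold cut in MTTR therefore outweighs a twofold gain in MTBF. Focus on fast detection, diagnosis, and recovery rather than trying to prevent every failure.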
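Here is a small numeric check of that claim. It assumes a hypothetical baseline of one failure every 30 days (MTBF = 720 hours) with a 1-hour recovery, and applies the steady-state formula MTBF / (MTBF + MTTR).

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability from mean time between failures and mean time to recovery."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical baseline: one failure every 30 days on average, one hour to recover.
BASE_MTBF = 720.0
BASE_MTTR = 1.0

scenarios = {
    "baseline": availability(BASE_MTBF, BASE_MTTR),
    "double MTBF": availability(2 * BASE_MTBF, BASE_MTTR),
    "MTTR 1 h -> 10 min": availability(BASE_MTBF, 10 / 60),
}

for name, a in scenarios.items():
    downtime_min = (1 - a) * 365 * 24 * 60  # expected downtime per year, in minutes
    print(f"{name:>20}: {a:.5%} (~{downtime_min:.0f} min/year)")
```

With these numbers, doubling MTBF roughly halves yearly downtime, from about 729 to 365 minutes. Cutting MTTR to 10 minutes reduces it about sixfold, to roughly 122 minutes.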
Availability measures the proportion of time a system is operational and accessible.
Availability = Uptime / (Uptime + Downtime)
             = MTBF / (MTBF + MTTR)

Where:
- MTBF: Mean Time Between Failures
- MTTR: Mean Time To Recovery
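The two forms of the formula agree when MTBF is total uptime divided by the number of failures and MTTR is total downtime divided by the number of failures. The shared divisor cancels. The sketch below checks this against a hypothetical 30-day window containing three outages. The durations are made up for illustration.

```python
# Hypothetical 30-day observation window with three outages (durations in minutes).
WINDOW_MIN = 30 * 24 * 60  # 43,200 minutes
outage_durations = [20, 40, 30]

downtime = sum(outage_durations)
uptime = WINDOW_MIN - downtime
failures = len(outage_durations)

# Form 1: share of the window the system was up.
availability_time = uptime / (uptime + downtime)

# Form 2: the same quantity via per-failure averages.
mtbf = uptime / failures    # mean operating time between failures
mttr = downtime / failures  # mean recovery time per failure
availability_means = mtbf / (mtbf + mttr)

print(f"uptime/(uptime+downtime) = {availability_time:.4%}")
print(f"MTBF/(MTBF+MTTR)         = {availability_means:.4%}")
print(f"MTBF = {mtbf:.0f} min, MTTR = {mttr:.0f} min")
```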
Availability vs. Reliability:
- Availability: Is the system up right now?
- Reliability: Will the system keep working correctly over time?
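Fast recovery can push availability high even when failures are frequent, and that gap is what separates the two measures. The sketch below assumes a constant failure rate, the common exponential model, where the probability of running for time t without failure is R(t) = exp(-t / MTBF). The service parameters are hypothetical.

```python
import math


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Long-run fraction of time the system is up."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def reliability(mission_hours: float, mtbf_hours: float) -> float:
    """Probability of running mission_hours with no failure.

    Assumes a constant failure rate (exponential time-to-failure model),
    so R(t) = exp(-t / MTBF).
    """
    return math.exp(-mission_hours / mtbf_hours)


# Hypothetical service: fails about every 30 days but recovers in 5 minutes.
mtbf, mttr = 720.0, 5 / 60

print(f"availability:             {availability(mtbf, mttr):.3%}")
print(f"P(no failure in 1 day):   {reliability(24, mtbf):.1%}")
print(f"P(no failure in 30 days): {reliability(720, mtbf):.1%}")
```

This service is up about 99.99% of the time. Even so, it has only about a 37% chance of getting through a 30-day stretch without any failure. Quick recovery improves availability, but it does nothing for reliability.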
A system can be highly available (rarely down) but unreliable (frequently returns wrong results). Or reliable (always correct when up) but not highly available (frequent outages).
Example: A database with 99.99% availability but occasional data corruption is available but unreliable. A database that's down for maintenance weekly but never corrupts data is reliable but not highly available.
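The two databases in this example can only be told apart if "did it respond?" and "was the response correct?" are tracked as separate signals. Here is a minimal sketch, assuming a hypothetical probe that reads known data and validates the payload. The `ProbeStats` class and all the counts are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class ProbeStats:
    """Aggregated health-check results for one system (hypothetical)."""
    total: int     # probes sent
    answered: int  # probes that got any response
    correct: int   # responses whose payload passed a correctness check

    def availability(self) -> float:
        # Share of probes that got a response at all.
        return self.answered / self.total

    def correctness(self) -> float:
        # Share of answered probes that returned the right data.
        return self.correct / self.answered


# Database A: almost always up, but some reads return corrupted data.
db_a = ProbeStats(total=10_000, answered=9_999, correct=9_949)
# Database B: offline during weekly maintenance, never returns bad data.
db_b = ProbeStats(total=10_000, answered=9_900, correct=9_900)

for name, s in (("A", db_a), ("B", db_b)):
    print(f"DB {name}: availability {s.availability():.2%}, correctness {s.correctness():.2%}")
```

A health check that only counts responses would rank A above B. Only the correctness signal exposes A's corrupted reads.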