How Google reimagined distributed storage by embracing component failures and optimizing for large sequential workloads
GFS is a scalable distributed file system for large data-intensive applications. It runs on thousands of commodity machines, providing fault tolerance through replication while delivering high aggregate throughput. GFS challenged conventional file system assumptions: it embraced component failures as normal, optimized for large streaming reads and appends rather than random access, and relaxed consistency guarantees to achieve better performance. GFS became the storage foundation for MapReduce and Bigtable, enabling Google's web-scale data processing.
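One concrete consequence of the relaxed consistency model is record append: GFS guarantees an atomic append at least once, so readers may encounter duplicates or padding and are expected to filter them out, as the paper suggests. Below is a minimal sketch in Go of that reader-side filtering; the record framing, field names, and checksum check are invented for illustration and are not part of GFS itself.

```go
package main

import "fmt"

// record is an application-level record framed the way the paper suggests:
// a unique ID to detect duplicates and a checksum to detect padding or
// fragments. The exact fields here are illustrative assumptions.
type record struct {
	id       string
	checksum uint32
	payload  string
}

// valid stands in for a real checksum verification.
func (r record) valid() bool { return r.checksum != 0 }

// filterRecords drops padding/fragments (bad checksum) and duplicate IDs,
// both of which GFS's at-least-once record append can legitimately produce.
func filterRecords(raw []record) []record {
	seen := make(map[string]bool)
	var out []record
	for _, r := range raw {
		if !r.valid() || seen[r.id] {
			continue
		}
		seen[r.id] = true
		out = append(out, r)
	}
	return out
}

func main() {
	raw := []record{
		{id: "r1", checksum: 0xab12, payload: "crawl result 1"},
		{id: "r1", checksum: 0xab12, payload: "crawl result 1"}, // duplicate from a retried append
		{id: "", checksum: 0, payload: ""},                      // padding left by a failed append
		{id: "r2", checksum: 0xcd34, payload: "crawl result 2"},
	}
	for _, r := range filterRecords(raw) {
		fmt.Println(r.id, r.payload)
	}
}
```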
With thousands of commodity machines, hardware failures happen constantly—disks fail, machines crash, networks partition. GFS treats failure as expected behavior, not exceptional. Automatic recovery and replication are built into the core design.
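To make the "automatic recovery" idea concrete, here is a small Go sketch of the kind of decision the master makes when heartbeats stop arriving from a chunkserver: find chunks that have fallen below the replication target and repair the most under-replicated ones first. The types and the triple-replication constant mirror the paper's description, but the code itself is an illustrative assumption, not GFS's actual implementation.

```go
package main

import (
	"fmt"
	"sort"
)

// replicationTarget is the default number of replicas GFS keeps per chunk (3).
const replicationTarget = 3

type chunkState struct {
	handle       string // chunk identifier
	liveReplicas int    // replicas on chunkservers that are still heartbeating
}

// needsReReplication returns chunks whose live replica count has fallen below
// the target, ordered so the most under-replicated chunks are repaired first.
func needsReReplication(chunks []chunkState) []chunkState {
	var degraded []chunkState
	for _, c := range chunks {
		if c.liveReplicas < replicationTarget {
			degraded = append(degraded, c)
		}
	}
	sort.Slice(degraded, func(i, j int) bool {
		return degraded[i].liveReplicas < degraded[j].liveReplicas
	})
	return degraded
}

func main() {
	chunks := []chunkState{
		{handle: "chunk-a", liveReplicas: 3},
		{handle: "chunk-b", liveReplicas: 1}, // two replicas lost: repair this first
		{handle: "chunk-c", liveReplicas: 2},
	}
	for _, c := range needsReReplication(chunks) {
		fmt.Printf("re-replicate %s (%d/%d replicas live)\n",
			c.handle, c.liveReplicas, replicationTarget)
	}
}
```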
GFS optimizes for multi-GB files accessed sequentially, not millions of small files with random access. This matches Google's workloads: web crawls, logs, MapReduce intermediates. The chunk size is 64 MB (versus the few-kilobyte block sizes typical of conventional file systems), which dramatically reduces the amount of metadata the master must track.
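A back-of-the-envelope comparison shows why the large chunk size matters. The roughly 64 bytes of master metadata per chunk is the figure the paper cites; the 1 PiB total below is just an illustrative cluster size.

```go
package main

import "fmt"

// Rough metadata sizing: how many entries the master must track for the same
// amount of data at GFS's 64 MiB chunk size versus a 4 KiB block size.
func main() {
	const (
		totalData      = int64(1) << 50  // 1 PiB of file data, an illustrative cluster size
		gfsChunkSize   = int64(64) << 20 // 64 MiB GFS chunks
		smallBlockSize = int64(4) << 10  // 4 KiB blocks, as in a conventional file system
		bytesPerEntry  = int64(64)       // rough master metadata per chunk, per the paper
	)

	gfsEntries := totalData / gfsChunkSize
	smallEntries := totalData / smallBlockSize

	// ~1 GiB of metadata fits in one master's RAM; ~16 TiB would not.
	fmt.Printf("64 MiB chunks: %d entries, about %d MiB of metadata\n",
		gfsEntries, gfsEntries*bytesPerEntry/(1<<20))
	fmt.Printf("4 KiB blocks:  %d entries, about %d TiB of metadata\n",
		smallEntries, smallEntries*bytesPerEntry/(1<<40))
}
```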
One master manages all metadata—namespace, file-to-chunk mapping, chunk locations. This simplifies design and enables sophisticated decisions (placement, rebalancing, garbage collection). The master is not a bottleneck because clients cache metadata and access data directly from chunkservers.
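The read path makes this split of responsibilities clear: the client asks the master only for metadata (a chunk handle plus replica locations), caches it, and then streams data directly from a chunkserver, so file contents never flow through the master. The Go sketch below follows that flow; the types, method names, and cache layout are hypothetical stand-ins, not GFS's actual interfaces.

```go
package main

import "fmt"

// chunkInfo is the metadata the master returns for one chunk of a file.
type chunkInfo struct {
	handle    string   // globally unique chunk handle
	locations []string // chunkservers holding replicas
}

// master stands in for an RPC stub to the single GFS master.
type master struct{ index map[string][]chunkInfo }

func (m *master) lookup(path string, chunkIdx int) chunkInfo {
	return m.index[path][chunkIdx]
}

type client struct {
	m     *master
	cache map[string]chunkInfo // metadata cache keyed by path + chunk index
}

const chunkSize = 64 << 20 // 64 MiB

// read fetches chunk metadata (from cache when possible), then contacts a chunkserver directly.
func (c *client) read(path string, offset int64) {
	chunkIdx := int(offset / chunkSize)
	key := fmt.Sprintf("%s#%d", path, chunkIdx)

	info, ok := c.cache[key]
	if !ok {
		info = c.m.lookup(path, chunkIdx) // one small RPC to the master
		c.cache[key] = info               // cached, so later reads skip the master
	}
	// Data flows chunkserver -> client; the master never sees file contents.
	fmt.Printf("reading %s at offset %d from chunkserver %s (chunk %s)\n",
		path, offset, info.locations[0], info.handle)
}

func main() {
	m := &master{index: map[string][]chunkInfo{
		"/crawl/2003-10.log": {{handle: "0xdeadbeef", locations: []string{"cs-17", "cs-42", "cs-99"}}},
	}}
	c := &client{m: m, cache: map[string]chunkInfo{}}
	c.read("/crawl/2003-10.log", 1024) // first read: asks the master
	c.read("/crawl/2003-10.log", 2048) // same chunk: served from the metadata cache
}
```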
By 2003, Google's storage needs were unprecedented: web crawls, server logs, and index data amounted to hundreds of terabytes spread across thousands of commodity machines, and the volume kept growing.
Existing distributed file systems (AFS, NFS) were designed for different workloads—small files, random access, few clients. Google needed a new approach.
A few key observations shaped the GFS design: component failures are the norm rather than the exception, files are huge by traditional standards, most writes append new data rather than overwrite existing data, and reads are dominated by large streaming scans.
These observations led to design choices that would seem wrong for general-purpose file systems but were exactly right for Google.