The sparse, distributed, persistent multi-dimensional sorted map that powers Google Search, Maps, YouTube, and Gmail
Bigtable is a distributed storage system for managing structured data, designed to scale to petabytes across thousands of commodity servers. It provides a simple data model: a sparse, distributed, persistent multi-dimensional sorted map indexed by row key, column key, and timestamp. Bigtable doesn't support a full relational model; instead it gives clients dynamic control over data layout and format, allowing them to reason about the locality of their data. Built on GFS for storage and Chubby for coordination, Bigtable became the foundation for Google's most critical services. Its design inspired a generation of NoSQL databases, including HBase and Cassandra, and Google later offered it externally as the managed service Cloud Bigtable.
Data is indexed by (row, column, timestamp) and stored in lexicographic order by row key. Every read or write under a single row key is atomic, which makes the row the unit of transactional consistency. This simple abstraction supports very different kinds of data, from web pages to satellite imagery, with the same API.
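The data model above can be illustrated with a toy in-memory map. This is only a sketch to make the (row, column, timestamp) indexing concrete, not how Bigtable is implemented; the class name `TinyTable` and its methods are hypothetical, and the reversed-URL row keys follow the style of the web-page example in the Bigtable paper.

```python
from bisect import bisect_left, insort


class TinyTable:
    """Toy model of a sorted (row, column, timestamp) -> value map (hypothetical)."""

    def __init__(self):
        self._rows = []   # row keys, kept in lexicographic order
        self._cells = {}  # row -> {column -> {timestamp -> value}}

    def put(self, row, column, timestamp, value):
        if row not in self._cells:
            insort(self._rows, row)  # keep row keys sorted on insert
            self._cells[row] = {}
        self._cells[row].setdefault(column, {})[timestamp] = value

    def get(self, row, column, timestamp=None):
        """Return the newest version at or before `timestamp` (newest overall if None)."""
        versions = self._cells.get(row, {}).get(column, {})
        eligible = [t for t in versions if timestamp is None or t <= timestamp]
        return versions[max(eligible)] if eligible else None

    def scan(self, start_row, end_row):
        """Yield (row, columns) for start_row <= row < end_row, in sorted row order."""
        i = bisect_left(self._rows, start_row)
        while i < len(self._rows) and self._rows[i] < end_row:
            row = self._rows[i]
            yield row, self._cells[row]
            i += 1


table = TinyTable()
table.put("com.cnn.www", "contents:", 3, "<html>old</html>")
table.put("com.cnn.www", "contents:", 5, "<html>new</html>")
table.put("org.example", "contents:", 1, "<html>other</html>")
latest = table.get("com.cnn.www", "contents:")
com_rows = [row for row, _ in table.scan("com.", "com/")]
```

Because row keys are sorted, pages from the same domain sit next to each other when keys are reversed hostnames, so a range scan over a key prefix touches only neighbouring rows. The map is sparse in the sense that a row stores only the columns it actually has.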
Tables are split into tablets (100-200 MB), each holding a contiguous range of rows. Tablets are the unit of distribution and load balancing. As tables grow, tablets split automatically. This enables horizontal scaling without application changes.
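Tablet splitting can be sketched as follows. This is a simplified illustration under stated assumptions: the `Tablet` and `Table` classes, the size accounting, and the choice to split at the median row key are all hypothetical, and the real system also has to assign tablets to servers and track their locations, which is omitted here.

```python
from bisect import bisect_right

# Upper end of the 100-200 MB range mentioned above; used here as the split trigger.
SPLIT_THRESHOLD_BYTES = 200 * 1024 * 1024


class Tablet:
    """A contiguous range of rows starting at `start_row` (inclusive)."""

    def __init__(self, start_row, rows=None):
        self.start_row = start_row
        self.rows = dict(rows or {})  # row key (str) -> payload (bytes)

    def size(self):
        return sum(len(key) + len(value) for key, value in self.rows.items())


class Table:
    def __init__(self, split_threshold=SPLIT_THRESHOLD_BYTES):
        self.split_threshold = split_threshold
        self.tablets = [Tablet("")]  # one tablet initially covers every row key

    def locate(self, row):
        """Index of the tablet whose row range contains `row`."""
        starts = [t.start_row for t in self.tablets]
        return bisect_right(starts, row) - 1

    def put(self, row, payload):
        i = self.locate(row)
        tablet = self.tablets[i]
        tablet.rows[row] = payload
        if tablet.size() > self.split_threshold and len(tablet.rows) > 1:
            self._split(i)

    def _split(self, i):
        """Split tablet i at its median row key into two adjacent tablets."""
        tablet = self.tablets[i]
        keys = sorted(tablet.rows)
        mid = keys[len(keys) // 2]
        upper = Tablet(mid, {k: tablet.rows[k] for k in keys if k >= mid})
        tablet.rows = {k: tablet.rows[k] for k in keys if k < mid}
        self.tablets.insert(i + 1, upper)


# A tiny threshold makes splits easy to observe in a demo.
demo = Table(split_threshold=100)
for n in range(20):
    demo.put(f"row{n:02d}", b"x" * 10)
```

The caller only ever uses `put` and never names a tablet, which is the point of the design: the table can grow from one tablet to many without any change on the application side.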
Columns are grouped into column families, which are the unit of access control and storage. Data in the same column family is stored together on disk. This allows applications to co-locate related data for efficient access patterns.
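A minimal sketch of column families, assuming column keys are written as `family:qualifier` as in the Bigtable paper. The `FamilyStore` class, its reader sets, and its method names are hypothetical; the sketch only shows that permissions are checked per family and that each family's data is kept in its own structure.

```python
from collections import defaultdict


class FamilyStore:
    """Toy store that groups data and read permissions by column family (hypothetical)."""

    def __init__(self, families):
        # families: {family name: iterable of principals allowed to read it}
        self._acl = {name: set(readers) for name, readers in families.items()}
        # One separate structure per family: family -> row -> qualifier -> value
        self._data = {name: defaultdict(dict) for name in families}

    def put(self, row, column, value):
        family, _, qualifier = column.partition(":")
        if family not in self._data:
            # Families must be declared up front; qualifiers can be created freely.
            raise KeyError(f"unknown column family: {family}")
        self._data[family][row][qualifier] = value

    def read_family(self, user, family):
        """Return all rows of one family, touching no other family's data."""
        if user not in self._acl[family]:
            raise PermissionError(f"{user} may not read family {family}")
        return {row: dict(cols) for row, cols in sorted(self._data[family].items())}


store = FamilyStore({"contents": {"crawler", "indexer"}, "anchor": {"indexer"}})
store.put("com.cnn.www", "contents:", "<html>...</html>")
store.put("com.cnn.www", "anchor:cnnsi.com", "CNN")
anchors = store.read_family("indexer", "anchor")
```

Reading the `anchor` family never touches the much larger `contents` data, which is the access-pattern benefit that co-locating a family's data is meant to provide.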
By 2006, Google needed to store and query structured data across dozens of services.
Each project faced similar challenges but had different data patterns.
Why not use a relational database?
Why not use raw files (GFS)?
Bigtable sits between these extremes: more structure than raw files, more flexibility than an RDBMS.