
Bigtable: A Distributed Storage System for Structured Data


OSDI '06 Paper

Pp. 205–218 of the Proceedings


Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach,
Mike Burrows, Tushar Chandra, Andrew Fikes, Robert E. Gruber

{fay,jeff,sanjay,wilsonh,kerr,m3b,tushar,fikes,gruber}@google.com

Google, Inc.

Abstract:

Bigtable is a distributed storage system for managing structured data that is designed to scale to a very large size: petabytes of data across thousands of commodity servers. Many projects at Google store data in Bigtable, including web indexing, Google Earth, and Google Finance. These applications place very different demands on Bigtable, both in terms of data size (from URLs to web pages to satellite imagery) and latency requirements (from backend bulk processing to real-time data serving). Despite these varied demands, Bigtable has successfully provided a flexible, high-performance solution for all of these Google products. In this paper we describe the simple data model provided by Bigtable, which gives clients dynamic control over data layout and format, and we describe the design and implementation of Bigtable.

1 Introduction

Over the last two and a half years we have designed, implemented, and deployed a distributed storage system for managing structured data at Google called Bigtable. Bigtable is designed to reliably scale to petabytes of data and thousands of machines. Bigtable has achieved several goals: wide applicability, scalability, high performance, and high availability. Bigtable is used by more than sixty Google products and projects, including Google Analytics, Google Finance, Orkut, Personalized Search, Writely, and Google Earth. These products use Bigtable for a variety of demanding workloads, which range from throughput-oriented batch-processing jobs to latency-sensitive serving of data to end users. The Bigtable clusters used by these products span a wide range of configurations, from a handful to thousands of servers, and store up to several hundred terabytes of data.

In many ways, Bigtable resembles a database: it shares many implementation strategies with databases. Parallel databases [14] and main-memory databases [13] have achieved scalability and high performance, but Bigtable provides a different interface than such systems. Bigtable does not support a full relational data model; instead, it provides clients with a simple data model that supports dynamic control over data layout and format, and allows clients to reason about the locality properties of the data represented in the underlying storage. Data is indexed using row and column names that can be arbitrary strings. Bigtable also treats data as uninterpreted strings, although clients often serialize various forms of structured and semi-structured data into these strings. Clients can control the locality of their data through careful choices in their schemas. Finally, Bigtable schema parameters let clients dynamically control whether to serve data out of memory or from disk.
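To make the data model described above more concrete, here is a minimal in-memory sketch in Python. It is not the Bigtable client API: the class `ToyTable`, its methods, and the example row and column names are all hypothetical. It illustrates only three points from the paragraph: rows and columns are named by arbitrary strings, values are stored as uninterpreted bytes that the client serializes itself, and the choice of row names can place related data close together. The sketch assumes rows are kept in sorted order by row name, which this excerpt does not state.

```python
import json
from bisect import bisect_left, insort


class ToyTable:
    """Hypothetical in-memory stand-in for illustration; not Bigtable's API."""

    def __init__(self):
        self._row_keys = []  # row names, kept sorted (an assumption of this sketch)
        self._rows = {}      # row name -> {column name: bytes}

    def put(self, row: str, column: str, value: bytes) -> None:
        # Row and column names are arbitrary strings; the value is opaque bytes.
        if row not in self._rows:
            insort(self._row_keys, row)
            self._rows[row] = {}
        self._rows[row][column] = value

    def get(self, row: str, column: str) -> bytes:
        return self._rows[row][column]

    def scan_prefix(self, prefix: str):
        # Rows sharing a prefix are adjacent in sorted order, so a scan
        # touches one contiguous range of row names.
        start = bisect_left(self._row_keys, prefix)
        for key in self._row_keys[start:]:
            if not key.startswith(prefix):
                break
            yield key, self._rows[key]


table = ToyTable()
table.put("com.example.www/index", "contents", b"<html>home</html>")
table.put("org.example/docs", "contents", b"<html>docs</html>")
# The client, not the table, decides how structured data is serialized.
table.put("com.example.mail/inbox", "meta", json.dumps({"lang": "en"}).encode("utf-8"))

example_rows = [row for row, _ in table.scan_prefix("com.example.")]
print(example_rows)
```

In this sketch, giving both `com.example` rows a shared prefix groups them in a single range scan, which is one way a schema choice can influence locality.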

Section 2 describes the data model in more detail, and Section 3 provides an overview of the client API. Section 4 briefly describes the underlying Google infrastructure on which Bigtable depends. Section 5 describes the fundamentals of the Bigtable implementation, and Section 6 describes some of the refinements that we made to improve Bigtable's performance. Section 7 provides measurements of Bigtable's performance. We describe several examples of how Bigtable is used at Google in Section 8, and discuss some lessons we learned in designing and supporting Bigtable in Section 9. Finally, Section 10 describes related work, and Section 11 presents our conclusions.

 

Source URL: http://static.usenix.org/event/osdi06/tech/chang/chang_html/

 
