OSDI '06 Paper
Pp. <!-- CHANGE -->205–218 of the Proceedings
<!-- START OF PAGE CONTENTS -->
Bigtable: A Distributed Storage System for Structured Data
Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach
Mike Burrows, Tushar Chandra, Andrew Fikes, Robert E. Gruber
{fay,jeff,sanjay,wilsonh,kerr,m3b,tushar,fikes,gruber}@google.com
Google, Inc.
Abstract:
Bigtable is a distributed storage system for managing structured data that is designed to scale to a very large size: petabytes of data across thousands of commodity servers. Many projects at Google store data in Bigtable, including web indexing, Google Earth, and Google Finance. These applications place very different demands on Bigtable, both in terms of data size (from URLs to web pages to satellite imagery) and latency requirements (from backend bulk processing to real-time data serving). Despite these varied demands, Bigtable has successfully provided a flexible, high-performance solution for all of these Google products. In this paper we describe the simple data model provided by Bigtable, which gives clients dynamic control over data layout and format, and we describe the design and implementation of Bigtable.
Over the last two and a half years we have designed, implemented, and deployed a distributed storage system for managing structured data at Google called Bigtable. Bigtable is designed to reliably scale to petabytes of data and thousands of machines. Bigtable has achieved several goals: wide applicability, scalability, high performance, and high availability. Bigtable is used by more than sixty Google products and projects, including Google Analytics, Google Finance, Orkut, Personalized Search, Writely, and Google Earth. These products use Bigtable for a variety of demanding workloads, which range from throughput-oriented batch-processing jobs to latency-sensitive serving of data to end users. The Bigtable clusters used by these products span a wide range of configurations, from a handful to thousands of servers, and store up to several hundred terabytes of data.
In many ways, Bigtable resembles a database: it shares many implementation strategies with databases. Parallel databases [14] and main-memory databases [13] have achieved scalability and high performance, but Bigtable provides a different interface than such systems. Bigtable does not support a full relational data model; instead, it provides clients with a simple data model that supports dynamic control over data layout and format, and allows clients to reason about the locality properties of the data represented in the underlying storage. Data is indexed using row and column names that can be arbitrary strings. Bigtable also treats data as uninterpreted strings, although clients often serialize various forms of structured and semi-structured data into these strings. Clients can control the locality of their data through careful choices in their schemas. Finally, Bigtable schema parameters let clients dynamically control whether to serve data out of memory or from disk.
Section 2 describes the data model in more detail, and Section 3 provides an overview of the client API. Section 4 briefly describes the underlying Google infrastructure on which Bigtable depends. Section 5 describes the fundamentals of the Bigtable implementation, and Section 6 describes some of the refinements that we made to improve Bigtable's performance. Section 7 provides measurements of Bigtable's performance. We describe several examples of how Bigtable is used at Google in Section 8, and discuss some lessons we learned in designing and supporting Bigtable in Section 9. Finally, Section 10 describes related work, and Section 11 presents our conclusions.
Source Url : http://static.usenix.org/event/osdi06/tech/chang/chang_html/
分享到:
相关推荐
Bigtable:A Distributed Storage System for Structured Data,谷歌的一个大型的分布式数据库,这个数据库不是关系式的数据库。像它的名字一样,就是一个巨大的表格,用来存储结构化的数据。
Bigtable: A Distributed Storage System for Structured Data中文翻译
BigTable A Distributed Storage System for Structured Data 原文+译文
Paper for Big Table developed by Google
开源如此繁荣,需要感谢Google的三篇论文:《The Google File System》、《MapReduce: Simplified Data Processing on Large Clusters》和《Bigtable: A Distributed Storage System for Structured Data》,Google...
Bigtable is a distributed storage system for managing structured data that is designed to scale to a very large size: petabytes of data across thousands of commodity servers. Many projects at Google ...
store modeled after Google' Bigtable: A Distributed Storage System for Structured Data by Chang et al.[2] Just as Bigtable leverages the distributed data storage provided by the Google File System, ...
[1]Bigtable: A Distributed Storage System for Structured Data [2]MapReduce: Simplified Data Processing on Large Clusters [3]The Google File System [4]Large-scale Incremental Processing Using ...
《The Google File System》 《MapReduce: Simplified Data Processing on Large Clusters》 《Bigtable: A Distributed Storage System for Structured Data》
key thesis for distributed system good for both new and experienced
Hadoop基础,google三大论文,hadoop学习必备 《The Google File System 》 2003年 《MapReduce: Simplified Data Processing on Large ...《Bigtable: A Distributed Storage System for Structured Data》 2006年
《Bigtable: A Distributed Storage System for Structured Data》 《Ranking and Semi-supervised Classification on Large Scale Graphs Using Map-Reduce》 《Mochi:Visual Log-Analysis Based Tools for ...
于2004年在OSDI上发表了《MapReduce: Simplified Data Processing on Large Clusters》,于2006年在OSDI上发表了《Bigtable: A Distributed Storage System for Structured Data》。这三篇论文后来成为云计算发展的...
谷歌3驾马车论文英文版,国外科学上网比较麻烦,为了方便大家下载,我意思性的收点分 [1] The Google File System; [2] MapReduce: Simplifed ...[3] Bigtable: A Distributed Storage System for Structured