
HBase Versus Bigtable (comparison)

 

The views below are all taken from the HBase guide (possibly with some of my own comments, set in a different font size).

 

Overall, HBase implements close to all of the features described in Chapter 1. Where it differs, it may have to because either the Bigtable paper was not very clear to begin with, or it relies on other open source projects to provide various services and those simply work differently.

 

 

HBase stores timestamps in milliseconds, as opposed to Bigtable, which uses microseconds. This is not much of an issue and can possibly be attributed to C and Java having different preferred timer resolutions.

(I saw someone discussing the use of 'us' (microseconds) instead of ms a few weeks ago; maybe this feature will be taken up.)
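To make the resolution difference concrete, here is a small Python sketch. The helper names and the sample timestamp are made up for illustration and are not part of either system. It only shows that a microsecond timestamp loses its sub-millisecond part once it is stored at millisecond resolution.

```python
def micros_to_millis(ts_micros: int) -> int:
    """Truncate a microsecond timestamp to millisecond resolution."""
    return ts_micros // 1000


def millis_to_micros(ts_millis: int) -> int:
    """Widen a millisecond timestamp to microseconds (no precision is regained)."""
    return ts_millis * 1000


# Hypothetical microsecond timestamp, as a Bigtable-style cell might carry.
bigtable_style_ts = 1_300_000_000_123_456

# The same instant at the millisecond resolution HBase uses.
hbase_style_ts = micros_to_millis(bigtable_style_ts)

# Converting back cannot restore the truncated microseconds.
round_trip = millis_to_micros(hbase_style_ts)
lost_micros = bigtable_style_ts - round_trip
```

Two cells written within the same millisecond would therefore get the same timestamp in this model, which is the practical consequence of the coarser resolution.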

 

While we have not yet addressed the specific details, it should be pointed out that both also use different compression algorithms. HBase uses those supplied in Java, but can also use LZO (with a bit of work; we will look into this later).* Bigtable has a two-phase compression using BMDiff and Zippy (the predecessor of Snappy).

 

HBase has coprocessors that are different from what Sawzall, the scripting language used in Bigtable to filter or aggregate data, or the Bigtable Coprocessor framework,† provides. The details on Google’s coprocessor implementation are rather sketchy, so if there are more differences, they are unknown. On the other hand, HBase has support for server-side filters that help reduce the amount of data being moved from the server to the client.
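The benefit of server-side filtering can be illustrated with a toy model. This is not the HBase filter API: the class `ToyRegionServer`, the `cf:status` column, and the `is_error` predicate are all invented for this sketch. It simply counts how many rows leave the "server" when the predicate is applied on the client versus on the server.

```python
from typing import Callable, Dict, List, Optional, Tuple

Row = Tuple[str, Dict[str, str]]
RowFilter = Callable[[str, Dict[str, str]], bool]


class ToyRegionServer:
    """Hypothetical stand-in for a server that holds sorted rows."""

    def __init__(self, rows: List[Row]) -> None:
        self.rows = sorted(rows, key=lambda r: r[0])
        self.rows_sent = 0  # rows shipped over the (imaginary) network

    def scan(self, row_filter: Optional[RowFilter] = None) -> List[Row]:
        # With a filter, only matching rows are returned to the caller.
        result = [
            (key, cols)
            for key, cols in self.rows
            if row_filter is None or row_filter(key, cols)
        ]
        self.rows_sent += len(result)
        return result


def is_error(key: str, columns: Dict[str, str]) -> bool:
    return columns.get("cf:status") == "error"


rows = [
    (f"row-{i:03d}", {"cf:status": "error" if i % 10 == 0 else "ok"})
    for i in range(100)
]

# Client-side filtering: every row is transferred, then most are discarded.
client_side = ToyRegionServer(rows)
client_matches = [r for r in client_side.scan() if is_error(*r)]

# Server-side filtering: only the matching rows are transferred.
server_side = ToyRegionServer(rows)
server_matches = server_side.scan(is_error)
```

Both approaches find the same rows, but in this model the unfiltered scan ships all 100 rows while the filtered scan ships only the matches.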

 

HBase does primarily work with the Hadoop Distributed File System (HDFS), while Bigtable uses GFS. But HBase can also work on other filesystems thanks to the pluggable FileSystem class provided by Hadoop. There are implementations for Amazon S3 (raw or emulated HDFS), as well as EBS.

 

HBase cannot map storage files into memory, something that is available in Bigtable. There is ongoing work in HBase to optimize I/O performance, and with the addition of more widespread use of Java’s New I/O (NIO), it may be something that could be enhanced.

 

* While writing this book, Google made Zippy available under the Apache license and the name Snappy. The work to integrate it with HBase is still in progress. See the project’s online repository for details.

† Jeff Dean gave a talk at LADIS ’09 (pages 66-67) mentioning coprocessors.

 

Bigtable has a concept called locality groups, which allow the client to group specific column families together and apply shared features, such as compression. This is also useful when the contained columns are accessed together, as all the data is stored in the same storage files. Column families in Bigtable are used for accounting and access control. In HBase, on the other hand, there is only the concept of column families, combining the features that Bigtable has in two distinct concepts.

 

Apart from the block cache that both systems have, Bigtable also implements a key/value cache, probably for cells that are accessed a lot.

 

The handling and implementation of the commit log also differs slightly. Bigtable has two commit logs to handle slow writes and is able to switch between them to compensate for that. This could be implemented in HBase, but it does not seem to be a topic for discussion, and therefore is omitted for the time being.

In contrast, HBase has an option to skip the commit log completely on writes for performance reasons and when the possibility of not being able to replay those logs after a server crash is acceptable.
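The trade-off of skipping the commit log can be sketched with a toy model. The class `ToyWalServer` and its `write_to_log` flag are hypothetical names for this illustration, not HBase's actual classes or client options. Writes that bypass the log exist only in memory, so they cannot be replayed after a crash.

```python
class ToyWalServer:
    """Hypothetical server with a durable commit log and a volatile in-memory store."""

    def __init__(self) -> None:
        self.commit_log = []  # survives a crash in this model
        self.memstore = {}    # lost on a crash

    def put(self, row: str, value: str, write_to_log: bool = True) -> None:
        if write_to_log:
            self.commit_log.append((row, value))
        self.memstore[row] = value

    def crash_and_recover(self) -> None:
        # The in-memory state is gone; rebuild it by replaying the log.
        self.memstore = {}
        for row, value in self.commit_log:
            self.memstore[row] = value


server = ToyWalServer()
server.put("row-a", "logged")
server.put("row-b", "not logged", write_to_log=False)  # faster, but not durable

before_crash = dict(server.memstore)
server.crash_and_recover()
after_crash = dict(server.memstore)
```

Before the crash both rows are visible; after recovery only the logged row remains in this model.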

 

 

The METADATA table in Bigtable is also used to store secondary information such as log events related to each tablet. This historical data can be used to analyze tablet transitions, splits, and/or merges. HBase had the notion of a historian in earlier versions that implemented the same concept, but its performance was not good enough and it has been removed.

 

While splitting regions/tablets is the same for both, merging is handled differently. HBase has a tool that helps you to merge regions manually, while in Bigtable this is handled automatically by the master. Merging in HBase is a delicate operation and currently is left to the operator to decide what is best.

 

Another very minor difference is that the master in Bigtable is doing the garbage collection of obsolete storage files. One reason for this could be the fact that, in Bigtable, the storage files are tracked in the METADATA table. For HBase, the cleanup is done by the region server that has done the split and no file location is recorded explicitly.

 

Bigtable can memory-map entire storage files and use them to perform lookups without a single disk seek. HBase has an in-memory option per column family and uses its LRU cache‡ to retain blocks for subsequent use.
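As a reminder of what LRU eviction means for a block cache, here is a minimal sketch. It is a plain least-recently-used cache with invented names (`ToyLruBlockCache`, block IDs `b1` to `b3`) and is not HBase's block cache implementation, which is more elaborate.

```python
from collections import OrderedDict
from typing import Optional


class ToyLruBlockCache:
    """Hypothetical LRU cache: the least recently used block is evicted first."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.blocks: "OrderedDict[str, bytes]" = OrderedDict()

    def get(self, block_id: str) -> Optional[bytes]:
        if block_id not in self.blocks:
            return None  # cache miss; a real system would read from disk
        self.blocks.move_to_end(block_id)  # mark as most recently used
        return self.blocks[block_id]

    def put(self, block_id: str, data: bytes) -> None:
        self.blocks[block_id] = data
        self.blocks.move_to_end(block_id)
        while len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # drop the least recently used block


cache = ToyLruBlockCache(capacity=2)
cache.put("b1", b"one")
cache.put("b2", b"two")
cache.get("b1")            # b1 is now more recently used than b2
cache.put("b3", b"three")  # exceeds capacity, so b2 is evicted

cached_ids = list(cache.blocks)
```

Blocks that keep being read stay in the cache, while blocks that are not touched again are the first to go.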

 

There are also some differences in the compaction algorithms. For example, a merging compaction also includes a memtable flush. Mostly, though, they are the same and simply use different names.
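The idea of a merging compaction that also takes in the memtable can be sketched as follows. This is a heavily simplified model under stated assumptions: storage files and the memtable are plain dictionaries, there are no timestamps, versions, or delete markers, and the function name `merging_compaction` is made up for the example.

```python
from typing import Dict, List


def merging_compaction(
    storage_files: List[Dict[str, str]], memtable: Dict[str, str]
) -> Dict[str, str]:
    """Merge storage files (oldest first) and the memtable into one sorted file.

    Newer sources overwrite older ones; the memtable is the newest source.
    """
    merged: Dict[str, str] = {}
    for storage_file in storage_files:
        merged.update(storage_file)
    merged.update(memtable)  # including the memtable acts as its flush
    return dict(sorted(merged.items()))


old_files = [
    {"a": "v1", "b": "v1"},  # oldest file
    {"b": "v2", "c": "v2"},  # newer file
]
memtable = {"c": "v3", "d": "v3"}

new_file = merging_compaction(old_files, memtable)
```

After the merge, a single file holds the newest value for every key, and the memtable contents no longer need a separate flush.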

‡ See Cache algorithms on Wikipedia.

 

 

Region names, as stored in the meta table in HBase, are a combination of the table name, the start row key, and an ID. In Bigtable, the corresponding tablet names consist of the table identifier and the end row. This has a few implications when it comes to locating data in the storage files (see “Read Path” on page 342).
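One implication of keying by start row versus end row is the direction of the lookup. The sketch below uses made-up boundary keys and assumes that a start key is inclusive and that an end key is inclusive as well; both are assumptions of this example rather than statements from the text. With start keys you look for the greatest key less than or equal to the row, and with end keys for the smallest key greater than or equal to the row.

```python
from bisect import bisect_left, bisect_right
from typing import List


def find_by_start_key(start_keys: List[str], row: str) -> int:
    """HBase-style: index of the last region whose start key is <= row."""
    return bisect_right(start_keys, row) - 1


def find_by_end_key(end_keys: List[str], row: str) -> int:
    """Bigtable-style: index of the first tablet whose end key is >= row.

    The last tablet is open-ended, so it has no entry in end_keys.
    """
    return bisect_left(end_keys, row)


# Three regions/tablets split at the hypothetical boundaries "g" and "p".
start_keys = ["", "g", "p"]  # the first region starts at the empty key
end_keys = ["g", "p"]        # the third tablet has no upper bound

row = "hello"
region_index = find_by_start_key(start_keys, row)
tablet_index = find_by_end_key(end_keys, row)
```

Both lookups land on the middle region/tablet for this row; they differ only in which boundary is stored and therefore in which direction the search runs.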

 

 

Finally, it can be noted that HBase has two separate catalog tables, -ROOT- and .META., while in Bigtable the root table, since in both systems it only ever consists of one single region/tablet, is stored as part of the meta table. The first tablet in the METADATA table is the root tablet, and all subsequent ones are the meta tablets. This is just an implementation detail. 
