what is
Every database has a storage layer, either in memory or on a file system. HBase is no different: its underlying data structure is the HFile, which is stored on a local file system or a distributed file system (DFS).
A well-designed data structure must also be paired with suitable algorithms. Of course, different requirements lead to very different database designs.
Like other index tools, HFile is an indexed storage structure built mainly for fast access. The idea of HFile comes from TFile, with a few differences and improvements, and its internal structure looks like the figures below:
*Note: this file layout (not the data block layout) is for HFile v1; in v2 the File Info part is placed after the Meta Index.
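To make the "indexed for fast access" idea concrete, here is a minimal sketch of a block index lookup. It is not HBase code: the in-memory `blocks` list, the `block_index` list and the `lookup` function are all hypothetical names made up for illustration, and the only assumption is that the index keeps the first key of each sorted data block.

```python
import bisect

# Hypothetical in-memory model of an HFile: each data block holds sorted
# (key, value) pairs, and the block index stores the first key of each block.
blocks = [
    [("row-01", "a"), ("row-03", "b")],
    [("row-05", "c"), ("row-07", "d")],
    [("row-09", "e"), ("row-11", "f")],
]
block_index = [block[0][0] for block in blocks]  # first key of every block


def lookup(key):
    """Return the value stored under key, or None if it is absent."""
    # Find the last block whose first key is <= key.
    pos = bisect.bisect_right(block_index, key) - 1
    if pos < 0:
        return None  # key sorts before the first block
    # Only this one block is scanned; the others are never touched.
    for k, v in blocks[pos]:
        if k == key:
            return v
    return None


print(lookup("row-07"))  # d
print(lookup("row-06"))  # None
```

The binary search over the index picks a single candidate block, so a read by row key costs one index search plus one block scan instead of a scan over the whole file.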
why
As described above, HBase uses its own file format instead of one that already exists in Hadoop (e.g. SequenceFile) in order to speed up reads by row key, as the index blocks in the previous figures suggest:
a. it builds a reversed index for queries, which speeds up reads (the main reason)
b. it provides a Hadoop-independent way to read and write the file format, without having to worry about DFS compatibility
file | structure | compression | part of |
SequenceFile | 3 file types: 1) uncompressed 2) record-compressed (only values are compressed) 3) block-compressed (both keys and values are compressed, similar to HFile v2) | yes | Hadoop |
HFile v1 | similar to v2, but some of the block indexes differ | yes | HBase |
HFile v2 | 1) reversed-index style 2) most data in the file (except the trailer) is compressed if a compressor is configured | yes | HBase, new file format (0.94+) |
how to
The flush follows this process:
1. iterate over all KeyValues in the snapshot and write them to a memory buffer
2. start a new data block once the buffer exceeds the 'block size', which is set when the table is created
3. repeat 1 and 2 until the snapshot has no more data
4. flush to an in-memory compressed stream
5. flush that stream to the output stream (the HFile stream) and clear the temporary buffer to avoid huge memory usage
6. in the same way as the data block flush, flush the meta block, data index, meta index, file info and finally the trailer
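The steps above can be sketched roughly as follows. This is only an illustration under my own assumptions, not how HBase implements it: `flush`, `BLOCK_SIZE`, the length-prefixed record layout and the use of `zlib` as the compressor are all made up for the example, and step 6 (meta block, indexes, file info, trailer) is left out.

```python
import io
import zlib

BLOCK_SIZE = 64  # hypothetical; HBase takes the block size from the table definition


def flush(snapshot, out, block_size=BLOCK_SIZE):
    """Write sorted (key, value) byte pairs to out; return the data index."""
    index = []            # (first_key, offset, compressed_length) per block
    buf = io.BytesIO()    # temporary in-memory buffer for the current block
    first_key = None

    def finish_block():
        nonlocal buf, first_key
        if first_key is None:
            return  # nothing buffered
        compressed = zlib.compress(buf.getvalue())   # step 4
        index.append((first_key, out.tell(), len(compressed)))
        out.write(compressed)                        # step 5
        buf = io.BytesIO()                           # clear the temp buffer
        first_key = None

    for key, value in snapshot:                      # steps 1 and 3
        if first_key is None:
            first_key = key
        buf.write(len(key).to_bytes(4, "big") + key)
        buf.write(len(value).to_bytes(4, "big") + value)
        if buf.tell() >= block_size:                 # step 2
            finish_block()
    finish_block()                                   # last, partly filled block
    return index


snapshot = [(b"row-%02d" % i, b"value-%02d" % i) for i in range(10)]
out = io.BytesIO()
index = flush(snapshot, out)
for first_key, offset, length in index:
    print(first_key, offset, length)
```

Only one block's worth of uncompressed data is held in the buffer at any time; what accumulates in memory is just the small index entry per block.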
Why are the index and the trailer placed at the end of an HFile?
I think some points are obvious:
a. since the index and trailer hold stats about the blocks and the index, flushing the memory buffer to the HFile block by block (a block is one part of the HFile) minimizes the memory overhead per region.
b. it avoids seeking far from the start of the HFile to locate the offset of a particular data block
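A small sketch of the "index and trailer last" layout, assuming a hypothetical fixed-size trailer that stores only the offset and length of the index. The names `TRAILER`, `write_file` and `read_block_offsets` are invented here and the real HFile trailer carries more fields; the point is only that block offsets are known after the blocks are streamed out, and that a reader can find everything by starting from the end of the file.

```python
import io
import struct

# Hypothetical fixed-size trailer: offset and length of the index section.
TRAILER = struct.Struct(">QQ")


def write_file(out, blocks):
    """Stream blocks out first, then the index, then the trailer."""
    offsets = []
    for block in blocks:
        offsets.append(out.tell())   # known only at the moment the block is written
        out.write(block)
    index_offset = out.tell()
    index = b"".join(struct.pack(">Q", off) for off in offsets)
    out.write(index)
    out.write(TRAILER.pack(index_offset, len(index)))


def read_block_offsets(f):
    """Locate the index by reading the trailer from the end of the file."""
    f.seek(-TRAILER.size, io.SEEK_END)
    index_offset, index_len = TRAILER.unpack(f.read(TRAILER.size))
    f.seek(index_offset)
    raw = f.read(index_len)
    return [off for (off,) in struct.iter_unpack(">Q", raw)]


f = io.BytesIO()
write_file(f, [b"block-one", b"block-two!!", b"block-three"])
print(read_block_offsets(f))  # [0, 9, 20]
```

Because the trailer has a fixed size, the reader needs one seek relative to the end of the file and one seek to the index, no matter how many data blocks come before them.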
others
Using the org.apache.hadoop.hbase.io.hfile.HFile tool, you can look at the details of an HFile, like this:
hbase hfile -s -f path-to-file
For a compressed file with an on-disk size of '18644309', the result looks like this:
Stats:
Key length: count: 496573 min: 48 max: 53 mean: 49.44025551127427
Val length: count: 496573 min: 1 max: 915 mean: 34.053575204451306
Row size (bytes): count: 28930 min: 814 max: 3190 mean: 1570.45855513308
Row size (columns): count: 28930 min: 12 max: 21 mean: 17.16463878326996
Key of biggest row: 94678d0589778ade561378ac26dfd791
Key length count: the number of keys (composite keys, i.e. rowkey + family + column + timestamp + type)
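As a rough check on the key lengths reported, the size of a composite key can be computed from its parts. The sketch below follows the KeyValue key layout (2-byte row length, row, 1-byte family length, family, qualifier, 8-byte timestamp, 1-byte type); the function name `key_length` and the family `f` and qualifier `abc` are hypothetical, picked only so the numbers are easy to follow.

```python
def key_length(row: bytes, family: bytes, qualifier: bytes) -> int:
    """Length in bytes of the key part of an HBase KeyValue."""
    return (
        2 + len(row)        # 2-byte row length, then the row key
        + 1 + len(family)   # 1-byte family length, then the family
        + len(qualifier)    # qualifier has no length prefix inside the key
        + 8                 # timestamp
        + 1                 # key type (Put, Delete, ...)
    )


row = b"94678d0589778ade561378ac26dfd791"   # 32-byte row key from the output above
print(key_length(row, b"f", b"abc"))  # 48
```

With a 32-byte row key, the fixed parts already add up to 44 bytes, so the reported range of 48 to 53 would leave 4 to 9 bytes for family plus qualifier, assuming every row key in the file has the same length.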