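HBase's hbase.hregion.max.filesize property sets the threshold for region splitting. Its default value is 268435456 bytes (256 MB): when a store file of any column family grows beyond this value, the region is split into two regions.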
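An HBase row can hold a very large number of columns, so a table can be designed in two ways: as a wide table (many columns per row) or as a narrow table (few columns per row, many rows).
Take a table that stores users' emails as an example.
As a wide table, all of a user's emails are stored in one row:
userid1 email1 email2 email3 ... ... ... ... ... emailn
userid2 email1 email2 email3 ... ... ... ... ... emailn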
useridn
As a narrow table, the row key is composed of the user ID and the email ID:
userid1_emailid1 email1
userid1_emailid2 email2
userid1_emailid3 email3
userid1_emailidn emailn
userid2_emailid1 email1
userid2_emailid2 email2
userid2_emailid3 email3
userid2_emailidn emailn
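The narrow layout can be sketched in a few lines of Java. This is only an illustration: the class NarrowRowKeys and its methods are hypothetical helpers, not part of HBase, and the "_" separator is simply the one used in the listing above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not an HBase class): builds narrow-table row keys.
final class NarrowRowKeys {
    // Joins the user ID and the email ID with '_', as in the layout above.
    static String compose(String userId, String emailId) {
        return userId + "_" + emailId;
    }

    // Expands one wide row (a user plus all of that user's email IDs)
    // into one narrow row key per email.
    static List<String> fromWideRow(String userId, List<String> emailIds) {
        List<String> keys = new ArrayList<>();
        for (String emailId : emailIds) {
            keys.add(compose(userId, emailId));
        }
        return keys;
    }

    public static void main(String[] args) {
        List<String> keys = fromWideRow("userid1",
                Arrays.asList("emailid1", "emailid2", "emailid3"));
        // prints [userid1_emailid1, userid1_emailid2, userid1_emailid3]
        System.out.println(keys);
    }
}
```

Because every email gets its own row key, a narrow table gives the store many possible boundaries between rows, whereas the wide table has only one boundary per user.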
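These two designs affect region splitting differently. While reading the HFileOutputFormat code today, I noticed that the RecordWriter it creates places a restriction on splitting:
a split is made only when the row key changes. As long as the row key stays the same, no split happens, even if the size already exceeds hbase.hregion.max.filesize.
The RecordWriter code: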
- public void write(ImmutableBytesWritable row, KeyValue kv)
- throws IOException {
- long length = kv.getLength();
- byte [] family = kv.getFamily();
- WriterLength wl = this.writers.get(family);
- if (wl == null || ((length + wl.written) >= maxsize) &&
- Bytes.compareTo(this.previousRow, 0, this.previousRow.length,
- kv.getBuffer(), kv.getRowOffset(), kv.getRowLength()) != 0) {
- // Get a new writer.
- Path basedir = new Path(outputdir, Bytes.toString(family));
- if (wl == null) {
- wl = new WriterLength();
- this.writers.put(family, wl);
- if (this.writers.size() > 1) throw new IOException("One family only");
- // If wl == null, first file in family. Ensure family dir exits.
- if (!fs.exists(basedir)) fs.mkdirs(basedir);
- }
- wl.writer = getNewWriter(wl.writer, basedir);
- LOG.info("Writer=" + wl.writer.getPath() +
- ((wl.written == 0)? "": ", wrote=" + wl.written));
- wl.written = 0;
- }
- kv.updateLatestStamp(this.now);
- wl.writer.append(kv);
- wl.written += length;
- // Copy the row so we know when a row transition.
- this.previousRow = kv.getRow();
- }
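The if condition shows that a split happens only when the written size reaches hbase.hregion.max.filesize and the current row differs from the previously written row. Two consequences follow:
1. In a wide table, a single row larger than hbase.hregion.max.filesize is not split.
2. Many versions written under the same row key are not split either, even when their total size exceeds hbase.hregion.max.filesize.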
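The rule can be tried out without a cluster. The following is a simplified model of the condition in write(), not the HBase class itself: RollSimulation and countFiles are hypothetical names, row keys are plain strings, and every cell is assumed to have the same length.

```java
// Simplified, hypothetical model of the roll condition in write().
final class RollSimulation {
    // Returns how many files are opened for a sequence of writes,
    // where each write adds one cell of cellLength bytes to the given row.
    static int countFiles(String[] rows, long cellLength, long maxSize) {
        int files = 0;             // number of writers opened so far
        long written = 0;          // bytes written to the current file
        String previousRow = null; // row key of the previous write
        for (String row : rows) {
            boolean noWriterYet = (files == 0);
            boolean tooBig = (cellLength + written) >= maxSize;
            boolean rowChanged = !row.equals(previousRow);
            // Same shape as the original: wl == null || (size check && row changed)
            if (noWriterYet || (tooBig && rowChanged)) {
                files++;
                written = 0;
            }
            written += cellLength;
            previousRow = row;
        }
        return files;
    }

    public static void main(String[] args) {
        String[] sameRow = new String[10];
        String[] distinctRows = new String[10];
        for (int i = 0; i < 10; i++) {
            sameRow[i] = "row1";
            distinctRows[i] = "row" + i;
        }
        // Same row key: the size limit is exceeded, but the file never rolls.
        System.out.println(countFiles(sameRow, 30, 10));      // prints 1
        // Distinct row keys: every write after the first starts a new file.
        System.out.println(countFiles(distinctRows, 30, 10)); // prints 10
    }
}
```

The cell length of 30 and the limit of 10 are arbitrary values chosen so that every write is over the limit; only the row-key check then decides whether a new file is started.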
Let's verify this.
To see the effect quickly, change two settings in hbase-site.xml:
- <property>
- <name>hbase.hregion.memstore.flush.size</name>
- <value>5</value>
- <description>
- Memstore will be flushed to disk if size of the memstore
- exceeds this number of bytes. Value is checked by a thread that runs
- every hbase.server.thread.wakefrequency.
- </description>
- </property>
- <property>
- <name>hbase.hregion.max.filesize</name>
- <value>10</value>
- <description>
- Maximum HStoreFile size. If any one of a column families' HStoreFiles has
- grown to exceed this value, the hosting HRegion is split in two.
- Default: 256M.
- </description>
- </property>
Create the test tables t1 and t2:
- hbase(main):076:0* create 't1','f1'
- 0 row(s) in 1.6460 seconds
- hbase(main):077:0> create 't2','f1'
- 0 row(s) in 1.1790 seconds
Check the system table .META.:
- hbase(main):081:0* scan '.META.'
- ROW COLUMN+CELL
- t1,,1314720667274.d8acd6bc659ac8326b88850d645a90ad column=info:regioninfo,timestamp=1314720667384, value=REGION => {NAME =>'t1,,1314720667274.d8acd6bc659ac8326b88850d645a90ad.', STARTKEY => '', ENDK
- . EY => '', ENCODED =>d8acd6bc659ac8326b88850d645a90ad, TABLE => {{NAME => 't1', FAMILIES => [{NAME =>'f1', BLOOMFILTER => 'NONE', REPLICATION_SCOPE
- => '0', COMPRESSION => 'NONE',VERSIONS => '3', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false',BLOCKCACHE => 'true'}]}}
- t1,,1314720667274.d8acd6bc659ac8326b88850d645a90ad column=info:server,timestamp=1314720667941,value=yinjie:60020
- .
- t1,,1314720667274.d8acd6bc659ac8326b88850d645a90ad column=info:serverstartcode,timestamp=1314720667941,value=1314716290123
- .
- t2,,1314720672168.16bb3d2563eab3b4e25477c64e007e71 column=info:regioninfo,timestamp=1314720672241, value=REGION => {NAME =>'t2,,1314720672168.16bb3d2563eab3b4e25477c64e007e71.', STARTKEY => '', ENDK
- . EY => '', ENCODED =>16bb3d2563eab3b4e25477c64e007e71, TABLE => {{NAME => 't2', FAMILIES => [{NAME =>'f1', BLOOMFILTER => 'NONE', REPLICATION_SCOPE
- => '0', COMPRESSION => 'NONE',VERSIONS => '3', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false',BLOCKCACHE => 'true'}]}}
- t2,,1314720672168.16bb3d2563eab3b4e25477c64e007e71 column=info:server,timestamp=1314720672346,value=yinjie:60020
- .
- t2,,1314720672168.16bb3d2563eab3b4e25477c64e007e71 column=info:serverstartcode,timestamp=1314720672346,value=1314716290123
- .
- 2 row(s) in 0.0230 seconds
At this point t1 and t2 each have exactly one region.
First, put 10 records into t1, all under the same row key:
- hbase(main):086:0* for i in 0..9 do\
- hbase(main):087:1* put 't1','row1',"f1:c#{i}","swallow#{i}"\
- hbase(main):088:1* end
- 0 row(s) in 0.0180 seconds
- 0 row(s) in 0.0070 seconds
- 0 row(s) in 0.0420 seconds
- 0 row(s) in 0.0620 seconds
- 0 row(s) in 0.0120 seconds
- 0 row(s) in 0.0770 seconds
- 0 row(s) in 0.0150 seconds
- 0 row(s) in 0.1290 seconds
- 0 row(s) in 10.0740 seconds
- 0 row(s) in 0.1230 seconds
- => 0..9
- hbase(main):089:0>
Scan t1 to check the records:
- hbase(main):089:0> scan 't1'
- ROW COLUMN+CELL
- row1 column=f1:c0,timestamp=1314720946495,value=swallow0
- row1 column=f1:c1,timestamp=1314720946507,value=swallow1
- row1 column=f1:c2,timestamp=1314720946903,value=swallow2
- row1 column=f1:c3,timestamp=1314720946939,value=swallow3
- row1 column=f1:c4,timestamp=1314720946976,value=swallow4
- row1 column=f1:c5,timestamp=1314720947055,value=swallow5
- row1 column=f1:c6,timestamp=1314720947070,value=swallow6
- row1 column=f1:c7,timestamp=1314720947198,value=swallow7
- row1 column=f1:c8,timestamp=1314720957272,value=swallow8
- row1 column=f1:c9,timestamp=1314720957392,value=swallow9
- 1 row(s) in 0.0300 seconds
Check .META. again:
- hbase(main):090:0> scan '.META.'
- ROW COLUMN+CELL
- t1,,1314720667274.d8acd6bc659ac8326b88850d645a90ad column=info:regioninfo,timestamp=1314720667384, value=REGION => {NAME =>'t1,,1314720667274.d8acd6bc659ac8326b88850d645a90ad.', STARTKEY => '', ENDK
- . EY => '', ENCODED =>d8acd6bc659ac8326b88850d645a90ad, TABLE => {{NAME => 't1', FAMILIES => [{NAME =>'f1', BLOOMFILTER => 'NONE', REPLICATION_SCOPE
- => '0', COMPRESSION => 'NONE',VERSIONS => '3', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false',BLOCKCACHE => 'true'}]}}
- t1,,1314720667274.d8acd6bc659ac8326b88850d645a90ad column=info:server,timestamp=1314720667941,value=yinjie:60020
- .
- t1,,1314720667274.d8acd6bc659ac8326b88850d645a90ad column=info:serverstartcode,timestamp=1314720667941,value=1314716290123
- .
- t2,,1314720672168.16bb3d2563eab3b4e25477c64e007e71 column=info:regioninfo,timestamp=1314720672241, value=REGION => {NAME =>'t2,,1314720672168.16bb3d2563eab3b4e25477c64e007e71.', STARTKEY => '', ENDK
- . EY => '', ENCODED =>16bb3d2563eab3b4e25477c64e007e71, TABLE => {{NAME => 't2', FAMILIES => [{NAME =>'f1', BLOOMFILTER => 'NONE', REPLICATION_SCOPE
- => '0', COMPRESSION => 'NONE',VERSIONS => '3', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false',BLOCKCACHE => 'true'}]}}
- t2,,1314720672168.16bb3d2563eab3b4e25477c64e007e71 column=info:server,timestamp=1314720672346,value=yinjie:60020
- .
- t2,,1314720672168.16bb3d2563eab3b4e25477c64e007e71 column=info:serverstartcode,timestamp=1314720672346,value=1314716290123
- .
- 2 row(s) in 0.0210 seconds
t1 still has only one region.
Next, put the same 10 values into t2, but each under a different row key:
- hbase(main):091:0> for i in 0..9 do\
- hbase(main):092:1* put 't2',"row#{i}","f1:c#{i}","swallow#{i}"\
- hbase(main):093:1* end
- 0 row(s) in 0.1140 seconds
- 0 row(s) in 0.0080 seconds
- 0 row(s) in 0.0410 seconds
- 0 row(s) in 0.0820 seconds
- 0 row(s) in 0.0210 seconds
- 0 row(s) in 0.0410 seconds
- 0 row(s) in 0.0200 seconds
- 0 row(s) in 0.1210 seconds
- 0 row(s) in 0.0140 seconds
- 0 row(s) in 0.0360 seconds
- => 0..9
Scan t2 to check the records:
- hbase(main):097:0* scan 't2'
- ROW COLUMN+CELL
- row0 column=f1:c0,timestamp=1314721110769,value=swallow0
- row1 column=f1:c1,timestamp=1314721110787,value=swallow1
- row2 column=f1:c2,timestamp=1314721110830,value=swallow2
- row3 column=f1:c3,timestamp=1314721110916,value=swallow3
- row4 column=f1:c4,timestamp=1314721110932,value=swallow4
- row5 column=f1:c5,timestamp=1314721110971,value=swallow5
- row6 column=f1:c6,timestamp=1314721110989,value=swallow6
- row7 column=f1:c7,timestamp=1314721111121,value=swallow7
- row8 column=f1:c8,timestamp=1314721111130,value=swallow8
- row9 column=f1:c9,timestamp=1314721111172,value=swallow9
- 10 row(s) in 1.0450 seconds
Check .META. again:
- hbase(main):102:0> scan '.META.'
- ROW COLUMN+CELL
- t1,,1314720667274.d8acd6bc659ac8326b88850d645a90ad column=info:regioninfo,timestamp=1314720667384, value=REGION => {NAME =>'t1,,1314720667274.d8acd6bc659ac8326b88850d645a90ad.', STARTKEY => '', ENDK
- . EY => '', ENCODED =>d8acd6bc659ac8326b88850d645a90ad, TABLE => {{NAME => 't1', FAMILIES => [{NAME =>'f1', BLOOMFILTER => 'NONE', REPLICATION_SCOPE
- => '0', COMPRESSION => 'NONE',VERSIONS => '3', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false',BLOCKCACHE => 'true'}]}}
- t1,,1314720667274.d8acd6bc659ac8326b88850d645a90ad column=info:server,timestamp=1314720667941,value=yinjie:60020
- .
- t1,,1314720667274.d8acd6bc659ac8326b88850d645a90ad column=info:serverstartcode,timestamp=1314720667941,value=1314716290123
- .
- t2,,1314720672168.16bb3d2563eab3b4e25477c64e007e71 column=info:regioninfo,timestamp=1314721112130, value=REGION => {NAME =>'t2,,1314720672168.16bb3d2563eab3b4e25477c64e007e71.', STARTKEY => '', ENDK
- . EY => '', ENCODED =>16bb3d2563eab3b4e25477c64e007e71, OFFLINE => true, SPLIT => true, TABLE => {{NAME=> 't2', FAMILIES => [{NAME => 'f1', BLOOMFILT
- ER => 'NONE', REPLICATION_SCOPE=> '0', VERSIONS => '3', COMPRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE =>'65536', IN_MEMORY => 'false', BLOC
- KCACHE =>'true'}]}}
- t2,,1314720672168.16bb3d2563eab3b4e25477c64e007e71 column=info:server,timestamp=1314720672346,value=yinjie:60020
- .
- t2,,1314720672168.16bb3d2563eab3b4e25477c64e007e71 column=info:serverstartcode,timestamp=1314720672346,value=1314716290123
- .
- t2,,1314720672168.16bb3d2563eab3b4e25477c64e007e71 column=info:splitA,timestamp=1314721112130, value=REGION => {NAME =>'t2,,1314721111490.71df02214242923574b71fe5e2a19360.', STARTKEY => '', ENDKEY =
- . > 'row0', ENCODED =>71df02214242923574b71fe5e2a19360, TABLE => {{NAME => 't2', FAMILIES => [{NAME =>'f1', BLOOMFILTER => 'NONE', REPLICATION_SCOPE
- => '0', VERSIONS => '3',COMPRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY =>'false', BLOCKCACHE => 'true'}]}}
- t2,,1314720672168.16bb3d2563eab3b4e25477c64e007e71 column=info:splitB,timestamp=1314721112130, value=REGION => {NAME =>'t2,row0,1314721111490.915ee8d4a32c59a4ec3960e335b061ca.', STARTKEY => 'row0',
- . ENDKEY => '', ENCODED =>915ee8d4a32c59a4ec3960e335b061ca, TABLE => {{NAME => 't2', FAMILIES => [{NAME =>'f1', BLOOMFILTER => 'NONE', REPLICATION_SC
- OPE => '0', VERSIONS => '3',COMPRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY =>'false', BLOCKCACHE => 'true'}]}}
- t2,,1314721111490.71df02214242923574b71fe5e2a19360 column=info:regioninfo,timestamp=1314721112267, value=REGION => {NAME =>'t2,,1314721111490.71df02214242923574b71fe5e2a19360.', STARTKEY => '', ENDK
- . EY => 'row0', ENCODED =>71df02214242923574b71fe5e2a19360, TABLE => {{NAME => 't2', FAMILIES => [{NAME =>'f1', BLOOMFILTER => 'NONE', REPLICATION_SC
- OPE => '0', VERSIONS => '3',COMPRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY =>'false', BLOCKCACHE => 'true'}]}}
- t2,,1314721111490.71df02214242923574b71fe5e2a19360 column=info:server,timestamp=1314721112267,value=yinjie:60020
- .
- t2,,1314721111490.71df02214242923574b71fe5e2a19360 column=info:serverstartcode,timestamp=1314721112267,value=1314716290123
- .
- t2,row0,1314721111490.915ee8d4a32c59a4ec3960e335b0 column=info:regioninfo,timestamp=1314721112627, value=REGION => {NAME =>'t2,row0,1314721111490.915ee8d4a32c59a4ec3960e335b061ca.', STARTKEY => 'row
- 61ca. 0', ENDKEY => '', ENCODED =>915ee8d4a32c59a4ec3960e335b061ca, TABLE => {{NAME => 't2', FAMILIES => [{NAME =>'f1', BLOOMFILTER => 'NONE', REPLICATIO
- N_SCOPE => '0', VERSIONS =>'3', COMPRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY=> 'false', BLOCKCACHE => 'true'}]}}
- t2,row0,1314721111490.915ee8d4a32c59a4ec3960e335b0 column=info:server,timestamp=1314721112627,value=yinjie:60020
- 61ca.
- t2,row0,1314721111490.915ee8d4a32c59a4ec3960e335b0 column=info:serverstartcode,timestamp=1314721112627,value=1314716290123
- 61ca.
- 4 row(s) in 0.0380 seconds
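t2's region has been split: the parent region is now marked OFFLINE and SPLIT, and its two daughter regions (splitA and splitB) are listed with the key ranges '' to 'row0' and 'row0' to ''.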
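To read the daughter key ranges in the scan output, recall that a region covers the keys from STARTKEY (inclusive) to ENDKEY (exclusive), and that an empty key means the range is unbounded on that side. The sketch below applies that rule to the two daughters of t2. RegionLookup is a hypothetical helper, and String.compareTo stands in for HBase's byte-wise comparison, which is an assumption that holds for the plain ASCII keys used here.

```java
// Hypothetical helper (not an HBase class): checks a row key against a region's key range.
final class RegionLookup {
    // True when rowKey lies in [startKey, endKey); an empty key means "unbounded".
    static boolean contains(String startKey, String endKey, String rowKey) {
        boolean afterStart = startKey.isEmpty() || rowKey.compareTo(startKey) >= 0;
        boolean beforeEnd = endKey.isEmpty() || rowKey.compareTo(endKey) < 0;
        return afterStart && beforeEnd;
    }

    public static void main(String[] args) {
        for (int i = 0; i < 10; i++) {
            String rowKey = "row" + i;
            // Daughter A: STARTKEY '' to ENDKEY 'row0'; daughter B: STARTKEY 'row0' to ENDKEY ''.
            String daughter = contains("", "row0", rowKey) ? "splitA" : "splitB";
            System.out.println(rowKey + " -> " + daughter);
        }
    }
}
```

Under this rule, row0 itself belongs to the daughter whose STARTKEY is 'row0', since the start key is inclusive.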