1. First, a look at the contents of a file stored by HBase
Run the following commands to add test data:
create 'table3', 'colfam1', { SPLITS => ['row-300', 'row-500', 'row-700' , 'row-900'] }
for i in '0'..'9' do for j in '0'..'9' do for k in '0'..'9' do put 'table3', "row-#{i}#{j}#{k}", "colfam1:#{j}#{k}", "#{j}#{k}" end end end
Flush the data from the MemStore to disk:
flush 'table3'
Then run the insert loop once more:
for i in '0'..'9' do for j in '0'..'9' do for k in '0'..'9' do put 'table3', "row-#{i}#{j}#{k}", "colfam1:#{j}#{k}", "#{j}#{k}" end end end
Next, run the HFile tool from the HBase bin directory:
[hadoop@hadoop bin]$ ./hbase org.apache.hadoop.hbase.io.hfile.HFile -f /hbase/data/default/table3/1fa2e49c7404d3cd39afc39a99cc1c26/colfam1/0f6fc234c3014b6e9d84d3cae065d1b4 -v -m -p
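Here, 1fa2e49c7404d3cd39afc39a99cc1c26 is the (encoded) region name and 0f6fc234c3014b6e9d84d3cae065d1b4 is the name of one HFile; the path follows the layout /hbase/data/<namespace>/<table>/<region>/<column family>/<HFile>. The -f option names the HFile to read, -v turns on verbose output, -m prints the file's metadata and -p prints its key/value pairs.
Output: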
Scanning -> /hbase/data/default/table3/1fa2e49c7404d3cd39afc39a99cc1c26/colfam1/0f6fc234c3014b6e9d84d3cae065d1b4
2015-04-09 22:53:01,918 INFO [main] hfile.CacheConfig: CacheConfig:disabled
/// Note: K: and V: mark the key/value pairs stored in the HFile. As the output below shows, each K takes up a fair number of bytes, because it is made up of the rowKey, the column (family:columnName)...
/// The actual data stored as serialized KeyValue instances
K: row-500/colfam1:00/1428632364152/Put/vlen=2/seqid=5 V: 00
K: row-501/colfam1:01/1428632364177/Put/vlen=2/seqid=7 V: 01
K: row-502/colfam1:02/1428632364204/Put/vlen=2/seqid=9 V: 02
K: row-503/colfam1:03/1428632364287/Put/vlen=2/seqid=11 V: 03
K: row-504/colfam1:04/1428632364309/Put/vlen=2/seqid=13 V: 04
K: row-505/colfam1:05/1428632364318/Put/vlen=2/seqid=15 V: 05
K: row-506/colfam1:06/1428632364330/Put/vlen=2/seqid=17 V: 06
K: row-507/colfam1:07/1428632364351/Put/vlen=2/seqid=19 V: 07
K: row-508/colfam1:08/1428632364361/Put/vlen=2/seqid=21 V: 08
K: row-509/colfam1:09/1428632364381/Put/vlen=2/seqid=23 V: 09
K: row-510/colfam1:10/1428632364400/Put/vlen=2/seqid=25 V: 10
K: row-511/colfam1:11/1428632364411/Put/vlen=2/seqid=27 V: 11
K: row-512/colfam1:12/1428632364426/Put/vlen=2/seqid=29 V: 12
K: row-513/colfam1:13/1428632364440/Put/vlen=2/seqid=31 V: 13
K: row-514/colfam1:14/1428632364474/Put/vlen=2/seqid=33 V: 14
K: row-515/colfam1:15/1428632364496/Put/vlen=2/seqid=35 V: 15
K: row-516/colfam1:16/1428632364521/Put/vlen=2/seqid=37 V: 16
K: row-517/colfam1:17/1428632364528/Put/vlen=2/seqid=39 V: 17
K: row-518/colfam1:18/1428632364539/Put/vlen=2/seqid=41 V: 18
K: row-519/colfam1:19/1428632364551/Put/vlen=2/seqid=43 V: 19
K: row-520/colfam1:20/1428632364561/Put/vlen=2/seqid=45 V: 20
K: row-521/colfam1:21/1428632364574/Put/vlen=2/seqid=47 V: 21
K: row-522/colfam1:22/1428632364589/Put/vlen=2/seqid=49 V: 22
K: row-523/colfam1:23/1428632364602/Put/vlen=2/seqid=51 V: 23
K: row-524/colfam1:24/1428632364617/Put/vlen=2/seqid=53 V: 24
K: row-525/colfam1:25/1428632364634/Put/vlen=2/seqid=55 V: 25
K: row-526/colfam1:26/1428632364647/Put/vlen=2/seqid=57 V: 26
K: row-527/colfam1:27/1428632364653/Put/vlen=2/seqid=59 V: 27
K: row-528/colfam1:28/1428632364665/Put/vlen=2/seqid=61 V: 28
K: row-529/colfam1:29/1428632364734/Put/vlen=2/seqid=63 V: 29
K: row-530/colfam1:30/1428632364746/Put/vlen=2/seqid=65 V: 30
K: row-531/colfam1:31/1428632364760/Put/vlen=2/seqid=67 V: 31
K: row-532/colfam1:32/1428632364777/Put/vlen=2/seqid=69 V: 32
K: row-533/colfam1:33/1428632364819/Put/vlen=2/seqid=71 V: 33
K: row-534/colfam1:34/1428632364831/Put/vlen=2/seqid=73 V: 34
K: row-535/colfam1:35/1428632364837/Put/vlen=2/seqid=75 V: 35
K: row-536/colfam1:36/1428632364846/Put/vlen=2/seqid=77 V: 36
K: row-537/colfam1:37/1428632364852/Put/vlen=2/seqid=79 V: 37
K: row-538/colfam1:38/1428632364861/Put/vlen=2/seqid=81 V: 38
K: row-539/colfam1:39/1428632364872/Put/vlen=2/seqid=83 V: 39
K: row-540/colfam1:40/1428632364880/Put/vlen=2/seqid=85 V: 40
K: row-541/colfam1:41/1428632364886/Put/vlen=2/seqid=87 V: 41
K: row-542/colfam1:42/1428632364897/Put/vlen=2/seqid=89 V: 42
K: row-543/colfam1:43/1428632364909/Put/vlen=2/seqid=91 V: 43
K: row-544/colfam1:44/1428632364924/Put/vlen=2/seqid=93 V: 44
K: row-545/colfam1:45/1428632364937/Put/vlen=2/seqid=95 V: 45
K: row-546/colfam1:46/1428632364946/Put/vlen=2/seqid=97 V: 46
K: row-547/colfam1:47/1428632364955/Put/vlen=2/seqid=99 V: 47
K: row-548/colfam1:48/1428632364964/Put/vlen=2/seqid=101 V: 48
K: row-549/colfam1:49/1428632364976/Put/vlen=2/seqid=103 V: 49
K: row-550/colfam1:50/1428632364982/Put/vlen=2/seqid=105 V: 50
K: row-551/colfam1:51/1428632364992/Put/vlen=2/seqid=107 V: 51
K: row-552/colfam1:52/1428632365001/Put/vlen=2/seqid=109 V: 52
K: row-553/colfam1:53/1428632365011/Put/vlen=2/seqid=111 V: 53
K: row-554/colfam1:54/1428632365020/Put/vlen=2/seqid=113 V: 54
K: row-555/colfam1:55/1428632365035/Put/vlen=2/seqid=115 V: 55
K: row-556/colfam1:56/1428632365048/Put/vlen=2/seqid=117 V: 56
K: row-557/colfam1:57/1428632365056/Put/vlen=2/seqid=119 V: 57
K: row-558/colfam1:58/1428632365064/Put/vlen=2/seqid=121 V: 58
K: row-559/colfam1:59/1428632365080/Put/vlen=2/seqid=123 V: 59
K: row-560/colfam1:60/1428632365095/Put/vlen=2/seqid=125 V: 60
K: row-561/colfam1:61/1428632365111/Put/vlen=2/seqid=127 V: 61
K: row-562/colfam1:62/1428632365123/Put/vlen=2/seqid=129 V: 62
K: row-563/colfam1:63/1428632365133/Put/vlen=2/seqid=131 V: 63
K: row-564/colfam1:64/1428632365142/Put/vlen=2/seqid=133 V: 64
K: row-565/colfam1:65/1428632365151/Put/vlen=2/seqid=135 V: 65
K: row-566/colfam1:66/1428632365159/Put/vlen=2/seqid=137 V: 66
K: row-567/colfam1:67/1428632365169/Put/vlen=2/seqid=139 V: 67
K: row-568/colfam1:68/1428632365179/Put/vlen=2/seqid=141 V: 68
K: row-569/colfam1:69/1428632365192/Put/vlen=2/seqid=143 V: 69
K: row-570/colfam1:70/1428632365200/Put/vlen=2/seqid=145 V: 70
K: row-571/colfam1:71/1428632365209/Put/vlen=2/seqid=147 V: 71
K: row-572/colfam1:72/1428632365217/Put/vlen=2/seqid=149 V: 72
K: row-573/colfam1:73/1428632365226/Put/vlen=2/seqid=151 V: 73
K: row-574/colfam1:74/1428632365237/Put/vlen=2/seqid=153 V: 74
K: row-575/colfam1:75/1428632365245/Put/vlen=2/seqid=155 V: 75
K: row-576/colfam1:76/1428632365253/Put/vlen=2/seqid=157 V: 76
K: row-577/colfam1:77/1428632365265/Put/vlen=2/seqid=159 V: 77
K: row-578/colfam1:78/1428632365279/Put/vlen=2/seqid=161 V: 78
K: row-579/colfam1:79/1428632365287/Put/vlen=2/seqid=163 V: 79
K: row-580/colfam1:80/1428632365294/Put/vlen=2/seqid=165 V: 80
K: row-581/colfam1:81/1428632365305/Put/vlen=2/seqid=167 V: 81
K: row-582/colfam1:82/1428632365314/Put/vlen=2/seqid=169 V: 82
K: row-583/colfam1:83/1428632365321/Put/vlen=2/seqid=171 V: 83
K: row-584/colfam1:84/1428632365343/Put/vlen=2/seqid=173 V: 84
K: row-585/colfam1:85/1428632365352/Put/vlen=2/seqid=175 V: 85
K: row-586/colfam1:86/1428632365375/Put/vlen=2/seqid=177 V: 86
K: row-587/colfam1:87/1428632365535/Put/vlen=2/seqid=179 V: 87
K: row-588/colfam1:88/1428632365560/Put/vlen=2/seqid=181 V: 88
K: row-589/colfam1:89/1428632365569/Put/vlen=2/seqid=183 V: 89
K: row-590/colfam1:90/1428632365582/Put/vlen=2/seqid=185 V: 90
K: row-591/colfam1:91/1428632365594/Put/vlen=2/seqid=187 V: 91
K: row-592/colfam1:92/1428632365620/Put/vlen=2/seqid=189 V: 92
K: row-593/colfam1:93/1428632365633/Put/vlen=2/seqid=191 V: 93
K: row-594/colfam1:94/1428632365642/Put/vlen=2/seqid=193 V: 94
K: row-595/colfam1:95/1428632365651/Put/vlen=2/seqid=195 V: 95
K: row-596/colfam1:96/1428632365671/Put/vlen=2/seqid=197 V: 96
K: row-597/colfam1:97/1428632365679/Put/vlen=2/seqid=199 V: 97
K: row-598/colfam1:98/1428632365684/Put/vlen=2/seqid=201 V: 98
K: row-599/colfam1:99/1428632365689/Put/vlen=2/seqid=203 V: 99
K: row-600/colfam1:00/1428632365694/Put/vlen=2/seqid=205 V: 00
K: row-601/colfam1:01/1428632365702/Put/vlen=2/seqid=207 V: 01
K: row-602/colfam1:02/1428632365709/Put/vlen=2/seqid=209 V: 02
K: row-603/colfam1:03/1428632365717/Put/vlen=2/seqid=211 V: 03
K: row-604/colfam1:04/1428632365722/Put/vlen=2/seqid=213 V: 04
K: row-605/colfam1:05/1428632365729/Put/vlen=2/seqid=215 V: 05
K: row-606/colfam1:06/1428632365752/Put/vlen=2/seqid=217 V: 06
K: row-607/colfam1:07/1428632365758/Put/vlen=2/seqid=219 V: 07
K: row-608/colfam1:08/1428632365765/Put/vlen=2/seqid=221 V: 08
K: row-609/colfam1:09/1428632365773/Put/vlen=2/seqid=223 V: 09
K: row-610/colfam1:10/1428632365778/Put/vlen=2/seqid=225 V: 10
K: row-611/colfam1:11/1428632365785/Put/vlen=2/seqid=227 V: 11
K: row-612/colfam1:12/1428632365791/Put/vlen=2/seqid=229 V: 12
K: row-613/colfam1:13/1428632365798/Put/vlen=2/seqid=231 V: 13
K: row-614/colfam1:14/1428632365803/Put/vlen=2/seqid=233 V: 14
K: row-615/colfam1:15/1428632365811/Put/vlen=2/seqid=235 V: 15
K: row-616/colfam1:16/1428632365820/Put/vlen=2/seqid=237 V: 16
K: row-617/colfam1:17/1428632365834/Put/vlen=2/seqid=239 V: 17
K: row-618/colfam1:18/1428632365840/Put/vlen=2/seqid=241 V: 18
K: row-619/colfam1:19/1428632365850/Put/vlen=2/seqid=243 V: 19
K: row-620/colfam1:20/1428632365856/Put/vlen=2/seqid=245 V: 20
K: row-621/colfam1:21/1428632365864/Put/vlen=2/seqid=247 V: 21
K: row-622/colfam1:22/1428632365874/Put/vlen=2/seqid=249 V: 22
K: row-623/colfam1:23/1428632365882/Put/vlen=2/seqid=251 V: 23
K: row-624/colfam1:24/1428632365896/Put/vlen=2/seqid=253 V: 24
K: row-625/colfam1:25/1428632365903/Put/vlen=2/seqid=255 V: 25
K: row-626/colfam1:26/1428632365908/Put/vlen=2/seqid=257 V: 26
K: row-627/colfam1:27/1428632365917/Put/vlen=2/seqid=259 V: 27
K: row-628/colfam1:28/1428632365928/Put/vlen=2/seqid=261 V: 28
K: row-629/colfam1:29/1428632365934/Put/vlen=2/seqid=263 V: 29
K: row-630/colfam1:30/1428632365940/Put/vlen=2/seqid=265 V: 30
K: row-631/colfam1:31/1428632365945/Put/vlen=2/seqid=267 V: 31
K: row-632/colfam1:32/1428632365957/Put/vlen=2/seqid=269 V: 32
K: row-633/colfam1:33/1428632365967/Put/vlen=2/seqid=271 V: 33
K: row-634/colfam1:34/1428632365982/Put/vlen=2/seqid=273 V: 34
K: row-635/colfam1:35/1428632365999/Put/vlen=2/seqid=275 V: 35
K: row-636/colfam1:36/1428632366004/Put/vlen=2/seqid=277 V: 36
K: row-637/colfam1:37/1428632366020/Put/vlen=2/seqid=279 V: 37
K: row-638/colfam1:38/1428632366031/Put/vlen=2/seqid=281 V: 38
K: row-639/colfam1:39/1428632366038/Put/vlen=2/seqid=283 V: 39
K: row-640/colfam1:40/1428632366048/Put/vlen=2/seqid=285 V: 40
K: row-641/colfam1:41/1428632366057/Put/vlen=2/seqid=287 V: 41
K: row-642/colfam1:42/1428632366240/Put/vlen=2/seqid=289 V: 42
K: row-643/colfam1:43/1428632366249/Put/vlen=2/seqid=291 V: 43
K: row-644/colfam1:44/1428632366256/Put/vlen=2/seqid=293 V: 44
K: row-645/colfam1:45/1428632366264/Put/vlen=2/seqid=295 V: 45
K: row-646/colfam1:46/1428632366270/Put/vlen=2/seqid=297 V: 46
K: row-647/colfam1:47/1428632366276/Put/vlen=2/seqid=299 V: 47
K: row-648/colfam1:48/1428632366284/Put/vlen=2/seqid=301 V: 48
K: row-649/colfam1:49/1428632366290/Put/vlen=2/seqid=303 V: 49
K: row-650/colfam1:50/1428632366300/Put/vlen=2/seqid=305 V: 50
K: row-651/colfam1:51/1428632366305/Put/vlen=2/seqid=307 V: 51
K: row-652/colfam1:52/1428632366313/Put/vlen=2/seqid=309 V: 52
K: row-653/colfam1:53/1428632366321/Put/vlen=2/seqid=311 V: 53
K: row-654/colfam1:54/1428632366330/Put/vlen=2/seqid=313 V: 54
K: row-655/colfam1:55/1428632366337/Put/vlen=2/seqid=315 V: 55
K: row-656/colfam1:56/1428632366343/Put/vlen=2/seqid=317 V: 56
K: row-657/colfam1:57/1428632366350/Put/vlen=2/seqid=319 V: 57
K: row-658/colfam1:58/1428632366363/Put/vlen=2/seqid=321 V: 58
K: row-659/colfam1:59/1428632366370/Put/vlen=2/seqid=323 V: 59
K: row-660/colfam1:60/1428632366384/Put/vlen=2/seqid=325 V: 60
K: row-661/colfam1:61/1428632366392/Put/vlen=2/seqid=327 V: 61
K: row-662/colfam1:62/1428632366397/Put/vlen=2/seqid=329 V: 62
K: row-663/colfam1:63/1428632366403/Put/vlen=2/seqid=331 V: 63
K: row-664/colfam1:64/1428632366410/Put/vlen=2/seqid=333 V: 64
K: row-665/colfam1:65/1428632366421/Put/vlen=2/seqid=335 V: 65
K: row-666/colfam1:66/1428632366430/Put/vlen=2/seqid=337 V: 66
K: row-667/colfam1:67/1428632366437/Put/vlen=2/seqid=339 V: 67
K: row-668/colfam1:68/1428632366444/Put/vlen=2/seqid=341 V: 68
K: row-669/colfam1:69/1428632366461/Put/vlen=2/seqid=343 V: 69
K: row-670/colfam1:70/1428632366477/Put/vlen=2/seqid=345 V: 70
K: row-671/colfam1:71/1428632366487/Put/vlen=2/seqid=347 V: 71
K: row-672/colfam1:72/1428632366498/Put/vlen=2/seqid=349 V: 72
K: row-673/colfam1:73/1428632366507/Put/vlen=2/seqid=351 V: 73
K: row-674/colfam1:74/1428632366520/Put/vlen=2/seqid=353 V: 74
K: row-675/colfam1:75/1428632366530/Put/vlen=2/seqid=355 V: 75
K: row-676/colfam1:76/1428632366542/Put/vlen=2/seqid=357 V: 76
K: row-677/colfam1:77/1428632366555/Put/vlen=2/seqid=359 V: 77
K: row-678/colfam1:78/1428632366578/Put/vlen=2/seqid=361 V: 78
K: row-679/colfam1:79/1428632366588/Put/vlen=2/seqid=363 V: 79
K: row-680/colfam1:80/1428632366596/Put/vlen=2/seqid=365 V: 80
K: row-681/colfam1:81/1428632366604/Put/vlen=2/seqid=367 V: 81
K: row-682/colfam1:82/1428632366617/Put/vlen=2/seqid=369 V: 82
K: row-683/colfam1:83/1428632366629/Put/vlen=2/seqid=371 V: 83
K: row-684/colfam1:84/1428632366640/Put/vlen=2/seqid=373 V: 84
K: row-685/colfam1:85/1428632366649/Put/vlen=2/seqid=375 V: 85
K: row-686/colfam1:86/1428632366658/Put/vlen=2/seqid=377 V: 86
K: row-687/colfam1:87/1428632366664/Put/vlen=2/seqid=379 V: 87
K: row-688/colfam1:88/1428632366673/Put/vlen=2/seqid=381 V: 88
K: row-689/colfam1:89/1428632366680/Put/vlen=2/seqid=383 V: 89
K: row-690/colfam1:90/1428632366686/Put/vlen=2/seqid=385 V: 90
K: row-691/colfam1:91/1428632366693/Put/vlen=2/seqid=387 V: 91
K: row-692/colfam1:92/1428632366701/Put/vlen=2/seqid=389 V: 92
K: row-693/colfam1:93/1428632366857/Put/vlen=2/seqid=391 V: 93
K: row-694/colfam1:94/1428632366868/Put/vlen=2/seqid=393 V: 94
K: row-695/colfam1:95/1428632366873/Put/vlen=2/seqid=395 V: 95
K: row-696/colfam1:96/1428632366881/Put/vlen=2/seqid=397 V: 96
K: row-697/colfam1:97/1428632366890/Put/vlen=2/seqid=399 V: 97
K: row-698/colfam1:98/1428632366896/Put/vlen=2/seqid=401 V: 98
K: row-699/colfam1:99/1428632366902/Put/vlen=2/seqid=403 V: 99
Block index size as per heapsize: 400
/// dumps the internal HFile.Reader properties
reader=/hbase/data/default/table3/1fa2e49c7404d3cd39afc39a99cc1c26/colfam1/0f6fc234c3014b6e9d84d3cae065d1b4,
    compression=none,
    cacheConf=CacheConfig:disabled,
    firstKey=row-500/colfam1:00/1428632364152/Put,
    lastKey=row-699/colfam1:99/1428632366902/Put,
    avgKeyLen=28,
    avgValueLen=2,
    entries=200,
    length=13581
/// Trailer block information
Trailer:
    fileinfoOffset=8857,
    loadOnOpenDataOffset=8742,
    dataIndexCount=1,
    metaIndexCount=0,
    totalUncomressedBytes=13483,
    entryCount=200,
    compressionCodec=NONE,
    uncompressedDataIndexSize=41,
    numDataIndexLevels=1,
    firstDataBlockOffset=0,
    lastDataBlockOffset=0,
    comparatorClassName=org.apache.hadoop.hbase.KeyValue$KeyComparator,
    encryptionKey=NONE,
    majorVersion=3,
    minorVersion=0
/// FileInfo block information
Fileinfo:
    BLOOM_FILTER_TYPE = ROW
    DELETE_FAMILY_COUNT = \x00\x00\x00\x00\x00\x00\x00\x00
    EARLIEST_PUT_TS = \x00\x00\x01L\xA1\x1F\xE4x
    KEY_VALUE_VERSION = \x00\x00\x00\x01
    LAST_BLOOM_KEY = row-699
    MAJOR_COMPACTION_KEY = \x00
    MAX_MEMSTORE_TS_KEY = \x00\x00\x00\x00\x00\x00\x01\x93
    MAX_SEQ_ID_KEY = 404
    TIMERANGE = 1428632364152....1428632366902
    hfile.AVG_KEY_LEN = 28
    hfile.AVG_VALUE_LEN = 2
    hfile.LASTKEY = \x00\x07row-699\x07colfam199\x00\x00\x01L\xA1\x1F\xEF6\x04
    hfile.MAX_TAGS_LEN = \x00\x00\x00\x00
    hfile.TAGS_COMPRESSED = \x00
Mid-key: \x00\x07row-500\x07colfam100\x00\x00\x01L\xA1\x1F\xE4x\x04
Bloom filter:
    BloomSize: 256
    No of Keys in bloom: 200
    Max Keys for bloom: 213
    Percentage filled: 94%
    Number of chunks: 1
    Comparator: RawBytesComparator
Delete Family Bloom filter:
    Not present
/// Total number of KVs scanned
Scanned kv count -> 200
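Why does this one HFile hold exactly 200 cells, from row-500 to row-699? The create command pre-splits table3 at row-300, row-500, row-700 and row-900, which gives five regions, and the dumped file belongs to the region that starts at row-500. The minimal Python sketch below maps every row written by the shell loop to its region. It assumes the usual HBase rule that a region's start key is inclusive and its end key exclusive. The names SPLITS and region_for are illustrative, not HBase APIs.

```python
import bisect
from collections import Counter

# Split points from the create command above; region i covers [SPLITS[i-1], SPLITS[i])
SPLITS = ["row-300", "row-500", "row-700", "row-900"]

def region_for(row_key: str) -> int:
    """Index (0..4) of the region whose key range contains row_key.
    Start keys are inclusive, so a row equal to a split point lands in the region that starts there."""
    return bisect.bisect_right(SPLITS, row_key)

# The shell loop writes one cell per row, for row-000 ... row-999
rows = [f"row-{i}{j}{k}" for i in "0123456789" for j in "0123456789" for k in "0123456789"]
per_region = Counter(region_for(r) for r in rows)

print(sorted(per_region.items()))  # [(0, 300), (1, 200), (2, 200), (3, 200), (4, 100)]
region2 = [r for r in rows if region_for(r) == 2]
print(region2[0], region2[-1], len(region2))  # row-500 row-699 200
```

Region 2's 200 rows match entries=200, firstKey=row-500 and lastKey=row-699 in the dump.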
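2. KeyValue format
In an HFile, a KeyValue is a byte array made up of the following fields:
key length (4 bytes), value length (4 bytes), key, value
where the key itself consists of: row length (2 bytes), row, column family length (1 byte), column family, column qualifier, timestamp (8 bytes), key type (1 byte)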
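The byte layout can be checked against the dump. Below is a minimal Python sketch, assuming the field order given above with big-endian integers and a key type code of 4 for Put (the trailing \x04 on the keys in the dump). The function names are hypothetical, not HBase APIs. It rebuilds hfile.LASTKEY for row-699/colfam1:99 byte for byte. It also ignores any per-cell extras that newer HFile versions may store, such as tags or the memstore timestamp.

```python
import struct
from typing import Tuple

KEY_TYPE_PUT = 4  # type code of a Put, as seen in the trailing \x04 of the keys in the dump

def encode_key(row: bytes, family: bytes, qualifier: bytes, timestamp: int,
               key_type: int = KEY_TYPE_PUT) -> bytes:
    """Key part of a KeyValue (all integers big-endian):
    row length (2) | row | family length (1) | family | qualifier | timestamp (8) | key type (1)"""
    return (struct.pack(">H", len(row)) + row
            + struct.pack(">B", len(family)) + family
            + qualifier
            + struct.pack(">qB", timestamp, key_type))

def encode_keyvalue(key: bytes, value: bytes) -> bytes:
    """Whole KeyValue: key length (4) | value length (4) | key | value"""
    return struct.pack(">ii", len(key), len(value)) + key + value

def decode_key(key: bytes) -> Tuple[bytes, bytes, bytes, int, int]:
    """Split a serialized key back into (row, family, qualifier, timestamp, key type)."""
    (row_len,) = struct.unpack_from(">H", key, 0)
    pos = 2
    row = key[pos:pos + row_len]
    pos += row_len
    fam_len = key[pos]
    pos += 1
    family = key[pos:pos + fam_len]
    pos += fam_len
    # The qualifier has no length field: it is whatever lies before the last 9 bytes
    qualifier = key[pos:len(key) - 9]
    timestamp, key_type = struct.unpack_from(">qB", key, len(key) - 9)
    return row, family, qualifier, timestamp, key_type

last_key = encode_key(b"row-699", b"colfam1", b"99", 1428632366902)
print(last_key == b"\x00\x07row-699\x07colfam199\x00\x00\x01L\xa1\x1f\xef6\x04")  # True (hfile.LASTKEY)
print(len(last_key))                         # 28, the avgKeyLen in the dump
print(len(encode_keyvalue(last_key, b"99")))  # 38 = 4 + 4 + 28 + 2
```

The 28-byte key next to a 2-byte value shows why the note in the dump says each K takes up a fair number of bytes.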
3. HFile structure
3.1 Trailer block
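The Trailer has a fixed length. As shown in the figure, it holds pointers to the start of the other blocks. When an HFile is read, the Trailer is read first, and the Data Block Index is then loaded into memory. As a result, looking up a key does not require scanning the whole HFile: the block that contains the key is found in the in-memory index, the whole block is read into memory with a single disk I/O, and the key is then located inside that block.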
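The lookup step can be sketched in a few lines of Python. This is a toy, not HBase's reader. It assumes a single-level index whose entries are (first key of the block, offset, length), and it compares plain row keys as bytes, which simplifies HBase's KeyComparator. The file in the dump actually has only one data block (dataIndexCount=1, firstDataBlockOffset=lastDataBlockOffset=0), so the three-block layout below is hypothetical, and so are the class and function names.

```python
import bisect
import io
from typing import List, Optional, Tuple

class DataBlockIndex:
    """Toy single-level data block index kept in memory.
    Each entry: (first key in the block, block offset in the file, block length)."""

    def __init__(self, entries: List[Tuple[bytes, int, int]]):
        self.entries = sorted(entries)
        self.first_keys = [first_key for first_key, _, _ in self.entries]

    def locate(self, key: bytes) -> Optional[Tuple[int, int]]:
        """Return (offset, length) of the only block that could hold `key`, or None."""
        # The candidate block is the last one whose first key is <= key
        i = bisect.bisect_right(self.first_keys, key) - 1
        if i < 0:
            return None  # key sorts before the first key in the file
        _, offset, length = self.entries[i]
        return offset, length

def read_block(f, offset: int, length: int) -> bytes:
    """One seek plus one read: the single disk I/O per lookup."""
    f.seek(offset)
    return f.read(length)

# Hypothetical file made of three 4 KB blocks filled with A, B and C
data = b"A" * 4096 + b"B" * 4096 + b"C" * 4096
index = DataBlockIndex([(b"row-500", 0, 4096), (b"row-550", 4096, 4096), (b"row-600", 8192, 4096)])
offset, length = index.locate(b"row-573")
print(offset, length)                                      # 4096 4096
print(read_block(io.BytesIO(data), offset, length)[:1])    # b'B'
```

Binary search over the first keys is what makes the index useful: with N blocks, finding the candidate block costs O(log N) comparisons in memory instead of a scan of the file.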
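3.2 File Info block
The File Info block records metadata about the file, for example AVG_KEY_LEN, AVG_VALUE_LEN, LAST_KEY, COMPARATOR and MAX_SEQ_ID_KEY. Unlike the Trailer it is not fixed-length: it is a map of named entries, and values such as the last key vary in size. In the dump above these entries appear under Fileinfo (hfile.AVG_KEY_LEN, hfile.LASTKEY, MAX_SEQ_ID_KEY and so on), while the comparator is recorded in the Trailer as comparatorClassName.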
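Many File Info values are printed as raw bytes. Decoding them makes the dump easier to read. The Python sketch below assumes that these numeric entries are big-endian integers, which is the encoding produced by HBase's Bytes.toBytes for int and long values. The dictionary simply copies values from the dump above.

```python
# Raw values copied from the Fileinfo section of the dump above
RAW_FILE_INFO = {
    "DELETE_FAMILY_COUNT": b"\x00\x00\x00\x00\x00\x00\x00\x00",
    "EARLIEST_PUT_TS": b"\x00\x00\x01L\xa1\x1f\xe4x",
    "KEY_VALUE_VERSION": b"\x00\x00\x00\x01",
    "MAX_MEMSTORE_TS_KEY": b"\x00\x00\x00\x00\x00\x00\x01\x93",
    "hfile.MAX_TAGS_LEN": b"\x00\x00\x00\x00",
}

def decode_int(raw: bytes) -> int:
    """Interpret raw bytes as a big-endian signed integer (4 bytes for int, 8 for long)."""
    return int.from_bytes(raw, "big", signed=True)

decoded = {name: decode_int(raw) for name, raw in RAW_FILE_INFO.items()}
for name, value in decoded.items():
    print(f"{name} = {value}")
# DELETE_FAMILY_COUNT = 0
# EARLIEST_PUT_TS = 1428632364152
# KEY_VALUE_VERSION = 1
# MAX_MEMSTORE_TS_KEY = 403
# hfile.MAX_TAGS_LEN = 0
```

The decoded values line up with the rest of the output. EARLIEST_PUT_TS is the timestamp of row-500, which also opens TIMERANGE. MAX_MEMSTORE_TS_KEY is 403, the largest seqid in the listing.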
3.3 Data Block
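Data Blocks hold the table's data and are the basic unit of HBase I/O. To improve efficiency, the HRegionServer has an LRU-based block cache. The data block size can be set by a parameter when a table is created (the column family's BLOCKSIZE attribute): large blocks favor sequential scans, while small blocks favor random reads.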
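The idea behind an LRU block cache fits in a short Python sketch. This is a toy with a hypothetical class name and a caller-supplied read function. HBase's real LruBlockCache is more elaborate; for example, it splits its space between single-access, multi-access and in-memory blocks. The sketch caches blocks under (HFile name, block offset), bounds the cache in bytes, and evicts the least recently used block first.

```python
from collections import OrderedDict
from typing import Callable, Tuple

class LruBlockCacheSketch:
    """Toy LRU cache of data blocks keyed by (HFile name, block offset), bounded in bytes."""

    def __init__(self, capacity_bytes: int):
        self.capacity = capacity_bytes
        self.used = 0
        self.blocks: "OrderedDict[Tuple[str, int], bytes]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get_block(self, hfile: str, offset: int,
                  read_from_disk: Callable[[str, int], bytes]) -> bytes:
        key = (hfile, offset)
        if key in self.blocks:
            self.blocks.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self.blocks[key]
        self.misses += 1
        block = read_from_disk(hfile, offset)  # the disk I/O the cache tries to avoid
        self.blocks[key] = block
        self.used += len(block)
        while self.used > self.capacity and self.blocks:
            _, evicted = self.blocks.popitem(last=False)  # drop the least recently used block
            self.used -= len(evicted)
        return block

# Usage: room for two 64 KB blocks
cache = LruBlockCacheSketch(2 * 65536)
fake_read = lambda hfile, offset: bytes(65536)
for off in (0, 65536, 0, 131072):  # the second read of offset 0 is a hit
    cache.get_block("hf1", off, fake_read)
print(cache.hits, cache.misses)  # 1 3
print(list(cache.blocks))        # [('hf1', 0), ('hf1', 131072)]
```

Block size and cache interact here: with a fixed cache budget, larger blocks mean fewer cached entries, each covering a wider key range.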
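Apart from the Magic at its start, a data block is simply KeyValue pairs concatenated one after another. Every block begins with such a magic number. It is not random digits but a fixed byte sequence identifying the block type (in HFile v2 and later, data blocks start with the 8 bytes DATABLK*). Checking it lets the reader detect corrupted or misaligned data.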
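Reading such a block amounts to checking the magic and then walking length-prefixed KeyValues. The Python sketch below makes a deliberate simplification: the block holds only the magic followed by KeyValues. In HFile v2 and later, the real block header also carries block sizes, the previous block's offset and checksum information, and cells may carry extra per-cell fields. Those are skipped here, and the function name is hypothetical.

```python
import struct
from typing import Iterator, Tuple

DATA_MAGIC = b"DATABLK*"  # 8-byte magic of an HFile v2+ data block

def iter_keyvalues(block: bytes) -> Iterator[Tuple[bytes, bytes]]:
    """Walk a simplified data block: magic, then back-to-back KeyValues,
    each laid out as key length (4) | value length (4) | key | value."""
    if block[:len(DATA_MAGIC)] != DATA_MAGIC:
        raise ValueError("bad block magic: possible corruption or wrong offset")
    pos = len(DATA_MAGIC)
    while pos < len(block):
        key_len, val_len = struct.unpack_from(">ii", block, pos)
        pos += 8
        key = block[pos:pos + key_len]
        pos += key_len
        value = block[pos:pos + val_len]
        pos += val_len
        yield key, value

def kv(key: bytes, value: bytes) -> bytes:
    return struct.pack(">ii", len(key), len(value)) + key + value

block = DATA_MAGIC + kv(b"row-500", b"00") + kv(b"row-501", b"01")
print(list(iter_keyvalues(block)))  # [(b'row-500', b'00'), (b'row-501', b'01')]
```

Because each KeyValue starts with its own lengths, the reader can only walk a block sequentially from the front. This is one reason the index points at blocks rather than at individual cells.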
On the data block size:
Minimum block size. We recommend a setting of minimum block size between 8KB to 1MB for general usage. Larger block size is preferred if files are primarily for sequential access. However, it would lead to inefficient random access (because there are more data to decompress). Smaller blocks are good for random access, but require more memory to hold the block index, and may be slower to create (because we must flush the compressor stream at the conclusion of each data block, which leads to an FS I/O flush). Further, due to the internal caching in Compression codec, the smallest possible block size would be around 20KB-30KB.
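The memory trade-off in this passage is simple arithmetic. There is roughly one index entry per data block, so halving the block size doubles the index. The Python sketch below estimates index size for a 1 GB HFile. Two assumptions apply: each entry costs the stored key (28 bytes, the avgKeyLen in the dump) plus about 12 bytes (an 8-byte block offset and a 4-byte block size), and the sketch ignores length prefixes, object overhead and multi-level indexes. The function names are hypothetical.

```python
KB = 1024
MB = 1024 * KB
GB = 1024 * MB

def index_entries(file_size: int, block_size: int) -> int:
    """About one index entry per data block (ceiling division)."""
    return -(-file_size // block_size)

def approx_index_bytes(file_size: int, block_size: int, avg_key_len: int,
                       per_entry_overhead: int = 12) -> int:
    # per_entry_overhead is an assumption: 8-byte block offset + 4-byte block size
    return index_entries(file_size, block_size) * (avg_key_len + per_entry_overhead)

for block_size in (8 * KB, 64 * KB, 1 * MB):
    n = index_entries(1 * GB, block_size)
    size = approx_index_bytes(1 * GB, block_size, avg_key_len=28)
    print(f"{block_size // KB:>5} KB blocks: {n:>7} entries, ~{size // KB} KB of index")
#     8 KB blocks:  131072 entries, ~5120 KB of index
#    64 KB blocks:   16384 entries, ~640 KB of index
#  1024 KB blocks:    1024 entries, ~40 KB of index
```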
3.4 Meta Block section (optional):
Holds user-defined key/value pairs and can be compressed.
3.5 Data Block Index section:
This is the index over the data blocks: the key of each index entry is the key of the first record in the block it points to. The index blocks record the offsets of the data and meta blocks.
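On the write side, the index can be built while data blocks are being cut. The Python sketch below shows one way to do this, assuming that cells arrive sorted (as they do when a MemStore is flushed) and that a block is closed once it reaches the target size. It records the block's first key, offset and length as the index entry. Under this policy blocks can overshoot slightly. The function name and the closing policy are illustrative, not HBase's writer.

```python
import struct
from typing import List, Tuple

def write_blocks(cells: List[Tuple[bytes, bytes]], block_size: int):
    """Cut sorted (key, value) cells into data blocks and build the data block index.
    A block is closed as soon as it reaches block_size, so blocks may run slightly over.
    Returns (blocks, index); each index entry is (first key in block, offset, length)."""
    blocks: List[bytes] = []
    index: List[Tuple[bytes, int, int]] = []
    current = bytearray()
    first_key = None
    offset = 0

    def close_block():
        nonlocal current, first_key, offset
        blocks.append(bytes(current))
        index.append((first_key, offset, len(current)))
        offset += len(current)
        current = bytearray()
        first_key = None

    for key, value in cells:
        if first_key is None:
            first_key = key  # the index stores the first key of every block
        current += struct.pack(">ii", len(key), len(value)) + key + value
        if len(current) >= block_size:
            close_block()
    if current:
        close_block()  # last, possibly short, block
    return blocks, index

# Usage: ten 17-byte cells with a 40-byte target give blocks of 3, 3, 3 and 1 cells
cells = [(f"row-{i:03d}".encode(), b"vv") for i in range(10)]
blocks, index = write_blocks(cells, 40)
for entry in index:
    print(entry)
# (b'row-000', 0, 51)
# (b'row-003', 51, 51)
# (b'row-006', 102, 51)
# (b'row-009', 153, 17)
```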
4. Relationship between HFile blocks and HDFS blocks
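The default HFile block size is 64 KB, while the default HDFS block size is 64 MB (in Hadoop 1.x; Hadoop 2.x raised the default to 128 MB), so an HDFS block is 1024 times the size of an HFile block. The figure below shows how the blocks of a 232 MB HFile are stored in HDFS blocks.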
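The figure's numbers can be reproduced with a little arithmetic: which HDFS block a byte range of the HFile falls into is just integer division of its offset. The Python sketch below uses the sizes from the text. The 70 KB block that starts 10 KB before the first HDFS boundary is hypothetical. It reflects that real on-disk HFile blocks are rarely exactly 64 KB, so an HFile block can straddle two HDFS blocks.

```python
KB = 1024
MB = 1024 * KB

def hdfs_blocks_spanned(offset: int, length: int, hdfs_block_size: int) -> list:
    """Indices of the HDFS blocks touched by the byte range [offset, offset + length)."""
    first = offset // hdfs_block_size
    last = (offset + length - 1) // hdfs_block_size
    return list(range(first, last + 1))

hfile_size = 232 * MB
hfile_block = 64 * KB
hdfs_block = 64 * MB

num_hfile_blocks = -(-hfile_size // hfile_block)  # ceiling division
num_hdfs_blocks = -(-hfile_size // hdfs_block)
print(num_hfile_blocks)  # 3712
print(num_hdfs_blocks)   # 4 (three full 64 MB blocks plus one 40 MB block)

print(hdfs_blocks_spanned(5 * 64 * KB, 64 * KB, hdfs_block))        # [0]
print(hdfs_blocks_spanned(64 * MB - 10 * KB, 70 * KB, hdfs_block))  # [0, 1]
```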
5. HFile compaction
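Write path: the client writes -> the data goes into the MemStore until the MemStore is full -> the MemStore is flushed into a StoreFile, and the number of StoreFiles grows until it reaches a threshold -> a compaction is triggered -> several StoreFiles are merged into one, merging versions and removing deleted data along the way (fully purging deleted data is the job of major compactions) -> as compactions continue, larger and larger StoreFiles are formed -> once a single StoreFile exceeds a size threshold, a split is triggered: the current Region is split into 2 Regions and taken offline, and HMaster assigns the 2 new daughter Regions to RegionServers, so the load that used to fall on 1 Region is spread over 2.
This process shows that HBase only ever appends data. Updates and deletes are written as new cells or delete markers, and the old data is physically removed only during compaction. A user write therefore only has to be appended to the write-ahead log (HLog) and stored in memory before it returns, which keeps I/O performance high.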
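The merge step of a compaction is essentially a k-way merge of sorted files. The Python sketch below is a toy model of a major compaction: several sorted StoreFiles are merged with heapq.merge, at most max_versions Puts are kept per column, and DeleteColumn markers are dropped together with the older cells they mask. It deliberately leaves out what real HBase handles, such as other delete types (Delete, DeleteFamily, ...), TTLs, and minor compactions, which usually keep delete markers. The Cell type and function names are hypothetical.

```python
import heapq
from typing import List, NamedTuple

class Cell(NamedTuple):
    row: str
    column: str    # family:qualifier
    ts: int
    kind: str      # "Put" or "DeleteColumn"
    value: str = ""

def cell_order(c: Cell):
    # Row and column ascending, newest timestamp first; at equal timestamps the
    # delete marker sorts before the Put so that it can mask it.
    return (c.row, c.column, -c.ts, 0 if c.kind == "DeleteColumn" else 1)

def major_compact(store_files: List[List[Cell]], max_versions: int = 1) -> List[Cell]:
    """Merge StoreFiles (each already sorted by cell_order) into one sorted list."""
    out: List[Cell] = []
    current = None      # (row, column) being processed
    kept = 0            # versions kept so far for `current`
    deleted_at = None   # timestamp of the newest DeleteColumn marker for `current`
    for cell in heapq.merge(*store_files, key=cell_order):
        if (cell.row, cell.column) != current:
            current, kept, deleted_at = (cell.row, cell.column), 0, None
        if cell.kind == "DeleteColumn":
            if deleted_at is None:
                deleted_at = cell.ts  # newest marker comes first and masks everything at or below it
            continue                  # the marker itself is not written out
        if deleted_at is not None and cell.ts <= deleted_at:
            continue                  # masked by a delete marker
        if kept < max_versions:
            out.append(cell)
            kept += 1
    return out

older = [Cell("row-1", "colfam1:a", 100, "Put", "v1"), Cell("row-2", "colfam1:a", 100, "Put", "x1")]
newer = [Cell("row-1", "colfam1:a", 200, "Put", "v2"), Cell("row-2", "colfam1:a", 150, "DeleteColumn")]
print(major_compact([older, newer]))
# [Cell(row='row-1', column='colfam1:a', ts=200, kind='Put', value='v2')]
```

Because every input file is already sorted, the merge streams through the files once without loading them whole. The same property lets a RegionServer rewrite large StoreFiles with bounded memory.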
Reference: http://blog.csdn.net/john_f_lau/article/details/18899311