概述:
此分享包括:MapReduce研究现状和毕玄-HBase简介与实践分享的汇总
汇总点:
Table in HBase以Region为单位管理region(startKey,endKey); Hbase每个Column Family单独存储:storeFile; Hbase当某个Column Family累积的大小 > 某阈值时,自动分裂成两个Region通过查找-ROOT- & .META.来获取某行在某region上; RegionServer:Region读写操作的场所; Master:管理Region的分配和基于zookeeper来保证HA; Hbase的强一致性:同一行数据的读写只在同一台regionserver上进行; Hbase的水平伸缩表现在:region的自动分裂以及master的balance、只用增加datanode机器即可增加容量和增加regionserver机器即可增加读写吞吐量; Hbase行事务同一行的列的写入是原子的; Hbase合理设计rowKey & Pre-Sharding; Hbase开启压缩create table ‘t1’,{NAME => ‘cf1’, COMPRESSION => ‘lzo’} Hadoop集群监控工具Ganglia;
Hadoop调优点:
I/O: io.sort.mb io.sort.percent io.sort.record.percent io.sort.spill.percent Shuffle: tasktracker.http.threads mapred.reduce.parallel.copies mapred.job.shuffle.input.buffer.percent 其他: 数据压缩 推测性执行(同时执行同一Task,杀死运行慢的) 同一节点的Child重用jvm 重写Partitioner,使分布到各Reducer的数据均匀 设置堆空间大小
写速度关键因素:
Table region分布均衡; 单台region server的region数; hbase.regionserver.handler.count hbase.regionserver.global.memstore.upperLimit hbase.hregion.memstore.block.multiplier hbase.hstore.blockingStoreFiles hbase.hregion.max.filesize
读速度关键因素:
单台Region Server上的Region数; StoreFile数; bloomfilter; in-memory flag; blockcache设置; hfile.block.cache.size;
更多详情参见附件
相关推荐
这篇分享总结涵盖了MapReduce的研究现状以及毕玄对于HBase的实践介绍,这两部分的内容对于理解大数据处理的基础设施至关重要。 首先,让我们来深入探讨MapReduce。MapReduce是由Google提出的一种分布式计算模型,它...
Hadoop 3.x(MapReduce)----【MapReduce 概述】---- 代码 Hadoop 3.x(MapReduce)----【MapReduce 概述】---- 代码 Hadoop 3.x(MapReduce)----【MapReduce 概述】---- 代码 Hadoop 3.x(MapReduce)----...
赠送jar包:hadoop-mapreduce-client-jobclient-2.6.5.jar; 赠送原API文档:hadoop-mapreduce-client-jobclient-2.6.5-javadoc.jar; 赠送源代码:hadoop-mapreduce-client-jobclient-2.6.5-sources.jar; 赠送...
赠送jar包:hadoop-mapreduce-client-app-2.7.3.jar; 赠送原API文档:hadoop-mapreduce-client-app-2.7.3-javadoc.jar; 赠送源代码:hadoop-mapreduce-client-app-2.7.3-sources.jar; 赠送Maven依赖信息文件:...
Hadoop 3.x(MapReduce)----【Hadoop 序列化】---- 代码 Hadoop 3.x(MapReduce)----【Hadoop 序列化】---- 代码 Hadoop 3.x(MapReduce)----【Hadoop 序列化】---- 代码 Hadoop 3.x(MapReduce)----【Hadoop ...
赠送jar包:hadoop-mapreduce-client-jobclient-2.6.5.jar; 赠送原API文档:hadoop-mapreduce-client-jobclient-2.6.5-javadoc.jar; 赠送源代码:hadoop-mapreduce-client-jobclient-2.6.5-sources.jar; 赠送...
赠送jar包:hadoop-mapreduce-client-app-2.6.5.jar; 赠送原API文档:hadoop-mapreduce-client-app-2.6.5-javadoc.jar; 赠送源代码:hadoop-mapreduce-client-app-2.6.5-sources.jar; 赠送Maven依赖信息文件:...
赠送jar包:hadoop-mapreduce-client-core-2.7.3.jar; 赠送原API文档:hadoop-mapreduce-client-core-2.7.3-javadoc.jar; 赠送源代码:hadoop-mapreduce-client-core-2.7.3-sources.jar; 赠送Maven依赖信息文件:...
赠送jar包:hadoop-mapreduce-client-core-2.5.1.jar; 赠送原API文档:hadoop-mapreduce-client-core-2.5.1-javadoc.jar; 赠送源代码:hadoop-mapreduce-client-core-2.5.1-sources.jar; 赠送Maven依赖信息文件:...
赠送jar包:hadoop-mapreduce-client-core-2.6.5.jar 赠送原API文档:hadoop-mapreduce-client-core-2.6.5-javadoc.jar 赠送源代码:hadoop-mapreduce-client-core-2.6.5-sources.jar 包含翻译后的API文档:...
赠送jar包:hadoop-mapreduce-client-app-2.6.5.jar; 赠送原API文档:hadoop-mapreduce-client-app-2.6.5-javadoc.jar; 赠送源代码:hadoop-mapreduce-client-app-2.6.5-sources.jar; 赠送Maven依赖信息文件:...
赠送jar包:hadoop-mapreduce-client-common-2.7.3.jar; 赠送原API文档:hadoop-mapreduce-client-common-2.7.3-javadoc.jar; 赠送源代码:hadoop-mapreduce-client-common-2.7.3-sources.jar; 赠送Maven依赖信息...
- HBase与Hadoop的MapReduce框架紧密集成,支持大规模数据处理和分析。 8. **HBase shell** - 提供命令行接口(CLI)工具,方便用户执行CRUD(创建、读取、更新、删除)操作和管理表。 9. **表设计最佳实践** -...
赠送jar包:hadoop-mapreduce-client-jobclient-2.5.1.jar; 赠送原API文档:hadoop-mapreduce-client-jobclient-2.5.1-javadoc.jar; 赠送源代码:hadoop-mapreduce-client-jobclient-2.5.1-sources.jar; 赠送...
赠送jar包:hadoop-mapreduce-client-shuffle-2.5.1.jar; 赠送原API文档:hadoop-mapreduce-client-shuffle-2.5.1-javadoc.jar; 赠送源代码:hadoop-mapreduce-client-shuffle-2.5.1-sources.jar; 赠送Maven依赖...
赠送jar包:hadoop-mapreduce-client-common-2.6.5.jar; 赠送原API文档:hadoop-mapreduce-client-common-2.6.5-javadoc.jar; 赠送源代码:hadoop-mapreduce-client-common-2.6.5-sources.jar; 赠送Maven依赖信息...
赠送jar包:hadoop-mapreduce-client-jobclient-2.7.3.jar; 赠送原API文档:hadoop-mapreduce-client-jobclient-2.7.3-javadoc.jar; 赠送源代码:hadoop-mapreduce-client-jobclient-2.7.3-sources.jar; 赠送...
赠送jar包:hadoop-mapreduce-client-jobclient-2.5.1.jar; 赠送原API文档:hadoop-mapreduce-client-jobclient-2.5.1-javadoc.jar; 赠送源代码:hadoop-mapreduce-client-jobclient-2.5.1-sources.jar; 赠送...
赠送jar包:hadoop-mapreduce-client-shuffle-2.6.5.jar; 赠送原API文档:hadoop-mapreduce-client-shuffle-2.6.5-javadoc.jar; 赠送源代码:hadoop-mapreduce-client-shuffle-2.6.5-sources.jar; 赠送Maven依赖...
赠送jar包:hadoop-mapreduce-client-common-2.5.1.jar; 赠送原API文档:hadoop-mapreduce-client-common-2.5.1-javadoc.jar; 赠送源代码:hadoop-mapreduce-client-common-2.5.1-sources.jar; 赠送Maven依赖信息...