5. HBase advanced topics: table design schema - 莱布尼兹 - ITeye blog

5. HBase advanced topics: table design schema

Blog category: hbase-0.94.2 source

Study notes and a summary follow.

Part 1: Table attributes

Attribute	Default	Usage/principle	Use case	Note
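Bloom filter	disabled	costs some memory to improve lookup time (TBD)	tables read mostly through random Gets, where whole store files can be skipped; it does not speed up large range scans (TBD)	the value is one of 'row', 'row-col', or none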
Column families				the name must be a printable string, since it is used as the directory name under the region directory
Maximum file size	10 GB in 0.94.2			this is in fact maxStoreSize, i.e. the property 'hbase.hregion.max.filesize' set in hbase-site.xml
Read-only	false		a table kept safe like firmware, i.e. a 'dead' table that never changes
Memstore flush size	128 MB in 0.94.2	same effect as the property 'hbase.hregion.memstore.flush.size' in hbase-site.xml		1. this value determines how often store files are generated; 2. because of 1, it affects the HLog replay time when a region server goes down
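Deferred log flush	false	if true, edits are not synced on every write but submitted periodically, at the interval set by 'hbase.regionserver.optionallogflushinterval'; if false, each edit is synced immediately		true may cause data loss, because the cached edits stay in memory until they are synced to the file system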
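
The Bloom filter row above trades memory for lookup time. The sketch below is a toy filter in plain Java that shows the idea only: a filter that answers "definitely absent" lets a read skip a store file without touching it. It is not HBase code; the class name RowBloomSketch, the bit-array size and the two hash functions are all assumptions chosen for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.BitSet;

/** Toy Bloom filter over row keys. Illustrative only; not HBase's implementation. */
final class RowBloomSketch {
    private final BitSet bits;
    private final int size;       // number of bits: this is the memory cost
    private final int hashCount;  // bit positions set per key

    RowBloomSketch(int size, int hashCount) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashCount = hashCount;
    }

    /** Records a row key by setting hashCount bit positions. */
    void add(String rowKey) {
        byte[] key = rowKey.getBytes(StandardCharsets.UTF_8);
        for (int i = 0; i < hashCount; i++) {
            bits.set(position(key, i));
        }
    }

    /** false means "definitely absent"; true only means "possibly present". */
    boolean mightContain(String rowKey) {
        byte[] key = rowKey.getBytes(StandardCharsets.UTF_8);
        for (int i = 0; i < hashCount; i++) {
            if (!bits.get(position(key, i))) {
                return false;
            }
        }
        return true;
    }

    /** Derives the i-th position from two base hashes (double hashing). */
    private int position(byte[] key, int i) {
        return Math.floorMod(Arrays.hashCode(key) + i * fnv1a(key), size);
    }

    /** 32-bit FNV-1a, used here only as a second, independent hash. */
    private static int fnv1a(byte[] key) {
        int h = 0x811c9dc5;
        for (byte b : key) {
            h ^= (b & 0xff);
            h *= 16777619;
        }
        return h;
    }
}
```

A reader would keep one such filter per store file and call mightContain(rowKey) before opening the file. A key that was added always returns true; a key that was never added usually returns false, with a false-positive rate that falls as the bit array grows.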

Part 2: Column family attributes

Attribute	Default	Usage/principle	Use case	Note
In-memory	false	caches some blocks of a small family in memory to speed up queries	small tables, e.g. something analogous to a secondary index table	there is no guarantee of when, or how many, blocks are cached
Bloom filter				see Part 1
Replication scope	0 (disabled)	syncs local cluster data to remote clusters (TBD)	load balancing by distributing requests across clusters?
Maximum versions	3	controls how many versions (changes) of a cell are kept in storage		use 1 in general; if you also want to check the previous version, 2 is a good choice. This interacts with 'Time-to-live'
Compression	none	compresses this family with the specified algorithm (SNAPPY, LZO, GZ, ...)		be completely clear about your requirements before choosing an algorithm
Block size	64 KB	a store file is split into blocks of this size, so smaller blocks give faster random reads, while larger blocks suit sequential reads (TBD)
Block cache	true	when rows are read from HBase, this determines whether the blocks read are written to the cache to speed up later access	use 'true' if clients repeatedly access the same rows; use 'false' for whole-table scans or for systems with far fewer reads than writes
Time-to-live	Integer.MAX_VALUE (in seconds)	how long a cell value is kept in storage	for a 'recycled' (i.e. rolling) system, use an appropriate value to bound the data size	this interacts with 'Maximum versions': the two attributes together control which versions of the data are kept
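
The interaction between 'Maximum versions' and 'Time-to-live' noted above can be sketched as a small model in plain Java. It is a simplification under stated assumptions, not HBase code: the class and method names are hypothetical, and it ignores delete markers and any minimum-versions setting. A version survives only if it is among the newest maxVersions versions and is no older than the TTL.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Simplified model of which cell versions survive; not HBase's implementation. */
final class VersionRetentionSketch {

    /**
     * Returns the surviving version timestamps (milliseconds), newest first.
     * A version is dropped if it is older than ttlSeconds or if maxVersions
     * newer versions have already been kept.
     */
    static List<Long> survivors(List<Long> timestampsMs, int maxVersions,
                                long ttlSeconds, long nowMs) {
        List<Long> newestFirst = new ArrayList<>(timestampsMs);
        newestFirst.sort(Collections.reverseOrder());

        List<Long> kept = new ArrayList<>();
        for (long ts : newestFirst) {
            if (kept.size() >= maxVersions) {
                break;  // the version limit is reached
            }
            boolean expired = nowMs - ts > ttlSeconds * 1000L;
            if (!expired) {
                kept.add(ts);
            }
        }
        return kept;
    }
}
```

For example, with versions written at 1000, 5000, 9000 and 12000 ms, a limit of 3 versions, a TTL of 5 seconds and a current time of 13000 ms, survivors(...) returns [12000, 9000]: the TTL removes the two oldest versions before the version limit is even reached. With the default TTL of Integer.MAX_VALUE seconds only the version limit applies, and the result is [12000, 9000, 5000].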

Ref:

HBase: The Definitive Guide


2013-10-16 00:04