HBase Study Notes (Hadoop)

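This post is mostly concrete Java code plus a few brief concept notes; for the concepts themselves, search online for details.

1. namespace

The HBase namespace feature is a technique for isolating table resources. Isolation is key to whether HBase can manage resources in a unified way, and it also improves overall security.

* Quota management: limits the resources a namespace can use, e.g. regions and tables
* Namespace security management: provides multi-tenant security administration
* RegionServer groups: guarantee data isolation; a namespace or table can be pinned to a specific RegionServer

1.1 Predefined namespaces
* hbase: the system namespace, containing HBase's internal tables
* default: the default namespace, used by tables created without an explicit namespace

1.2 Namespace DDL operations

1.2.1 Create: create_namespace 'zhw'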

hbase(main):004:0> create_namespace 'zhw'
0 row(s) in 0.1210 seconds

1.2.2 Delete: drop_namespace 'zhw'

hbase(main):006:0> drop_namespace 'zhw'
0 row(s) in 0.1000 seconds

1.2.3 List: list_namespace

hbase(main):009:0> list_namespace
NAMESPACE
default
hbase
zhw
3 row(s) in 0.0770 seconds

1.2.4 Permissions:

grant <user> <permissions> [ <table> [ <column family> [ <column qualifier> ] ] ]
revoke <user> <permissions> [ <table> [ <column family> [ <column qualifier> ] ] ]
user_permission <table>

*Note* Authorization must be enabled first, in hbase-site.xml:

<property>
    <name>hbase.security.authorization</name>
    <value>true</value>
</property>
<property>
    <name>hbase.coprocessor.master.classes</name>
    <value>org.apache.hadoop.hbase.security.access.AccessController</value>
</property>
<property>
    <name>hbase.coprocessor.region.classes</name>
    <value>org.apache.hadoop.hbase.security.token.TokenProvider,org.apache.hadoop.hbase.security.access.AccessController</value>
</property>

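2. Code:

2.1 Global configuration (all code in this post builds on it). Because the client locates the cluster through ZooKeeper, only two parameters are needed: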

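import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;

Configuration config = HBaseConfiguration.create(); // also loads hbase-site.xml from the classpath
config.set("hbase.zookeeper.quorum", "10.8.177.204");
config.set("hbase.zookeeper.property.clientPort", "2181");
HBaseAdmin admin = new HBaseAdmin(config);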

2.2 Create a table

HTableDescriptor tDesc = new HTableDescriptor(TableName.valueOf(tableName));
HColumnDescriptor cDesc = new HColumnDescriptor(family);
tDesc.addFamily(cDesc);
admin.createTable(tDesc); // pass the descriptor built above
admin.close();

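Commonly used settings:

    Maximum store file size of a region: tDesc.setMaxFileSize(512L * 1024 * 1024); the value is in bytes (default 256 MB according to these notes; the actual default depends on the HBase version)
    MemStore flush size of a region: tDesc.setMemStoreFlushSize(512L * 1024 * 1024); also in bytes (default 64 MB according to these notes; also version-dependent)
    Time to live of a column family's data: cDesc.setTimeToLive(5184000); in seconds (5184000 s = 60 days)
    Keep a column family's data in memory: cDesc.setInMemory(true); can improve response time
    Number of versions kept for a column family: cDesc.setMaxVersions(10) and cDesc.setMinVersions(5)
    WAL durability level: the Durability enum
        setDurability(Durability.FSYNC_WAL) on an HTableDescriptor, Delete, or Put object; // safest, but costs performance
        Durability.USE_DEFAULT: use HBase's global default value (SYNC_WAL)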
2.3 Delete a table (it must be disabled first)

admin.disableTable(tableName);
admin.deleteTable(tableName);

2.4 Modify a table (here table is the table's HTableDescriptor)

        admin.disableTable(table.getTableName());
        for(String rmFam:removeFamilies){
            table.removeFamily(Bytes.toBytes(rmFam));
            System.err.println(" - deleted family " + rmFam);
        }
        for(HColumnDescriptor family:addCols){
            table.addFamily(family);
            System.err.println(" - added family " + family.getNameAsString());
        }
        admin.modifyTable(table.getTableName(),table);
        admin.enableTable(table.getTableName());

2.5 Insert data

// Adds one column per qualifier/value pair to the given Put, all in one column family.
// put.add(...) is the API of the HBase version used here; later versions use put.addColumn(...).
public static void insert(String family, String[] qualifiers, String[] values, Put put) {
    for (int i = 0; i < qualifiers.length; i++) {
        put.add(Bytes.toBytes(family), Bytes.toBytes(qualifiers[i]), Bytes.toBytes(values[i]));
    }
}

// One Put is one row
List<Put> puts = new ArrayList<Put>();
Put p1 = new Put(Bytes.toBytes(sid++));
insert("name", new String[]{"firstName", "lastName"}, new String[]{"z", "hw"}, p1);
insert("age", new String[]{"chinaAge", "otherAge"}, new String[]{"23", "24"}, p1);
insert("sex", new String[]{"sex"}, new String[]{"man"}, p1);
puts.add(p1);
Put p2 = new Put(Bytes.toBytes(sid++));
insert("name", new String[]{"firstName", "lastName"}, new String[]{"zh", "jy"}, p2);
insert("age", new String[]{"chinaAge", "otherAge"}, new String[]{"22", "23"}, p2);
insert("sex", new String[]{"sex"}, new String[]{"female"}, p2);
puts.add(p2);
// ......
System.out.println("- ready insert ,count:" + puts.size());
HTable table = new HTable(config, tableName);
table.put(puts);
table.close();
System.out.println(" - insert success");
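
HBase keeps rows sorted by the raw bytes of the row key, so the way sid is encoded decides the row order. A rough sketch in plain Java, assuming sid is an int: Bytes.toBytes(int) yields the 4-byte big-endian form, which ByteBuffer reproduces here, and Arrays.compareUnsigned (Java 9+) stands in for HBase's unsigned byte-wise comparison. The class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical model of row key ordering; it does not use the HBase classes.
final class RowKeyOrder {
    // Same layout as Bytes.toBytes(int): 4 bytes, big-endian.
    static byte[] intKey(int id) {
        return ByteBuffer.allocate(4).putInt(id).array();
    }

    // UTF-8 bytes of the decimal text, as Bytes.toBytes(String) would give.
    static byte[] stringKey(int id) {
        return Integer.toString(id).getBytes(StandardCharsets.UTF_8);
    }

    // Negative if a sorts before b, comparing bytes as unsigned values.
    static int compare(byte[] a, byte[] b) {
        return Arrays.compareUnsigned(a, b);
    }

    public static void main(String[] args) {
        // Binary int keys keep numeric order for non-negative ids: 9 sorts before 10.
        System.out.println(compare(intKey(9), intKey(10)) < 0);        // true
        // String keys sort as text: "10" sorts before "9".
        System.out.println(compare(stringKey(10), stringKey(9)) < 0);  // true
    }
}
```

If the ids were stored as decimal strings instead, zero-padding them to a fixed width would be one way to keep scans in numeric order.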

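3. Querying data

3.1 Query by RowKey, core class: Get

Get get = new Get(Bytes.toBytes(id)); // specify the RowKey
get.addFamily(Bytes.toBytes(family)); // specify a column family (optional)
get.addColumn(Bytes.toBytes(family), Bytes.toBytes(qualifier)); // specify a column
get.setTimeStamp(1444810207364L); // specify a timestamp
get.setMaxVersions(); // fetch all versions

Reading values:

Result, Cell, CellUtil // classes used to read the returned values
Bytes.toString(CellUtil.cloneFamily(cell)) // column family of a cell, as a String

3.2 Full table scan, core class: Scan

Scan scan = new Scan();
scan.addFamily(Bytes.toBytes(family));
scan.addColumn(Bytes.toBytes(family), Bytes.toBytes(col));

Reading values:

ResultScanner rs = table.getScanner(scan); // then process each Result as above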
4. Filters

4.1 Filter comparators

RegexStringComparator
SubstringComparator
BinaryComparator
BinaryPrefixComparator
NullComparator
BitComparator
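
The comparators differ mainly in what counts as a match. As a rough mental model only (plain Java, not the HBase classes; the class and method names are hypothetical), the match checks of four of them behave roughly like this:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical stand-ins that approximate when each comparator reports a match.
final class ComparatorModel {
    // BinaryComparator: the whole value must equal the expected bytes.
    static boolean binaryEquals(byte[] value, byte[] expected) {
        return Arrays.equals(value, expected);
    }

    // BinaryPrefixComparator: only the leading bytes are compared.
    static boolean binaryPrefixEquals(byte[] value, byte[] prefix) {
        return value.length >= prefix.length
                && Arrays.equals(Arrays.copyOf(value, prefix.length), prefix);
    }

    // SubstringComparator: case-insensitive "contains" on the string form.
    static boolean substringMatches(byte[] value, String substr) {
        String text = new String(value, StandardCharsets.UTF_8).toLowerCase(Locale.ROOT);
        return text.contains(substr.toLowerCase(Locale.ROOT));
    }

    // RegexStringComparator: the pattern must be found in the string form.
    static boolean regexMatches(byte[] value, String regex) {
        String text = new String(value, StandardCharsets.UTF_8);
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        byte[] value = "HBase-Notes".getBytes(StandardCharsets.UTF_8);
        byte[] hbase = "HBase".getBytes(StandardCharsets.UTF_8);
        System.out.println(binaryEquals(value, hbase));        // false
        System.out.println(binaryPrefixEquals(value, hbase));  // true
        System.out.println(substringMatches(value, "NOTES"));  // true
        System.out.println(regexMatches(value, "^HB.*s$"));    // true
    }
}
```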

4.2 Common filters

SingleColumnValueFilter
FamilyFilter
QualifierFilter
RowFilter
PageFilter
... ...

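4.3 Filter wrappers:

SkipFilter // wraps a filter; skips the whole row as soon as the wrapped filter rejects one of its cells
FilterList // combines several filters with AND | OR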
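
The AND/OR behaviour of FilterList can be pictured with ordinary predicates. The sketch below is a plain-Java model, not HBase code: the enum constants borrow the names of FilterList.Operator (MUST_PASS_ALL for AND, MUST_PASS_ONE for OR), while the class, the combine method, and the sample predicates are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical model of how a filter list combines its member filters.
final class FilterListModel {
    // Names mirror FilterList.Operator, but this enum is only a stand-in.
    enum Operator { MUST_PASS_ALL, MUST_PASS_ONE }

    // Builds one predicate that applies every member filter with AND or OR.
    static <T> Predicate<T> combine(Operator op, List<Predicate<T>> filters) {
        return item -> op == Operator.MUST_PASS_ALL
                ? filters.stream().allMatch(f -> f.test(item))
                : filters.stream().anyMatch(f -> f.test(item));
    }

    public static void main(String[] args) {
        Predicate<String> startsWithZ = s -> s.startsWith("z");
        Predicate<String> shortName = s -> s.length() <= 2;
        List<Predicate<String>> filters = Arrays.asList(startsWithZ, shortName);

        System.out.println(combine(Operator.MUST_PASS_ALL, filters).test("zh"));   // true
        System.out.println(combine(Operator.MUST_PASS_ALL, filters).test("zhw"));  // false
        System.out.println(combine(Operator.MUST_PASS_ONE, filters).test("zhw"));  // true
    }
}
```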

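5. Exceptions:

If the code reports "Retrying connect to server...",

most likely one of the HBase HRegionServers has a problem.

6. All of the code above has been tested by the author; see the attachment for the full code.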

7. References

http://blog.csdn.net/opensure/article/details/46470969
http://blog.csdn.net/wulantian/article/details/41011297
http://www.cloudera.com/content/cloudera/en/documentation/core/v5-2-x/topics/cdh_sg_hbase_authorization.html
http://blog.csdn.net/u010967382/article/details/37653177
http://hbase.apache.org/0.94/book.html