HBase(1)Introduction and Installation

 

1. HBase Introduction
HBase is the Hadoop database. It builds on Hadoop HDFS for storage, Hadoop MapReduce for batch processing, and Hadoop Zookeeper for coordination.

Key characteristics:
Fundamentally distributed - partitioning (sharding) and replication
Column oriented
Sequential writes - buffered in memory, then flushed to disk
Merged reads - memory and on-disk files are merged at read time
Periodic data compaction
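These write-path ideas (write to memory, flush to disk, merge on read, compact periodically) can be sketched with a tiny LSM-style store. This is an illustration only; the class and all names in it are hypothetical and do not reflect HBase's actual implementation.

```python
# A minimal LSM-style sketch: writes land in an in-memory store, get flushed
# to immutable sorted "files", reads merge memory and files (newest wins),
# and compaction merges the files back into one.

class MiniLSM:
    def __init__(self, flush_threshold=3):
        self.memstore = {}              # in-memory write buffer
        self.files = []                 # immutable flushed "files", oldest first
        self.flush_threshold = flush_threshold

    def put(self, key, value):
        self.memstore[key] = value      # sequential write: memory only
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # freeze the memstore as an immutable, sorted file on "disk"
        if self.memstore:
            self.files.append(dict(sorted(self.memstore.items())))
            self.memstore = {}

    def get(self, key):
        # merged read: check the memstore first, then files newest-to-oldest
        if key in self.memstore:
            return self.memstore[key]
        for f in reversed(self.files):
            if key in f:
                return f[key]
        return None

    def compact(self):
        # periodic compaction: merge all files into one, newer values win
        merged = {}
        for f in self.files:            # later (newer) files overwrite older
            merged.update(f)
        self.files = [dict(sorted(merged.items()))] if merged else []
```

After a compaction, a read only has to consult the memstore and a single file, which is the point of doing it periodically.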

Related tools: Pig (data flow), Hive (SQL), Sqoop (RDBMS import support)

HMaster Server: region assignment management (analogous to the Hadoop master roles: NameNode, JobTracker)

HRegionServer #1: runs alongside the DataNode and TaskTracker


2. Install and Setup Hadoop
Install protoc
>wget https://protobuf.googlecode.com/files/protobuf-2.5.0.tar.gz
Unzip and cd to that directory
>./configure --prefix=/Users/carl/tool/protobuf-2.5.0
>make
>make install
>sudo ln -s /Users/carl/tool/protobuf-2.5.0 /opt/protobuf-2.5.0
>sudo ln -s /opt/protobuf-2.5.0 /opt/protobuf

Add this line to my environment
export PATH=/opt/protobuf/bin:$PATH

Check the Installation Environment
>protoc --version
libprotoc 2.5.0

Compile Hadoop
>svn checkout http://svn.apache.org/repos/asf/hadoop/common/tags/release-2.4.0 hadoop-common-2.4.0

Read the document here for building
http://svn.apache.org/repos/asf/hadoop/common/tags/release-2.4.0/BUILDING.txt
>cd hadoop-common-2.4.0/
>mvn clean install -DskipTests
>cd hadoop-mapreduce-project
>mvn clean install assembly:assembly -Pnative

Maybe my machine is too slow; I got a lot of timeout errors, so I retried like this:
>mvn clean -DskipTests install assembly:assembly -Pnative

Then I dropped the native profile:
>mvn clean -DskipTests install assembly:assembly

That still did not work, so I read the INSTALL document:
http://svn.apache.org/repos/asf/hadoop/common/tags/release-2.4.0/hadoop-mapreduce-project/INSTALL
>cd ..
>mvn clean package -Pdist -Dtar -DskipTests

The latest document is here:
/Users/carl/data/installation/hadoop-2.4.0/share/doc/hadoop/hadoop-project-dist/hadoop-common/SingleCluster.html 

Follow these blog posts to set up Hadoop on 4 machines:
http://sillycat.iteye.com/blog/2084169
http://sillycat.iteye.com/blog/2090186

>sbin/start-dfs.sh
>sbin/start-yarn.sh
>sbin/mr-jobhistory-daemon.sh start historyserver

3. Setup Zookeeper
Follow this blog post to set up Zookeeper on 3 machines:
http://sillycat.iteye.com/blog/2015175

>zkServer.sh start conf/zoo-cluster.cfg

4. Install HBase - Standalone Mode

Download this version since I am using hadoop 2.4.x
>wget http://mirrors.gigenet.com/apache/hbase/hbase-0.98.4/hbase-0.98.4-hadoop2-bin.tar.gz
Unzip the file and move it to the work directory.

>sudo ln -s /home/carl/tool/hbase-0.98.4 /opt/hbase-0.98.4
>sudo ln -s /opt/hbase-0.98.4 /opt/hbase

Check and modify the configuration file
>cat conf/hbase-site.xml
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>file:///opt/hbase</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/home/carl/etc/hbase</value>
  </property>
</configuration>

Start the Service
>bin/start-hbase.sh 

>jps
2036 NameNode
4084 Jps
3340 HMaster
2403 ResourceManager
2263 SecondaryNameNode
2686 JobHistoryServer

Enter the Client Shell
>bin/hbase shell

Create the table
>create 'test', 'cf'

Check the info on that table
>list 'test'

Inserting data
>put 'test', 'row1', 'cf:a', 'value1'
>put 'test', 'row2', 'cf:a', 'value2'
>put 'test', 'row3', 'cf:a', 'value3'
Here row1 is the row key, cf:a is the column (family cf, qualifier a), and value1 is the cell value.

Get all the data
>scan 'test'
ROW                COLUMN+CELL
row1               column=cf:a, timestamp=1407169545627, value=value1
row2               column=cf:a, timestamp=1407169557668, value=value2
row3               column=cf:a, timestamp=1407169563458, value=value3
3 row(s) in 0.0630 seconds

Get a single row
>get 'test', 'row1'
COLUMN             CELL
cf:a               timestamp=1407169545627, value=value1

Some other commands
>disable 'test'
>enable 'test'
>drop 'test'
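The shell session above can be mimicked with a small in-memory model of HBase's sparse map (row key -> "family:qualifier" -> value). All of the functions here are hypothetical, written only to illustrate the data model; real HBase cells are also versioned by timestamp, which is omitted for brevity.

```python
# Hypothetical in-memory model of the HBase shell session above.
# HBase stores cells in a sparse, sorted map keyed by row key, then by
# "family:qualifier" column name.

tables = {}

def create(table, family):
    # like `create 'test', 'cf'`
    tables[table] = {"families": {family}, "rows": {}}

def put(table, row, column, value):
    # like `put 'test', 'row1', 'cf:a', 'value1'`
    tables[table]["rows"].setdefault(row, {})[column] = value

def scan(table):
    # like `scan 'test'`: every (row, column, value) cell in row-key order
    return [(row, col, val)
            for row in sorted(tables[table]["rows"])
            for col, val in tables[table]["rows"][row].items()]

def get(table, row):
    # like `get 'test', 'row1'`: all cells for a single row
    return tables[table]["rows"].get(row, {})
```

Note that scan returns cells in row-key order: HBase keeps rows sorted lexicographically by key, which is why row-key design matters for scan performance.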

5. Pseudo-Distributed Local Install
Change the configuration as follows:
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://ubuntu-master:9000/hbase</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/home/carl/etc/hbase</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.master.wait.on.regionservers.mintostart</name>
    <value>1</value>
  </property>
</configuration>

List the HDFS directory
>hadoop fs -ls /
Found 4 items
drwxr-xr-x   - carl supergroup          0 2014-07-09 13:22 /data
drwxr-xr-x   - carl supergroup          0 2014-08-04 11:47 /hbase
drwxr-xr-x   - carl supergroup          0 2014-07-10 13:09 /output
drwxrwx---   - carl supergroup          0 2014-08-04 11:21 /tmp

>hadoop fs -ls /hbase
Found 6 items
drwxr-xr-x   - carl supergroup          0 2014-08-04 11:48 /hbase/.tmp
drwxr-xr-x   - carl supergroup          0 2014-08-04 11:47 /hbase/WALs
drwxr-xr-x   - carl supergroup          0 2014-08-04 11:48 /hbase/data
-rw-r--r--   3 carl supergroup         42 2014-08-04 11:47 /hbase/hbase.id
-rw-r--r--   3 carl supergroup          7 2014-08-04 11:47 /hbase/hbase.version
drwxr-xr-x   - carl supergroup          0 2014-08-04 11:47 /hbase/oldWALs

Start HMaster Backup Servers
The default port numbers for the HMaster are 16010, 16020, and 16030.
>bin/local-master-backup.sh start 2 3 5 

That will start 3 HMaster backup servers on ports 16012/16022/16032, 16013/16023/16033, and 16015/16025/16035.
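The offset arithmetic behind those port numbers can be sketched as follows, using the base ports quoted above. This is illustrative only; verify the actual ports against your HBase version, since they have changed between releases.

```python
# Each backup master started with `local-master-backup.sh start N` takes the
# base HMaster ports quoted above and adds its offset N.
BASE_MASTER_PORTS = (16010, 16020, 16030)

def backup_master_ports(offset):
    """Return the ports a backup master with the given offset listens on."""
    return tuple(p + offset for p in BASE_MASTER_PORTS)
```

For example, offset 2 yields 16012/16022/16032, matching the list above.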

Find the process id in /tmp/hbase-USERS-x-master.pid to stop the server

For example
>cat /tmp/hbase-carl-2-master.pid 
6442
>cat /tmp/hbase-carl-5-master.pid |xargs kill -9

Start and stop additional RegionServers
The default RegionServer ports are 16020 and 16030, but additional local RegionServers use base ports 16200 and 16300 plus the offset.
>bin/local-regionservers.sh start 2 3 5

>bin/local-regionservers.sh stop 5
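The same offset scheme applies to the additional RegionServers, using the base ports quoted above (16200 and 16300). Again this is an illustrative sketch, not authoritative:

```python
# An additional local RegionServer started with offset N listens on the
# base ports quoted above plus N.
BASE_REGIONSERVER_PORTS = (16200, 16300)

def regionserver_ports(offset):
    """Return the ports an additional RegionServer with this offset uses."""
    return tuple(p + offset for p in BASE_REGIONSERVER_PORTS)
```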

6. Fully Distributed
I have 4 machines, which I will list as follows:
ubuntu-master     hmaster
ubuntu-client1      hmaster-backup
ubuntu-client2      regionserver
ubuntu-client3      regionserver

Set up the Configuration
>cat conf/regionservers 
ubuntu-client2
ubuntu-client3

>cat conf/backup-masters 
ubuntu-client1

Since I already have ZK running:
>vi conf/hbase-env.sh
export HBASE_MANAGES_ZK=false

The main configuration file
>cat conf/hbase-site.xml
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://ubuntu-master:9000/hbase</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.master.wait.on.regionservers.mintostart</name>
    <value>1</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>ubuntu-client1,ubuntu-client2,ubuntu-client3</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/home/carl/etc/zookeeper</value>
  </property>
</configuration>

The last step is to start the server:
>bin/start-hbase.sh

Visit the web UI
http://ubuntu-master:60010/master-status

References:
https://hbase.apache.org/
http://www.alidata.org/archives/1509

http://blog.csdn.net/heyutao007/article/details/6920882
http://blog.sina.com.cn/s/blog_5c5d5cdf0101dvgq.html (hadoop, hbase, zookeeper)
http://www.cnblogs.com/ventlam/archive/2011/01/22/HBaseCluster.html

http://www.searchtb.com/2011/01/understanding-hbase.html
http://www.searchdatabase.com.cn/showcontent_31652.htm

hadoop
http://sillycat.iteye.com/blog/1556106
http://sillycat.iteye.com/blog/1556107

tips about hadoop
http://blog.chinaunix.net/uid-20682147-id-4229024.html
http://my.oschina.net/skyim/blog/228486
http://blog.huangchaosuper.cn/work/tech/2014/04/24/hadoop-install.html
http://blog.sina.com.cn/s/blog_45d2413b0102e2zx.html
http://www.it165.net/os/html/201405/8311.html

Comment (sillycat, 2015-03-25):
The port number is changed after hbase-1.0.0.
http://ubuntu-master:16030/master-status
