sillycat (Chengdu)
HBase(1)Introduction and Installation

 

1. HBase Introduction
Hadoop Database ——> Hadoop HDFS
Hadoop Database ——> Hadoop MapReduce
Hadoop Database ——> Hadoop Zookeeper

Fundamentally Distributed — partitioning (sharding), replication
Column Oriented
Sequential Write — in memory, then flush to disk
Merged Read
Periodic Data Compaction
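The write/read pattern above (buffer writes in memory, flush to immutable files, merge on read) is the LSM-tree idea. A toy sketch, purely illustrative and not HBase code — the names memstore/hfile are borrowed loosely:

```shell
#!/bin/sh
# Toy LSM sketch: put() buffers in a "memstore" variable, flush() writes
# an immutable file, get() merges the memstore with all flushed files.
memstore=""
flushdir=$(mktemp -d)
flushes=0
put()   { memstore="${memstore}$1=$2
"; }
flush() { flushes=$((flushes + 1))
          printf '%s' "$memstore" > "$flushdir/hfile-$flushes"
          memstore=""; }
get()   { { printf '%s' "$memstore"; cat "$flushdir"/* 2>/dev/null; } \
            | grep "^$1=" | head -n 1; }
put row1 a
flush            # row1 now lives on disk
put row2 b       # row2 still only in memory
get row1         # a read merges memory + disk
get row2
```

Real compaction would then periodically merge the flushed files back into fewer, sorted ones.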

Pig (Data Flow), Hive (SQL), Sqoop (RDBMS import support)

HMaster Server: region assignment management (Hadoop master: NameNode, JobTracker)

HRegionServer #1: DataNode, TaskTracker


2. Install and Setup Hadoop
Install protoc
>wget https://protobuf.googlecode.com/files/protobuf-2.5.0.tar.gz
Unzip and cd to that directory
>./configure --prefix=/Users/carl/tool/protobuf-2.5.0
>make
>make install
>sudo ln -s /Users/carl/tool/protobuf-2.5.0 /opt/protobuf-2.5.0
>sudo ln -s /opt/protobuf-2.5.0 /opt/protobuf

Add this line to my environment
export PATH=/opt/protobuf/bin:$PATH

Check the Installation Environment
>protoc --version
libprotoc 2.5.0

Compile Hadoop
>svn checkout http://svn.apache.org/repos/asf/hadoop/common/tags/release-2.4.0 hadoop-common-2.4.0

Read the document here for building
http://svn.apache.org/repos/asf/hadoop/common/tags/release-2.4.0/BUILDING.txt
>cd hadoop-common-2.4.0/
>mvn clean install -DskipTests
>cd hadoop-mapreduce-project
>mvn clean install assembly:assembly -Pnative

My machine may be too slow; I hit a lot of timeout errors, so I reran it like this
>mvn clean -DskipTests install assembly:assembly -Pnative

Then I tried dropping the native profile
>mvn clean -DskipTests install assembly:assembly

That did not work either, so I read the INSTALL document
http://svn.apache.org/repos/asf/hadoop/common/tags/release-2.4.0/hadoop-mapreduce-project/INSTALL
>cd ..
>mvn clean package -Pdist -Dtar -DskipTests

The latest documentation is here
/Users/carl/data/installation/hadoop-2.4.0/share/doc/hadoop/hadoop-project-dist/hadoop-common/SingleCluster.html 

Follow these blog posts and set up Hadoop on 4 machines
http://sillycat.iteye.com/blog/2084169
http://sillycat.iteye.com/blog/2090186

>sbin/start-dfs.sh
>sbin/start-yarn.sh
>sbin/mr-jobhistory-daemon.sh start historyserver
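After the three start scripts, a quick sanity check is to confirm the expected daemons show up in jps. A small sketch — the daemon names come from the jps listing later in this post, and the sample text should be replaced with `jps_out=$(jps)` on the live master:

```shell
#!/bin/sh
# Sample jps output (on a live master, use: jps_out=$(jps) instead).
jps_out="2036 NameNode
2263 SecondaryNameNode
2403 ResourceManager
2686 JobHistoryServer"
status=""
for d in NameNode SecondaryNameNode ResourceManager JobHistoryServer; do
  case "$jps_out" in
    *"$d"*) status="$status $d:up" ;;
    *)      status="$status $d:DOWN" ;;
  esac
done
echo "$status"
```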

3. Setup Zookeeper
Follow this blog post and set up Zookeeper on 3 machines
http://sillycat.iteye.com/blog/2015175

>zkServer.sh start conf/zoo-cluster.cfg
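For reference, a minimal zoo-cluster.cfg sketch. The dataDir and the three hostnames are assumptions borrowed from the fully distributed setup in section 6, not the exact file used here:

```
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/home/carl/etc/zookeeper
clientPort=2181
server.1=ubuntu-client1:2888:3888
server.2=ubuntu-client2:2888:3888
server.3=ubuntu-client3:2888:3888
```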

4. Try install HBase - Standalone HBase

Download this version since I am using hadoop 2.4.x
>wget http://mirrors.gigenet.com/apache/hbase/hbase-0.98.4/hbase-0.98.4-hadoop2-bin.tar.gz
Unzip the file and move it to the work directory.

>sudo ln -s /home/carl/tool/hbase-0.98.4 /opt/hbase-0.98.4
>sudo ln -s /opt/hbase-0.98.4 /opt/hbase

Check and modify the configuration file
>cat conf/hbase-site.xml
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>file:///opt/hbase</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/home/carl/etc/hbase</value>
  </property>
</configuration>

Start the Service
>bin/start-hbase.sh 

>jps
2036 NameNode
2263 SecondaryNameNode
2403 ResourceManager
2686 JobHistoryServer
3340 HMaster
4084 Jps

Enter the Client Shell
>bin/hbase shell

Create the table
>create 'test', 'cf'

Check the info on that table
>list 'test'

Inserting data
>put 'test', 'row1', 'cf:a', 'value1'
>put 'test', 'row2', 'cf:a', 'value2'
>put 'test', 'row3', 'cf:a', 'value3'
Here row1 is the row key, cf:a is the column (family:qualifier), and value1 is the value.

Get all the data
>scan 'test'
ROW       COLUMN+CELL
row1      column=cf:a, timestamp=1407169545627, value=value1
row2      column=cf:a, timestamp=1407169557668, value=value2
row3      column=cf:a, timestamp=1407169563458, value=value3
3 row(s) in 0.0630 seconds
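The scan output can also be consumed from scripts. A sketch that counts rows from a captured scan — the sample text is pasted from the output above; in practice you would pipe `echo "scan 'test'" | bin/hbase shell` into it:

```shell
#!/bin/sh
# Count data rows in captured scan output by counting column= cells.
# Sample pasted from the scan above; replace with real shell output.
scan_out="row1  column=cf:a, timestamp=1407169545627, value=value1
row2  column=cf:a, timestamp=1407169557668, value=value2
row3  column=cf:a, timestamp=1407169563458, value=value3"
rows=$(printf '%s\n' "$scan_out" | grep -c 'column=')
echo "$rows row(s)"
```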

Get a single row
>get 'test', 'row1'
COLUMN    CELL
cf:a      timestamp=1407169545627, value=value1

Some other commands
>disable 'test'
>enable 'test'
>drop 'test'
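These commands can also be batched: hbase shell accepts a script file. A sketch, assuming the /opt/hbase symlink set up earlier — note a table must be disabled before it can be dropped:

```shell
#!/bin/sh
# Write the table-lifecycle commands to a script file, then feed it to
# the shell. The hbase invocation is commented out so this is a dry run.
cat > /tmp/lifecycle.rb <<'EOF'
disable 'test'
enable 'test'
disable 'test'
drop 'test'
EOF
# /opt/hbase/bin/hbase shell /tmp/lifecycle.rb
wc -l < /tmp/lifecycle.rb
```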

5. Pseudo-Distributed Local Install
Change the configuration as follows
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://ubuntu-master:9000/hbase</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/home/carl/etc/hbase</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.master.wait.on.regionservers.mintostart</name>
    <value>1</value>
  </property>
</configuration>

List the HDFS directory
>hadoop fs -ls /
Found 4 items
drwxr-xr-x   - carl supergroup          0 2014-07-09 13:22 /data
drwxr-xr-x   - carl supergroup          0 2014-08-04 11:47 /hbase
drwxr-xr-x   - carl supergroup          0 2014-07-10 13:09 /output
drwxrwx---   - carl supergroup          0 2014-08-04 11:21 /tmp

>hadoop fs -ls /hbase
Found 6 items
drwxr-xr-x   - carl supergroup          0 2014-08-04 11:48 /hbase/.tmp
drwxr-xr-x   - carl supergroup          0 2014-08-04 11:47 /hbase/WALs
drwxr-xr-x   - carl supergroup          0 2014-08-04 11:48 /hbase/data
-rw-r--r--   3 carl supergroup         42 2014-08-04 11:47 /hbase/hbase.id
-rw-r--r--   3 carl supergroup          7 2014-08-04 11:47 /hbase/hbase.version
drwxr-xr-x   - carl supergroup          0 2014-08-04 11:47 /hbase/oldWALs

Start HMaster Backup Servers
The default HMaster ports are 16010, 16020 and 16030.
>bin/local-master-backup.sh start 2 3 5 

That starts 3 backup HMaster servers on ports 16012/16022/16032, 16013/16023/16033 and 16015/16025/16035.
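The port numbering is simple arithmetic: each offset N passed to local-master-backup.sh is added to the base ports (base values assumed from the text above, 16010/16020/16030):

```shell
#!/bin/sh
# Each backup-master offset is added to the three base ports.
for offset in 2 3 5; do
  echo "offset $offset -> $((16010 + offset))/$((16020 + offset))/$((16030 + offset))"
done
```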

To stop a backup server, find its process id in /tmp/hbase-USER-x-master.pid

For example
>cat /tmp/hbase-carl-2-master.pid 
6442
>cat /tmp/hbase-carl-5-master.pid |xargs kill -9

Start and stop additional RegionServers
The default RegionServer ports are 16020 and 16030; additional instances start from base ports 16200 and 16300.
>bin/local-regionservers.sh start 2 3 5

>bin/local-regionservers.sh stop 5

6. Fully Distributed
I have 4 machines; I will list them as follows:
ubuntu-master     hmaster
ubuntu-client1      hmaster-backup
ubuntu-client2      regionserver
ubuntu-client3      regionserver

Set up the Configuration
>cat conf/regionservers 
ubuntu-client2
ubuntu-client3

>cat conf/backup-masters 
ubuntu-client1

Since I already have ZooKeeper running:
>vi conf/hbase-env.sh
export HBASE_MANAGES_ZK=false

The main configuration file
>cat conf/hbase-site.xml 
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://ubuntu-master:9000/hbase</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.master.wait.on.regionservers.mintostart</name>
    <value>1</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>ubuntu-client1,ubuntu-client2,ubuntu-client3</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/home/carl/etc/zookeeper</value>
  </property>
</configuration>

The last step is just to start the server
>bin/start-hbase.sh

Visit the web UI
http://ubuntu-master:60010/master-status

References:
https://hbase.apache.org/
http://www.alidata.org/archives/1509

http://blog.csdn.net/heyutao007/article/details/6920882
http://blog.sina.com.cn/s/blog_5c5d5cdf0101dvgq.html
http://www.cnblogs.com/ventlam/archive/2011/01/22/HBaseCluster.html

http://www.searchtb.com/2011/01/understanding-hbase.html
http://www.searchdatabase.com.cn/showcontent_31652.htm

hadoop
http://sillycat.iteye.com/blog/1556106
http://sillycat.iteye.com/blog/1556107

tips about hadoop
http://blog.chinaunix.net/uid-20682147-id-4229024.html
http://my.oschina.net/skyim/blog/228486
http://blog.huangchaosuper.cn/work/tech/2014/04/24/hadoop-install.html
http://blog.sina.com.cn/s/blog_45d2413b0102e2zx.html
http://www.it165.net/os/html/201405/8311.html

Comment (sillycat, 2015-03-25):
The port number is changed after hbase-1.0.0.
http://ubuntu-master:16030/master-status
