HBase(1) Introduction and Installation
1. HBase Introduction
Hadoop Database -> Hadoop HDFS
Hadoop Database -> Hadoop MapReduce
Hadoop Database -> Hadoop ZooKeeper
Fundamentally Distributed: partitioning (sharding), replication
Column Oriented
Sequential Write: in memory, then flush to disk
Merged Read
Periodic Data Compaction
Pig (Data Flow), Hive (SQL), Sqoop (RDBMS importing support)
HMaster Server: Region assignment Mgmt (Hadoop Master, NameNode, JobTracker)
HRegionServer #1: DataNode, TaskTracker
2. Install and Setup Hadoop
Install protoc
Unzip and cd to that directory
>./configure --prefix=/Users/carl/tool/protobuf-2.5.0
>make install
>sudo ln -s /Users/carl/tool/protobuf-2.5.0 /opt/protobuf-2.5.0
>sudo ln -s /opt/protobuf-2.5.0 /opt/protobuf
Add this line to my environment
export PATH=/opt/protobuf/bin:$PATH
Check the Installation Environment
>protoc --version
libprotoc 2.5.0
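The Hadoop 2.4.0 build expects protoc 2.5.0, so it can be worth checking the version in a script before starting the long Maven build. A minimal sketch of that check; protoc_version_ok is a hypothetical helper name, not part of protobuf or Hadoop:

```shell
# Hypothetical helper (not part of protobuf or Hadoop).
# $1: output of `protoc --version`, e.g. "libprotoc 2.5.0"
# $2: required version, e.g. "2.5.0"
protoc_version_ok() {
    actual=${1#libprotoc }   # strip the leading "libprotoc " prefix
    [ "$actual" = "$2" ]
}

# Example with the version string reported above.
if protoc_version_ok "libprotoc 2.5.0" "2.5.0"; then
    echo "protoc version matches"
else
    echo "protoc version mismatch" >&2
fi
```

On a machine with protoc on the PATH, the first argument would be "$(protoc --version)" instead of the literal string.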
Compile Hadoop
>svn checkout http://svn.apache.org/repos/asf/hadoop/common/tags/release-2.4.0 hadoop-common-2.4.0
Read the document here for building
>cd hadoop-common-2.4.0/
>mvn clean install -DskipTests
>cd hadoop-mapreduce-project
>mvn clean install assembly:assembly -Pnative
My machine may be too slow, because the build hit a lot of timeout errors. So I reran it with the tests skipped
>mvn clean -DskipTests install assembly:assembly -Pnative
I also needed to drop the native profile (-Pnative)
>mvn clean -DskipTests install assembly:assembly
That did not work either, so I read the INSTALL document and built from the parent directory instead
>cd ..
>mvn clean package -Pdist -Dtar -DskipTests
Latest document from here
Follow the BLOG and set up Hadoop on 4 machines
>sbin/mr-jobhistory-daemon.sh start historyserver
3. Setup Zookeeper
Follow the BLOG and set up ZooKeeper on 3 machines
>zkServer.sh start conf/zoo-cluster.cfg
4. Try to Install HBase - Standalone HBase
Download this version since I am using hadoop 2.4.x
>wget http://mirrors.gigenet.com/apache/hbase/hbase-0.98.4/hbase-0.98.4-hadoop2-bin.tar.gz
Unzip the file and move it to the work directory.
>sudo ln -s /home/carl/tool/hbase-0.98.4 /opt/hbase-0.98.4
>sudo ln -s /opt/hbase-0.98.4 /opt/hbase
Check and modify the configuration file
>cat conf/hbase-site.xml
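For reference, a standalone setup usually only needs two properties: where HBase keeps its data and where its built-in ZooKeeper keeps its data. The sketch below writes such a file; both directory paths are placeholders I assumed for illustration, not the values from this install, and write_standalone_site is a hypothetical helper name.

```shell
# Sketch: write a minimal standalone hbase-site.xml to the file given as $1.
# Both directories are assumed placeholders, not the author's real values.
write_standalone_site() {
    cat > "$1" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <!-- Where HBase stores its data; a local directory in standalone mode -->
  <property>
    <name>hbase.rootdir</name>
    <value>file:///home/carl/data/hbase</value>
  </property>
  <!-- Where the ZooKeeper instance started by HBase keeps its data -->
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/home/carl/data/zookeeper</value>
  </property>
</configuration>
EOF
}

# Write to a scratch file first and review it before copying it to conf/hbase-site.xml.
site_file=$(mktemp)
write_standalone_site "$site_file"
cat "$site_file"
```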
Start the Service
2036 NameNode
4084 Jps
3340 HMaster
2403 ResourceManager
2263 SecondaryNameNode
2686 JobHistoryServer
Enter the Client Shell
>bin/hbase shell
Create the table
>create 'test', 'cf'
Check the info on that table
>list 'test'
Inserting data
>put 'test', 'row1', 'cf:a', 'value1'
>put 'test', 'row2', 'cf:a', 'value2'
>put 'test', 'row3', 'cf:a', 'value3'
In the first put, row1 is the row key, cf:a is the column (column family cf, qualifier a), and value1 is the value.
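Typing many puts by hand gets tedious, so the commands can also be generated. A small sketch that prints put commands in the same shape as the three above; gen_puts is a hypothetical helper name, and the rowN/valueN naming simply copies the pattern used here.

```shell
# Hypothetical helper: print N put commands shaped like the ones above.
# $1: table name, $2: column as family:qualifier, $3: number of rows
gen_puts() {
    i=1
    while [ "$i" -le "$3" ]; do
        printf "put '%s', 'row%d', '%s', 'value%d'\n" "$1" "$i" "$2" "$i"
        i=$((i + 1))
    done
}

gen_puts test cf:a 3
```

The output can be pasted into the HBase shell. On a machine with HBase installed it should also be possible to pipe it in, for example gen_puts test cf:a 3 | bin/hbase shell, though I have not verified that against 0.98.4.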
Get all the data
>scan 'test'
ROW    COLUMN+CELL
row1   column=cf:a, timestamp=1407169545627, value=value1
row2   column=cf:a, timestamp=1407169557668, value=value2
row3   column=cf:a, timestamp=1407169563458, value=value3
3 row(s) in 0.0630 seconds
Get a single row
>get 'test', 'row1'
COLUMN   CELL
cf:a     timestamp=1407169545627, value=value1
Some other commands
>disable 'test'
>enable 'test'
>drop 'test'
5. Pseudo-Distributed Local Install
Change the configuration as follows
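A pseudo-distributed setup typically differs from standalone in two settings: distributed mode is switched on, and the root directory points at HDFS. The sketch below writes such a file. The NameNode address hdfs://localhost:8020 is an assumption (it has to match fs.defaultFS in Hadoop's core-site.xml), and write_pseudo_site is a hypothetical helper name.

```shell
# Sketch: write a pseudo-distributed hbase-site.xml.
# $1: output file
# $2: NameNode URI (assumed; must match fs.defaultFS in core-site.xml)
write_pseudo_site() {
    cat > "$1" <<EOF
<?xml version="1.0"?>
<configuration>
  <!-- Run the HBase daemons as separate processes -->
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <!-- Store HBase data in HDFS instead of the local filesystem -->
  <property>
    <name>hbase.rootdir</name>
    <value>$2/hbase</value>
  </property>
</configuration>
EOF
}

site_file=$(mktemp)
write_pseudo_site "$site_file" "hdfs://localhost:8020"
cat "$site_file"
```

With a root directory like this, HBase creates /hbase in HDFS on first start, which is the directory listed in the next step.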
List the HDFS directory
>hadoop fs -ls /
Found 4 items
drwxr-xr-x - carl supergroup 0 2014-07-09 13:22 /data
drwxr-xr-x - carl supergroup 0 2014-08-04 11:47 /hbase
drwxr-xr-x - carl supergroup 0 2014-07-10 13:09 /output
drwxrwx--- - carl supergroup 0 2014-08-04 11:21 /tmp
>hadoop fs -ls /hbase
Found 6 items
drwxr-xr-x - carl supergroup 0 2014-08-04 11:48 /hbase/.tmp
drwxr-xr-x - carl supergroup 0 2014-08-04 11:47 /hbase/WALs
drwxr-xr-x - carl supergroup 0 2014-08-04 11:48 /hbase/data
-rw-r--r-- 3 carl supergroup 42 2014-08-04 11:47 /hbase/hbase.id
-rw-r--r-- 3 carl supergroup 7 2014-08-04 11:47 /hbase/hbase.version
drwxr-xr-x - carl supergroup 0 2014-08-04 11:47 /hbase/oldWALs
Start HMaster Backup Servers
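Each HMaster uses three ports, by default 16010, 16020, and 16030; a backup master adds its offset to each of them. (These are the defaults of HBase 1.0 and later. As far as I know, 0.98.x releases such as the 0.98.4 used here default to the 60000 range, for example 60010 for the master web UI, so check which ports your version actually binds.)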
>bin/local-master-backup.sh start 2 3 5
That will start 3 backup HMaster servers on ports 16012/16022/16032, 16013/16023/16033, and 16015/16025/16035.
To stop a backup server, find its process id in /tmp/hbase-USER-X-master.pid, where USER is the user name and X is the offset.
For example
>cat /tmp/hbase-carl-2-master.pid
>cat /tmp/hbase-carl-5-master.pid |xargs kill -9
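The cat and kill steps can be wrapped in one function so that a missing PID file is reported instead of handing kill an empty argument. A minimal sketch; stop_backup_master is a hypothetical helper name, and the optional third argument for the PID directory is my own addition.

```shell
# Hypothetical helper: stop a local backup HMaster through its PID file.
# $1: user name, $2: offset, $3: PID directory (optional, defaults to /tmp)
stop_backup_master() {
    pid_file="${3:-/tmp}/hbase-$1-$2-master.pid"
    if [ ! -f "$pid_file" ]; then
        echo "no PID file at $pid_file" >&2
        return 1
    fi
    # SIGKILL mirrors the kill -9 above; the process gets no chance to shut down cleanly.
    kill -9 "$(cat "$pid_file")"
}
```

For the example above the call would be stop_backup_master carl 5.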
Start and stop Additional RegionServers
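Each RegionServer needs two ports; the defaults are 16020 and 16030. Since the HMaster already uses those, additional RegionServers start from the base ports 16200 and 16300 plus the offset.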
>bin/local-regionservers.sh start 2 3 5
>bin/local-regionservers.sh stop 5
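The port arithmetic for both helper scripts can be sketched as below. The base ports are the ones quoted in this section, so treat them as assumptions for your HBase version; offset_ports is a hypothetical helper name.

```shell
# Sketch of the port arithmetic for local-master-backup.sh and local-regionservers.sh.
# Base ports are taken from the text above; other HBase versions may use different defaults.
# $1: "master" or "regionserver", $2: offset passed to the script
offset_ports() {
    case "$1" in
        master)       bases="16010 16020 16030" ;;  # HMaster ports quoted above
        regionserver) bases="16200 16300" ;;        # base ports for additional RegionServers
        *) echo "unknown kind: $1" >&2; return 1 ;;
    esac
    out=""
    for base in $bases; do
        out="$out${out:+ }$(($base + $2))"
    done
    echo "$out"
}

for offset in 2 3 5; do
    echo "backup master $offset: $(offset_ports master "$offset")"
    echo "regionserver  $offset: $(offset_ports regionserver "$offset")"
done
```

Printing the ports up front makes it easier to spot a clash with something already listening before the extra processes are started.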
6. Fully Distributed
I have 4 machines, listed as follows:
ubuntu-master hmaster
ubuntu-client1 hmaster-backup
ubuntu-client2 regionserver
ubuntu-client3 regionserver
Set up the Configuration
>cat conf/regionservers
ubuntu-client2
ubuntu-client3
>cat conf/backup-masters
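Both files are plain lists with one host name per line. Going by the role list above, backup-masters would presumably contain ubuntu-client1; that is an inference from the machine list, not output captured from this install. A small sketch that writes both files into a scratch directory, with write_host_list as a hypothetical helper name:

```shell
# Hypothetical helper: write one host name per line to a file.
# $1: output file, remaining arguments: host names
write_host_list() {
    file=$1
    shift
    printf '%s\n' "$@" > "$file"
}

conf_dir=$(mktemp -d)   # scratch directory; in a real install this would be conf/
write_host_list "$conf_dir/regionservers" ubuntu-client2 ubuntu-client3
write_host_list "$conf_dir/backup-masters" ubuntu-client1   # assumed from the role list
cat "$conf_dir/regionservers" "$conf_dir/backup-masters"
```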
Since I already have ZooKeeper running, I tell HBase not to manage its own instance
>vi conf/hbase-env.sh
export HBASE_MANAGES_ZK=false
The main configuration file
>cat conf/hbase-site.xml
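With an external ZooKeeper, the main file usually needs the quorum hosts in addition to the distributed flag and the HDFS root directory. The sketch below writes such a file. The NameNode URI and the three ZooKeeper host names are placeholders I made up, since neither appears in these notes, and write_distributed_site is a hypothetical helper name.

```shell
# Sketch: write a fully distributed hbase-site.xml.
# $1: output file
# $2: NameNode URI (assumed)
# $3: comma-separated ZooKeeper hosts (assumed)
write_distributed_site() {
    cat > "$1" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.rootdir</name>
    <value>$2/hbase</value>
  </property>
  <!-- External ZooKeeper ensemble, used because HBASE_MANAGES_ZK=false -->
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>$3</value>
  </property>
</configuration>
EOF
}

site_file=$(mktemp)
# Placeholder values: replace them with the real NameNode address and ZooKeeper hosts.
write_distributed_site "$site_file" "hdfs://ubuntu-master:8020" \
    "zk1.example.com,zk2.example.com,zk3.example.com"
cat "$site_file"
```

The same file would then be copied to conf/ on every node so the masters and RegionServers agree on the root directory and the quorum.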
The last step is just to start the server
Visit the web UI
