HBase (1) Introduction and Installation
1. HBase Introduction
Hadoop Database ——> Hadoop HDFS
Hadoop Database ——> Hadoop MapReduce
Hadoop Database ——> Hadoop ZooKeeper
Fundamentally Distributed — partitioning (sharding), replication
Column Oriented
Sequential Write in memory —— flush to disk
Merged Read
Periodic Data Compaction
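The write/read pattern above is the LSM-tree idea. A toy sketch in shell, using throwaway files under /tmp (illustration only, not HBase's real file layout): each memstore flush produces a sorted file, a read merges the sorted files on the fly, and compaction rewrites many files into one.

```shell
# Toy LSM sketch: sorted flush files, merged read, compaction.
mkdir -p /tmp/lsm-demo
printf 'row1:a\nrow3:c\n' | sort > /tmp/lsm-demo/flush-0001   # first memstore flush
printf 'row2:b\nrow4:d\n' | sort > /tmp/lsm-demo/flush-0002   # second flush
# "merged read": merge all sorted flush files on the fly
sort -m /tmp/lsm-demo/flush-0001 /tmp/lsm-demo/flush-0002
# "compaction": persist the merge as a single sorted file
sort -m /tmp/lsm-demo/flush-0001 /tmp/lsm-demo/flush-0002 > /tmp/lsm-demo/compacted
```

Because every flush file is already sorted, both the read and the compaction are cheap sequential merges rather than random I/O.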
Pig (Data Flow), Hive (SQL), Sqoop (RDBMS import support)
HMaster Server: Region assignment management (analogous to the Hadoop master roles: NameNode, JobTracker)
HRegionServer #1: DataNode, TaskTracker
2. Install and Setup Hadoop
Install protoc
>wget https://protobuf.googlecode.com/files/protobuf-2.5.0.tar.gz
Unzip and cd to that directory
>./configure --prefix=/Users/carl/tool/protobuf-2.5.0
>make
>make install
>sudo ln -s /Users/carl/tool/protobuf-2.5.0 /opt/protobuf-2.5.0
>sudo ln -s /opt/protobuf-2.5.0 /opt/protobuf
Add this line to my environment
export PATH=/opt/protobuf/bin:$PATH
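The version-free /opt/protobuf symlink on top of the versioned directory is what lets the PATH line above stay stable across upgrades. A sketch of the same pattern with throwaway paths under /tmp:

```shell
# Versioned-dir + stable-symlink pattern (throwaway paths for illustration):
mkdir -p /tmp/tool/protobuf-2.5.0/bin
ln -sfn /tmp/tool/protobuf-2.5.0 /tmp/protobuf   # stable, version-free name
export PATH=/tmp/protobuf/bin:$PATH              # PATH never mentions the version
readlink /tmp/protobuf                           # shows the versioned target
```

Upgrading later is then a single relink (e.g. `ln -sfn /tmp/tool/protobuf-2.6.0 /tmp/protobuf`) with no change to PATH.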
Check the Installation Environment
>protoc --version
libprotoc 2.5.0
Compile Hadoop
>svn checkout http://svn.apache.org/repos/asf/hadoop/common/tags/release-2.4.0 hadoop-common-2.4.0
Read the document here for building
http://svn.apache.org/repos/asf/hadoop/common/tags/release-2.4.0/BUILDING.txt
>cd hadoop-common-2.4.0/
>mvn clean install -DskipTests
>cd hadoop-mapreduce-project
>mvn clean install assembly:assembly -Pnative
Maybe my machine is too slow — I got a lot of timeout errors. So I reran it like this
>mvn clean -DskipTests install assembly:assembly -Pnative
Then I needed to drop the native profile
>mvn clean -DskipTests install assembly:assembly
Still not working, so I read the INSTALL document
http://svn.apache.org/repos/asf/hadoop/common/tags/release-2.4.0/hadoop-mapreduce-project/INSTALL
>cd ..
>mvn clean package -Pdist -Dtar -DskipTests
The latest documentation is here
/Users/carl/data/installation/hadoop-2.4.0/share/doc/hadoop/hadoop-project-dist/hadoop-common/SingleCluster.html
Follow these blog posts and set up Hadoop on 4 machines
http://sillycat.iteye.com/blog/2084169
http://sillycat.iteye.com/blog/2090186
>sbin/start-dfs.sh
>sbin/start-yarn.sh
>sbin/mr-jobhistory-daemon.sh start historyserver
3. Setup Zookeeper
Follow this blog post and set up ZooKeeper on 3 machines
http://sillycat.iteye.com/blog/2015175
>zkServer.sh start conf/zoo-cluster.cfg
4. Try install HBase - Standalone HBase
Download this version since I am using hadoop 2.4.x
>wget http://mirrors.gigenet.com/apache/hbase/hbase-0.98.4/hbase-0.98.4-hadoop2-bin.tar.gz
Unzip the file and move it to the work directory.
>sudo ln -s /home/carl/tool/hbase-0.98.4 /opt/hbase-0.98.4
>sudo ln -s /opt/hbase-0.98.4 /opt/hbase
Check and modify the configuration file
>cat conf/hbase-site.xml
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>file:///opt/hbase</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/home/carl/etc/hbase</value>
  </property>
</configuration>
Start the Service
>bin/start-hbase.sh
>jps
2036 NameNode
4084 Jps
3340 HMaster
2403 ResourceManager
2263 SecondaryNameNode
2686 JobHistoryServer
Enter the Client Shell
>bin/hbase shell
Create the table
>create 'test', 'cf'
Check the info on that table
>list 'test'
Inserting data
>put 'test', 'row1', 'cf:a', 'value1'
>put 'test', 'row2', 'cf:a', 'value2'
>put 'test', 'row3', 'cf:a', 'value3'
Here row1 is the row key, cf:a is the column (family:qualifier), and value1 is the value.
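To make the addressing concrete, here is a plain-shell stand-in for the table: each cell is a (row key, family:qualifier, value) triple, and get/scan are just lookups over those triples (a conceptual sketch, not how HBase stores data).

```shell
# Conceptual sketch: cells as (row, column, value) triples in a flat file.
printf 'row1 cf:a value1\nrow2 cf:a value2\nrow3 cf:a value3\n' > /tmp/test_table
# analogous to: scan 'test'
cat /tmp/test_table
# analogous to: get 'test', 'row1'
awk '$1 == "row1" && $2 == "cf:a" { print $3 }' /tmp/test_table
```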
Get all the data
>scan 'test'
ROW                COLUMN+CELL
row1               column=cf:a, timestamp=1407169545627, value=value1
row2               column=cf:a, timestamp=1407169557668, value=value2
row3               column=cf:a, timestamp=1407169563458, value=value3
3 row(s) in 0.0630 seconds
Get a single row
>get 'test', 'row1'
COLUMN             CELL
cf:a               timestamp=1407169545627, value=value1
Some other commands
>disable 'test'
>enable 'test'
>drop 'test'
5. Pseudo-Distributed Local Install
Change the configuration as follows
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://ubuntu-master:9000/hbase</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/home/carl/etc/hbase</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.master.wait.on.regionservers.mintostart</name>
    <value>1</value>
  </property>
</configuration>
List the HDFS directory
>hadoop fs -ls /
Found 4 items
drwxr-xr-x   - carl supergroup          0 2014-07-09 13:22 /data
drwxr-xr-x   - carl supergroup          0 2014-08-04 11:47 /hbase
drwxr-xr-x   - carl supergroup          0 2014-07-10 13:09 /output
drwxrwx---   - carl supergroup          0 2014-08-04 11:21 /tmp
>hadoop fs -ls /hbase
Found 6 items
drwxr-xr-x   - carl supergroup          0 2014-08-04 11:48 /hbase/.tmp
drwxr-xr-x   - carl supergroup          0 2014-08-04 11:47 /hbase/WALs
drwxr-xr-x   - carl supergroup          0 2014-08-04 11:48 /hbase/data
-rw-r--r--   3 carl supergroup         42 2014-08-04 11:47 /hbase/hbase.id
-rw-r--r--   3 carl supergroup          7 2014-08-04 11:47 /hbase/hbase.version
drwxr-xr-x   - carl supergroup          0 2014-08-04 11:47 /hbase/oldWALs
Start HMaster Backup Servers
The default port numbers for HMaster are 16010, 16020 and 16030
>bin/local-master-backup.sh start 2 3 5
That will start 3 HMaster backup servers on ports 16012/16022/16032, 16013/16023/16033 and 16015/16025/16035
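The offset-to-port mapping is plain addition over the base ports quoted above — a quick sketch:

```shell
# Each backup-master offset n is added to the three base ports
# (16010/16020/16030, as quoted above).
for n in 2 3 5; do
  echo "offset $n -> $((16010 + n))/$((16020 + n))/$((16030 + n))"
done
```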
Find the process id in /tmp/hbase-USER-N-master.pid to stop a server
For example
>cat /tmp/hbase-carl-2-master.pid
6442
>cat /tmp/hbase-carl-5-master.pid |xargs kill -9
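The same pid-file stop pattern can be demoed with a throwaway process (hypothetical file name, not a real HBase pid file):

```shell
# Demo of the pid-file stop pattern with a throwaway sleep process:
sleep 300 &
echo $! > /tmp/demo-master.pid             # the daemon records its pid here
cat /tmp/demo-master.pid | xargs kill -9   # stop it the same way as above
```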
Start and stop Additional RegionServers
The default RegionServer ports are 16020 and 16030, but the additional local RegionServers use base ports 16200 and 16300.
>bin/local-regionservers.sh start 2 3 5
>bin/local-regionservers.sh stop 5
6. Fully Distributed
I have 4 machines; I will list them as follows:
ubuntu-master hmaster
ubuntu-client1 hmaster-backup
ubuntu-client2 regionserver
ubuntu-client3 regionserver
Set up the Configuration
>cat conf/regionservers
ubuntu-client2
ubuntu-client3
>cat conf/backup-masters
ubuntu-client1
Since I already have ZooKeeper running:
>vi conf/hbase-env.sh
export HBASE_MANAGES_ZK=false
The main configuration file
>cat conf/hbase-site.xml
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://ubuntu-master:9000/hbase</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.master.wait.on.regionservers.mintostart</name>
    <value>1</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>ubuntu-client1,ubuntu-client2,ubuntu-client3</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/home/carl/etc/zookeeper</value>
  </property>
</configuration>
The last step is just to start the server
>bin/start-hbase.sh
Visit the web UI
http://ubuntu-master:60010/master-status
References:
https://hbase.apache.org/
http://www.alidata.org/archives/1509
http://blog.csdn.net/heyutao007/article/details/6920882
http://blog.sina.com.cn/s/blog_5c5d5cdf0101dvgq.html (hadoop, hbase, zookeeper)
http://www.cnblogs.com/ventlam/archive/2011/01/22/HBaseCluster.html
http://www.searchtb.com/2011/01/understanding-hbase.html
http://www.searchdatabase.com.cn/showcontent_31652.htm
hadoop
http://sillycat.iteye.com/blog/1556106
http://sillycat.iteye.com/blog/1556107
tips about hadoop
http://blog.chinaunix.net/uid-20682147-id-4229024.html
http://my.oschina.net/skyim/blog/228486
http://blog.huangchaosuper.cn/work/tech/2014/04/24/hadoop-install.html
http://blog.sina.com.cn/s/blog_45d2413b0102e2zx.html
http://www.it165.net/os/html/201405/8311.html