Source: https://ccp.cloudera.com/display/CDHDOC/HBase+Installation
Apache HBase provides large-scale tabular storage for Hadoop using the Hadoop Distributed File System (HDFS). Cloudera recommends installing HBase in a standalone mode before you try to run it on a whole cluster.
Upgrading HBase to the Latest CDH3 Release
The instructions that follow assume that you are upgrading HBase as part of an upgrade to the latest CDH3 release, and have already performed the steps under Upgrading CDH3.
To upgrade HBase to the latest CDH3 release, proceed as follows.
Step 1: Perform a Graceful Cluster Shutdown
To shut HBase down gracefully, stop the Thrift server and clients, then stop the cluster.
- Stop the Thrift server and clients
sudo service hadoop-hbase-thrift stop
- Stop the cluster.
- Use the following command on the master node:
sudo service hadoop-hbase-master stop
- Use the following command on each node hosting a region server:
sudo service hadoop-hbase-regionserver stop
This shuts down the master and the region servers gracefully.
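The shutdown sequence above can be scripted for repeatability. A minimal sketch that prints the plan rather than executing it (the region server hostnames and the use of ssh are assumptions, not part of the original instructions; pipe the output to `sh` on a real cluster):

```shell
# Dry-run sketch of the graceful shutdown order: Thrift first,
# then the master, then each region server. Hostnames are
# hypothetical placeholders.
REGION_SERVERS="rs1.example.com rs2.example.com"

graceful_stop() {
  echo "sudo service hadoop-hbase-thrift stop"
  echo "sudo service hadoop-hbase-master stop"
  for host in $REGION_SERVERS; do
    echo "ssh $host sudo service hadoop-hbase-regionserver stop"
  done
}

# Print the plan; on a real cluster: graceful_stop | sh
graceful_stop
```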
Step 2. Stop the ZooKeeper Server
$ sudo service hadoop-zookeeper-server stop
Step 3: Install the new version of HBase
Follow the directions in the next section, Installing HBase.
Installing HBase
To install HBase on Ubuntu and other Debian systems:
$ sudo apt-get install hadoop-hbase
To install HBase On Red Hat-compatible systems:
$ sudo yum install hadoop-hbase
To install HBase on SUSE systems:
$ sudo zypper install hadoop-hbase
To list the installed files on Ubuntu and other Debian systems:
$ dpkg -L hadoop-hbase
To list the installed files on Red Hat and SUSE systems:
$ rpm -ql hadoop-hbase
You can see that the HBase package has been configured to conform to the Linux Filesystem Hierarchy Standard. (To learn more, run man hier.)
HBase wrapper script: /usr/bin/hbase
HBase configuration files: /etc/hbase/conf
HBase jar and library files: /usr/lib/hbase
HBase log files: /var/log/hbase
HBase service scripts: /etc/init.d/hadoop-hbase-*
You are now ready to enable the server daemons you want to use with Hadoop. Java-based client access is also available by adding the jars in /usr/lib/hbase/ and /usr/lib/hbase/lib/ to your Java classpath.
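Joining those two jar directories into a classpath string can be done with a simple loop. A sketch, with the directory layout simulated under a temp directory so it is self-contained (the jar filenames are made up; on a real host you would point HBASE_HOME at /usr/lib/hbase):

```shell
# Simulate the installed layout so the sketch runs anywhere.
TMP=$(mktemp -d)
mkdir -p "$TMP/usr/lib/hbase/lib"
touch "$TMP/usr/lib/hbase/hbase-0.89.jar" \
      "$TMP/usr/lib/hbase/lib/zookeeper-3.3.jar"   # hypothetical jars

# Collect every jar in HBASE_HOME and HBASE_HOME/lib into CLASSPATH.
HBASE_HOME="$TMP/usr/lib/hbase"
CLASSPATH=""
for jar in "$HBASE_HOME"/*.jar "$HBASE_HOME"/lib/*.jar; do
  CLASSPATH="$CLASSPATH:$jar"
done
CLASSPATH=${CLASSPATH#:}     # drop the leading colon
echo "$CLASSPATH"
```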
Host Configuration Settings for HBase
Configuring the REST Port
You can use an init.d script, /etc/init.d/hadoop-hbase-rest, to start the REST server; for example:
/etc/init.d/hadoop-hbase-rest start
The script starts the server by default on port 8080. This is a commonly used port and so may conflict with other applications running on the same host.
If you need to change the port for the REST server, configure it in hbase-site.xml; for example:
<property>
<name>hbase.rest.port</name>
<value>60050</value>
</property>
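One way to apply the override without hand-editing is a heredoc. A sketch that writes to a temp file rather than the live /etc/hbase/conf/hbase-site.xml, then sanity-checks the result:

```shell
# Write the REST port override into a (temporary) hbase-site.xml.
# SITE is a stand-in for /etc/hbase/conf/hbase-site.xml.
SITE=$(mktemp)
cat > "$SITE" <<'EOF'
<configuration>
<property>
<name>hbase.rest.port</name>
<value>60050</value>
</property>
</configuration>
EOF

# Quick sanity check that the property and value landed in the file.
grep -A1 'hbase.rest.port' "$SITE"
```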
Using DNS with HBase
HBase uses the local hostname to report its IP address. Both forward and reverse DNS resolution should work. If your machine has multiple interfaces, HBase uses the interface that the primary hostname resolves to. If this is insufficient, you can set hbase.regionserver.dns.interface in the hbase-site.xml file to indicate the primary interface. To work properly, this setting requires that your cluster configuration is consistent and every host has the same network interface configuration. As an alternative, you can set hbase.regionserver.dns.nameserver in the hbase-site.xml file to choose a different name server than the system-wide default.
Using the Network Time Protocol (NTP) with HBase
The clocks on cluster members should be in basic alignment. Some skew is tolerable, but excessive skew can cause odd behavior. Run NTP, or an equivalent, on your cluster. If you are having problems querying data or are seeing unusual cluster operations, verify the system time.
Setting User Limits for HBase
Because HBase is a database, it uses a lot of files at the same time. The default ulimit setting of 1024 for the maximum number of open files on Unix systems is insufficient. Any significant amount of loading will result in strange failures and cause the error message java.io.IOException...(Too many open files) to be logged in the HBase or HDFS log files. For more information about this issue, see the Apache HBase Book. You may also notice errors such as:
2010-04-06 03:04:37,542 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.EOFException
2010-04-06 03:04:37,542 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-6935524980745310745_1391901
Configuring ulimit for HBase
Cloudera recommends increasing the maximum number of file handles to more than 10,000. Note that increasing the file handles for the user who is running the HBase process is an operating system configuration, not an HBase configuration. A common mistake is to increase the file handles for one user while HBase is actually running as a different user. HBase prints the ulimit it is using on the first line of its logs; make sure that value is correct.
If you are using ulimit, you must make the following configuration changes:
- In the /etc/security/limits.conf file, add the following lines:
hdfs - nofile 32768
hbase - nofile 32768
- To apply the changes in /etc/security/limits.conf on Ubuntu and other Debian systems, add the following line in the /etc/pam.d/common-session file:
session required pam_limits.so
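The steps above can be sketched as a small script. It writes to a temp file, not the real /etc/security/limits.conf, so it runs safely anywhere:

```shell
# Append the nofile limits for the hdfs and hbase users, then verify.
# LIMITS is a stand-in for /etc/security/limits.conf.
LIMITS=$(mktemp)
cat >> "$LIMITS" <<'EOF'
hdfs  -  nofile  32768
hbase -  nofile  32768
EOF

# HBase logs its effective ulimit on startup; the current shell's
# open-file limit can be checked locally with:
ulimit -n
grep nofile "$LIMITS"
```

After editing the real file, log out and back in (so pam_limits applies) before restarting HBase.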
Using dfs.datanode.max.xcievers with HBase
A Hadoop HDFS DataNode has an upper bound on the number of files that it can serve at any one time. The upper bound property is called dfs.datanode.max.xcievers (the property is spelled in the code exactly as shown here). Before loading, make sure you have configured the value for dfs.datanode.max.xcievers in the conf/hdfs-site.xml file to at least 4096, as shown below:
<property>
<name>dfs.datanode.max.xcievers</name>
<value>4096</value>
</property>
Be sure to restart HDFS after changing the value for dfs.datanode.max.xcievers. If you don't change that value as described, strange failures can occur and an error message about exceeding the number of xcievers will be added to the DataNode logs. Other error messages about missing blocks are also logged, such as:
10/12/08 20:10:31 INFO hdfs.DFSClient: Could not obtain block blk_XXXXXXXXXXXXXXXXXXXXXX_YYYYYYYY from any node: java.io.IOException: No live nodes contain current block. Will get new block locations from namenode and retry...
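Before restarting HDFS, you can confirm the configured value by pulling it out of hdfs-site.xml and comparing it numerically. A sketch against a temp file standing in for conf/hdfs-site.xml:

```shell
# Check that dfs.datanode.max.xcievers is at least 4096.
# HDFS_SITE is a stand-in for conf/hdfs-site.xml.
HDFS_SITE=$(mktemp)
cat > "$HDFS_SITE" <<'EOF'
<configuration>
<property>
<name>dfs.datanode.max.xcievers</name>
<value>4096</value>
</property>
</configuration>
EOF

# Grab the <value> on the line after the property name.
val=$(grep -A1 'dfs.datanode.max.xcievers' "$HDFS_SITE" \
      | sed -n 's:.*<value>\([0-9]*\)</value>.*:\1:p')
if [ "$val" -ge 4096 ]; then
  echo "xcievers OK ($val)"
else
  echo "xcievers too low ($val)" >&2
fi
```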
Starting HBase in Standalone Mode
By default, HBase ships configured for standalone mode. In this mode of operation, a single JVM hosts the HBase Master, an HBase Region Server, and a ZooKeeper quorum peer. In order to run HBase in standalone mode, you must install the HBase Master package:
Installing the HBase Master for Standalone Operation
To install the HBase Master on Ubuntu and other Debian systems:
$ sudo apt-get install hadoop-hbase-master
To install the HBase Master On Red Hat-compatible systems:
$ sudo yum install hadoop-hbase-master
To install the HBase Master on SUSE systems:
$ sudo zypper install hadoop-hbase-master
Starting the HBase Master
On Red Hat and SUSE systems (using .rpm packages), you can now start the HBase Master by using the included service script:
$ sudo /etc/init.d/hadoop-hbase-master start
On Ubuntu systems (using Debian packages) the HBase Master starts when the HBase package is installed.
To verify that the standalone installation is operational, visit http://localhost:60010. The list of Region Servers at the bottom of the page should include one entry for your local machine.
If you see this message when you start the HBase standalone master:
Starting Hadoop HBase master daemon: starting master, logging to /usr/lib/hbase/logs/hbase-hbase-master/cloudera-vm.out
Couldnt start ZK at requested address of 2181, instead got: 2182. Aborting. Why? Because clients (eg shell) wont be able to find this ZK quorum hbase-master.
you will need to stop the hadoop-zookeeper-server service or uninstall the hadoop-zookeeper-server package, because the standalone master starts its own ZooKeeper on port 2181.
Accessing HBase by using the HBase Shell
After you have started the standalone installation, you can access the database by using the HBase Shell:
$ hbase shell
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version: 0.89.20100621+17, r, Mon Jun 28 10:13:32 PDT 2010
hbase(main):001:0> status 'detailed'
version 0.89.20100621+17
0 regionsInTransition
1 live servers
    my-machine:59719 1277750189913
        requests=0, regions=2, usedHeap=24, maxHeap=995
        .META.,,1
            stores=2, storefiles=0, storefileSizeMB=0, memstoreSizeMB=0, storefileIndexSizeMB=0
        -ROOT-,,0
            stores=1, storefiles=1, storefileSizeMB=0, memstoreSizeMB=0, storefileIndexSizeMB=0
0 dead servers
Using MapReduce with HBase
To run MapReduce jobs that use HBase, you need to add the HBase and ZooKeeper JAR files to the Hadoop Java classpath. You can do this by adding the following statement to each job:
TableMapReduceUtil.addDependencyJars(job);
This distributes the JAR files to the cluster along with your job and adds them to the job's classpath, so that you do not need to edit the MapReduce configuration.
You can find more information about addDependencyJars in the HBase API documentation.
When getting a Configuration object for an HBase MapReduce job, instantiate it using the HBaseConfiguration.create() method.
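An alternative to addDependencyJars, for jobs launched from the command line, is to put the HBase jars on HADOOP_CLASSPATH yourself. Newer HBase releases provide an `hbase classpath` subcommand that prints the client classpath; treat its availability in your CDH3 install as an assumption. A sketch with the command mocked so it runs standalone:

```shell
# Sketch: export the HBase jars onto Hadoop's classpath manually.
# The real `hbase classpath` subcommand is mocked here for
# illustration; the jar paths it prints are hypothetical.
hbase() {
  [ "$1" = "classpath" ] && \
    echo "/usr/lib/hbase/hbase.jar:/usr/lib/hbase/lib/zookeeper.jar"
}

export HADOOP_CLASSPATH="$(hbase classpath)"
echo "$HADOOP_CLASSPATH"
```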
Configuring HBase in Pseudo-distributed Mode
Pseudo-distributed mode differs from standalone mode in that each of the component processes runs in a separate JVM.
Modifying the HBase Configuration
To enable pseudo-distributed mode, you must first make some configuration changes. Open /etc/hbase/conf/hbase-site.xml in your editor of choice, and insert the following XML properties between the <configuration> and </configuration> tags. Be sure to replace localhost with the hostname of your HDFS NameNode if it is not running locally.
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.rootdir</name>
<value>hdfs://localhost/hbase</value>
</property>
Creating the /hbase Directory in HDFS
Before starting the HBase Master, you need to create the /hbase directory in HDFS. The HBase Master runs as hbase:hbase, so it does not have the required permissions to create a top-level directory.
To create the /hbase directory in HDFS:
$ sudo -u hdfs hadoop fs -mkdir /hbase
$ sudo -u hdfs hadoop fs -chown hbase /hbase
Enabling Servers for Pseudo-distributed Operation
After you have configured HBase, you must enable the various servers that make up a distributed HBase cluster. HBase requires three types of servers: a ZooKeeper Server, an HBase Master, and HBase Region Servers. The following sections cover each in turn.
Installing and Starting ZooKeeper Server
HBase uses ZooKeeper Server as a highly available, central location for cluster management. For example, it allows clients to locate the servers, and ensures that only one master is active at a time. For a small cluster, running a ZooKeeper node colocated with the NameNode is recommended. For larger clusters, contact Cloudera Support for configuration help.
Install and start the ZooKeeper Server in standalone mode by running the commands shown in the "Installing the ZooKeeper Server Package on a Single Server" section of ZooKeeper Installation.
Starting the HBase Master
After ZooKeeper is running, you can start the HBase Master:
$ sudo /etc/init.d/hadoop-hbase-master start
Starting an HBase Region Server
The Region Server is the part of HBase that actually hosts data and processes requests. The region server typically runs on all of the slave nodes in a cluster, but not the master node.
To enable the HBase Region Server on Ubuntu and other Debian systems:
$ sudo apt-get install hadoop-hbase-regionserver
To enable the HBase Region Server On Red Hat-compatible systems:
$ sudo yum install hadoop-hbase-regionserver
To enable the HBase Region Server on SUSE systems:
$ sudo zypper install hadoop-hbase-regionserver
To start the Region Server:
$ sudo /etc/init.d/hadoop-hbase-regionserver start
Verifying the Pseudo-Distributed Operation
After you have started ZooKeeper, the Master, and a Region Server, the pseudo-distributed cluster should be up and running. You can verify that each of the daemons is running using the jps tool from the Oracle JDK. If you are running a pseudo-distributed HDFS installation and a pseudo-distributed HBase installation on one machine, jps will show output similar to the following:
$ sudo jps
32694 Jps
30674 HRegionServer
29496 HMaster
28781 DataNode
28422 NameNode
30348 QuorumPeerMain
You should also be able to navigate to http://localhost:60010 and verify that the local region server has registered with the master.
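The jps check can be automated by grepping its output for the expected daemon names. A sketch against a canned sample so it runs without a live cluster (on a real host, replace the sample with `jps_out=$(sudo jps)`):

```shell
# Verify that all pseudo-distributed daemons appear in `jps` output.
# Canned sample output, matching the listing above.
jps_out='32694 Jps
30674 HRegionServer
29496 HMaster
28781 DataNode
28422 NameNode
30348 QuorumPeerMain'

missing=""
for daemon in HMaster HRegionServer QuorumPeerMain NameNode DataNode; do
  echo "$jps_out" | grep -qw "$daemon" || missing="$missing $daemon"
done

if [ -z "$missing" ]; then
  echo "all daemons running"
else
  echo "missing:$missing" >&2
fi
```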
Installing the HBase Thrift Server
The HBase Thrift Server is an alternative gateway for accessing the HBase server. Thrift mirrors most of the HBase client APIs while enabling popular programming languages to interact with HBase. The Thrift Server is multi-platform and performs better than REST in many situations. Thrift can run collocated with the region servers, but should not be collocated with the NameNode or the JobTracker. For more information about Thrift, visit http://incubator.apache.org/thrift/.
To enable the HBase Thrift Server on Ubuntu and other Debian systems:
$ sudo apt-get install hadoop-hbase-thrift
To enable the HBase Thrift Server On Red Hat-compatible systems:
$ sudo yum install hadoop-hbase-thrift
To enable the HBase Thrift Server on SUSE systems:
$ sudo zypper install hadoop-hbase-thrift
Deploying HBase in a Distributed Cluster
After you have HBase running in pseudo-distributed mode, the same configuration can be extended to running on a distributed cluster.
Choosing where to Deploy the Processes
For small clusters, Cloudera recommends designating one node in your cluster as the master node. On this node, you will typically run the HBase Master and a ZooKeeper quorum peer. These master processes may be collocated with the Hadoop NameNode and JobTracker for small clusters.
Designate the remaining nodes as slave nodes. On each node, Cloudera recommends running a Region Server, which may be collocated with a Hadoop TaskTracker and a DataNode. When collocating with TaskTrackers, be sure that the resources of the machine are not oversubscribed; it's safest to start with a small number of MapReduce slots and work up slowly.
Configuring for Distributed Operation
After you have decided which machines will run each process, you can edit the configuration so that the nodes may locate each other. In order to do so, you should make sure that the configuration files are synchronized across the cluster. Cloudera strongly recommends the use of a configuration management system to synchronize the configuration files, though you can use a simpler solution such as rsync to get started quickly.
The only configuration change necessary to move from pseudo-distributed operation to fully-distributed operation is the addition of the ZooKeeper Quorum address inhbase-site.xml. Insert the following XML property to configure the nodes with the address of the node where the ZooKeeper quorum peer is running:
<property>
<name>hbase.zookeeper.quorum</name>
<value>mymasternode</value>
</property>
To start the cluster, start the services in the following order:
- The ZooKeeper Quorum Peer
- The HBase Master
- Each of the HBase Region Servers
After the cluster is fully started, you can view the HBase Master web interface on port 60010 and verify that each of the slave nodes has registered properly with the master.
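The start order above can be captured in a small dry-run script. The hostnames and the use of ssh are hypothetical; the script prints the commands rather than executing them (pipe to `sh` on a real cluster):

```shell
# Dry-run: print the cluster start commands in the required order:
# ZooKeeper quorum peer, then the master, then each region server.
MASTER=mymasternode                              # hypothetical hosts
REGION_SERVERS="slave1.example.com slave2.example.com"

start_plan() {
  echo "ssh $MASTER sudo service hadoop-zookeeper-server start"
  echo "ssh $MASTER sudo service hadoop-hbase-master start"
  for host in $REGION_SERVERS; do
    echo "ssh $host sudo service hadoop-hbase-regionserver start"
  done
}

# Print the plan; on a real cluster: start_plan | sh
start_plan
```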
Troubleshooting
The Cloudera packages of HBase have been configured to place logs in /var/log/hbase. While getting started, Cloudera recommends tailing these logs to note any error messages or failures.
Viewing the HBase Documentation
For additional HBase documentation, see http://archive.cloudera.com/cdh/3/hbase/.