
Hadoop 2.7.1 + HBase 1.2.1 Cluster Setup (3): Cluster Installation Supported by Both 1.x and 2.x

(1) Building Hadoop 2.7.1 from source http://aperise.iteye.com/blog/2246856
(2) Hadoop 2.7.1 installation preparation http://aperise.iteye.com/blog/2253544
(3) Cluster installation supported by both 1.x and 2.x http://aperise.iteye.com/blog/2245547
(4) HBase installation preparation http://aperise.iteye.com/blog/2254451
(5) HBase installation http://aperise.iteye.com/blog/2254460
(6) Snappy installation http://aperise.iteye.com/blog/2254487
(7) HBase performance tuning http://aperise.iteye.com/blog/2282670
(8) Benchmarking HBase with Yahoo YCSB http://aperise.iteye.com/blog/2248863
(9) spring-hadoop in practice http://aperise.iteye.com/blog/2254491
(10) ZooKeeper-based Hadoop HA cluster installation http://aperise.iteye.com/blog/2305809

1. What is Hadoop

1.1 A brief history of Hadoop

        Doug Cutting is the creator of Apache Lucene. The Apache Nutch project started in 2002 as part of Apache Lucene. By 2005 all of Nutch's major algorithms had been ported to run on MapReduce and NDFS. In February 2006, MapReduce and NDFS were moved out of Nutch into a Lucene subproject named Hadoop.

Hadoop is not an acronym but a made-up name. Doug Cutting, the project's creator, explained how it got its name: "The name is what my kid gave a stuffed yellow elephant. My naming criteria were that it be short, easy to pronounce and spell, not carry much meaning, and not be used elsewhere. Kids are good at coming up with names like that."

 

1.2 Hadoop in the narrow sense

        In my view, Hadoop in the narrow sense refers to the Apache Hadoop subproject itself, which consists of the following modules:

  • Hadoop Common: a set of components and interfaces for distributed file systems and general I/O
  • Hadoop Distributed File System (HDFS): a distributed file system
  • Hadoop YARN: a framework for job scheduling and cluster resource management
  • Hadoop MapReduce: a programming model for distributed, parallel processing of large data sets

        Hadoop in the narrow sense mainly solves three problems: HDFS handles distributed storage, YARN handles job scheduling and resource management, and the MapReduce programming model lets developers write code for offline processing of big data.

 

1.3 Hadoop in the broad sense

        In my view, Hadoop in the broad sense refers to the whole Hadoop ecosystem, where each subproject exists to solve a particular class of problems. Its main components are shown in the figure below:


 

        

2. Two Hadoop Cluster Deployment Modes

2.1 NameNode + SecondaryNameNode mode, supported by both hadoop1.x and hadoop2.x


 

2.2 Active NameNode + standby NameNode mode, supported only by hadoop2.x


 

2.3 Official Hadoop documentation on cluster setup

        1) Single-node Hadoop setup

        http://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-common/SingleCluster.html

        2) Cluster setup

            Mode 1 (NameNode + SecondaryNameNode, supported by both hadoop1.x and hadoop2.x)

            http://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-common/ClusterSetup.html

 

            Mode 2 (active NameNode + standby NameNode, supported only by hadoop2.x, also known as Hadoop HA). The documentation covers HDFS HA and YARN HA separately:

                     HDFS HA (zookeeper + journalnode) http://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html

                     HDFS HA (zookeeper + NFS) http://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailability

                     YARN HA (zookeeper) http://hadoop.apache.org/docs/r2.7.2/hadoop-yarn/hadoop-yarn-site/ResourceManagerHA.html

 

        Production environments mostly use HDFS HA (zookeeper + journalnode: active NameNode + standby NameNode + JournalNode + DFSZKFailoverController + DataNode) together with YARN HA (zookeeper: active ResourceManager + standby ResourceManager + NodeManager). What I describe here is the NameNode + SecondaryNameNode mode supported by both hadoop1.x and hadoop2.x; it is mainly useful for learning and experimentation, because it needs fewer machines but leaves the NameNode as a single point of failure.

 

3. Hadoop Installation

3.1 Required software

  1. Java 1.7.x, required; the Oracle (Sun) JDK is recommended. Hadoop 2.7.1 has been verified not to work with JDK 1.6, so JDK 1.7 is used here. Download: http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html
  2. ssh must be installed and sshd must be running, so that the Hadoop scripts can manage the remote Hadoop daemons.
  3. Hadoop download: http://mirror.bit.edu.cn/apache/hadoop/common/hadoop-2.7.1/hadoop-2.7.1.tar.gz

3.2 Environment

  1. Operating system: Red Hat Enterprise Linux Server release 5.8 (Tikanga)
  2. Master and slave servers: Master 192.168.181.66, Slave1 192.168.88.21, Slave2 192.168.88.22

3.3 Passwordless SSH login

    First install SSH on every Linux machine (Hadoop connects to the other machines in the cluster over SSH for read/write operations); the installation itself is left to you. Hadoop logs into each node over SSH to manage it. I use the hadoop user: each server generates its own public key, and all keys are merged into authorized_keys.

    1. CentOS does not enable passwordless SSH login by default. Uncomment the following two lines in /etc/ssh/sshd_config; do this on every server.       Before:

#RSAAuthentication yes
#PubkeyAuthentication yes

    After (run service sshd restart afterwards):

RSAAuthentication yes
PubkeyAuthentication yes

    For the remaining steps see http://aperise.iteye.com/blog/2253544
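    A minimal sketch of the key generation and merging described above, assuming the hadoop user and the three hosts of this cluster (details and variations are covered in the linked post):

# On every server, generate an RSA key pair for the hadoop user (empty passphrase)
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
# On the master, merge all public keys into authorized_keys
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
ssh hadoop@192.168.88.21 cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
ssh hadoop@192.168.88.22 cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
# Push the merged authorized_keys back to every node and tighten permissions
scp ~/.ssh/authorized_keys hadoop@192.168.88.21:~/.ssh/
scp ~/.ssh/authorized_keys hadoop@192.168.88.22:~/.ssh/
chmod 700 ~/.ssh && chmod 600 ~/.ssh/authorized_keys
# Verify passwordless login
ssh hadoop@192.168.88.21 hostname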

 

3.4 Installing the JDK

    Hadoop 2.7 requires JDK 7. With JDK 1.6, Hadoop fails on startup with the following error:

[hadoop@nmsc1 bin]# ./hdfs namenode -format
Exception in thread "main" java.lang.UnsupportedClassVersionError: org/apache/hadoop/hdfs/server/namenode/NameNode : Unsupported major.minor version 51.0
        at java.lang.ClassLoader.defineClass1(Native Method)
        at java.lang.ClassLoader.defineClassCond(ClassLoader.java:631)
        at java.lang.ClassLoader.defineClass(ClassLoader.java:615)
        at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:141)
        at java.net.URLClassLoader.defineClass(URLClassLoader.java:283)
        at java.net.URLClassLoader.access$000(URLClassLoader.java:58)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:197)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
Could not find the main class: org.apache.hadoop.hdfs.server.namenode.NameNode.  Program will exit.

     1. Download jdk-7u65-linux-x64.gz and place it at /opt/java/jdk-7u65-linux-x64.gz.

     2. Unpack it: tar -zxvf jdk-7u65-linux-x64.gz.

     3. Edit /etc/profile and append the following at the end of the file:

export JAVA_HOME=/opt/java/jdk1.7.0_65
export CLASSPATH=.:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$PATH:$JAVA_HOME/bin

     4. Apply the changes: source /etc/profile

     5. Run java -version to check that the JDK is configured correctly.

[hadoop@nmsc2 java]# java -version
java version "1.7.0_65"
Java(TM) SE Runtime Environment (build 1.7.0_65-b17)
Java HotSpot(TM) 64-Bit Server VM (build 24.65-b04, mixed mode)
[hadoop@nmsc2 java]#

 

 3.5 Installing Hadoop 2.7

    1. On the master only, download hadoop-2.7.1.tar.gz and place it at /opt/hadoop-2.7.1.tar.gz.

    2. Unpack it: tar -xzvf hadoop-2.7.1.tar.gz.

    3. Under /home create the data directories hadoop/tmp, hadoop/hdfs, hadoop/hdfs/data and hadoop/hdfs/name, for example as shown below.
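    One command covers all of them (run as the hadoop user; the paths match the configuration files that follow):

mkdir -p /home/hadoop/tmp /home/hadoop/hdfs/name /home/hadoop/hdfs/data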

    4. Configure core-site.xml under /opt/hadoop-2.7.1/etc/hadoop:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at
    http://www.apache.org/licenses/LICENSE-2.0
  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
	<!-- Enable the HDFS trash. Deleted files go to the trash first and are kept there for at most one day (1440 minutes) before being removed permanently. -->
	<property>
		<name>fs.trash.interval</name>
		<value>1440</value>
	</property>
    <property>
        <!-- URI of the NameNode, in the form hdfs://hostname/ -->
        <name>fs.defaultFS</name>
        <value>hdfs://192.168.181.66:9000</value>
    </property>
    <property>
        <!-- hadoop.tmp.dir is the base directory the Hadoop file system relies on; many other paths derive from it. If the NameNode and DataNode locations are not configured in hdfs-site.xml, they default to subdirectories of this path. -->
        <name>hadoop.tmp.dir</name>
        <value>file:/home/hadoop/tmp</value>
    </property>
    <property>
        <!-- All of Hadoop's file I/O goes through a common library, so io.file.buffer.size is used in many places to set the read/write buffer size, e.g. for SequenceFiles. For both disk and network operations, a larger buffer gives higher throughput at the cost of more memory and latency. The value should be a multiple of the system page size, in bytes; the default is 4KB, 64KB (65536) is a common choice, and 128KB is used here. -->
        <name>io.file.buffer.size</name>
        <value>131072</value>
    </property>
<property>
  <name>dfs.namenode.handler.count</name>
  <value>200</value>
  <description>The number of server threads for the namenode.</description>
</property>
<property>
  <name>dfs.datanode.handler.count</name>
  <value>100</value>
  <description>The number of server threads for the datanode.</description>
</property>
</configuration>
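     A quick way to see fs.trash.interval in action once the cluster is running (a sketch; the file path is only an example):

bin/hdfs dfs -rm /user/hadoop/old.txt        # moved to /user/hadoop/.Trash/Current/... rather than deleted outright
bin/hdfs dfs -ls /user/hadoop/.Trash/Current # inspect what is currently in the trash
bin/hdfs dfs -expunge                        # empty the trash without waiting for the 1440-minute expiry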
     5. Configure hdfs-site.xml under /opt/hadoop-2.7.1/etc/hadoop:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at
    http://www.apache.org/licenses/LICENSE-2.0
  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
    <property>
        <!-- dfs.namenode.name.dir - the local filesystem path where the NameNode stores file system metadata. It is only used by the NameNode; DataNodes do not need it. -->
        <name>dfs.namenode.name.dir</name>
        <value>file:/home/hadoop/hdfs/name</value>
    </property>
    <property>
        <!-- dfs.datanode.data.dir - the local filesystem path where a DataNode stores its data blocks. The path does not have to be identical on every DataNode, since each machine's environment may differ, but using the same path everywhere makes administration simpler. -->
        <name>dfs.datanode.data.dir</name>
        <value>file:/home/hadoop/hdfs/data</value>
    </property>
    <property>
        <!-- dfs.replication - the number of replicas kept for each block. For a real deployment it should be 3 (there is no upper limit, but more replicas rarely help and only consume space); fewer than three replicas reduces reliability (data may be lost when nodes fail). -->
        <name>dfs.replication</name>
        <value>3</value>
    </property>
    <property>
        <!-- Most Hadoop configuration examples found online put the NameNode and the SecondaryNameNode on the same machine. That is fine for experiments, but risky in production: if the host running the NameNode fails, the whole file system can be damaged or, in the worst case, all files are lost. In Hadoop 2.7 it is more useful to put the NameNode and the SecondaryNameNode on different machines. -->
        <name>dfs.namenode.secondary.http-address</name>
        <value>192.168.181.66:9001</value>
    </property>
    <property>
    <!-- dfs.webhdfs.enabled must be set to true in the NameNode's hdfs-site.xml; otherwise WebHDFS operations that list file or directory status (LISTSTATUS etc.) will fail, because that information is kept by the NameNode.
WebHDFS on the NameNode uses port 50070 and on the DataNodes port 50075: file and directory metadata is served by the NameNode on 50070, while opening, uploading, modifying or downloading file content goes to a DataNode on 50075. To issue all WebHDFS operations against the NameNode address and port without caring about which port to hit, dfs.webhdfs.enabled must also be set to true in hdfs-site.xml on every DataNode. -->
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
    </property>
  <property>
    <name>dfs.datanode.du.reserved</name>
    <value>107374182400</value>
    <description>Reserved space in bytes per volume. Always leave this much space free for non dfs use.
    </description>
  </property>
<property>
  <!-- Maximum HDFS client socket timeout. Set it generously; otherwise HBase tends to crash frequently because of it later on. -->
  <name>dfs.client.socket-timeout</name>
  <value>600000</value>
</property>
<property>
  <!-- Maximum number of transfer threads (open files) a DataNode may serve at once. The default is 4096; leaving it unset can lead to "xcievers exceeded" errors. -->
  <name>dfs.datanode.max.transfer.threads</name>
  <value>409600</value>
</property>  
</configuration>
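     Once HDFS is up, the dfs.webhdfs.enabled setting can be exercised over HTTP, for example with curl (a sketch; the /user path and file name are only examples):

# Directory and file status is served by the NameNode on port 50070
curl -i "http://192.168.181.66:50070/webhdfs/v1/user?op=LISTSTATUS"
# Reading file content is redirected to a DataNode on port 50075 (-L follows the redirect)
curl -i -L "http://192.168.181.66:50070/webhdfs/v1/user/smsmessage.txt?op=OPEN"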
     6. Copy mapred-site.xml.template under /opt/hadoop-2.7.1/etc/hadoop to mapred-site.xml and edit it as follows:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at
    http://www.apache.org/licenses/LICENSE-2.0
  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
    <property>
        <!-- The new framework also supports third-party MapReduce runtimes (non-YARN architectures such as SmartTalk/DGSG). Normally this is set to yarn; if it is not set, submitted jobs run in local mode instead of on the cluster. -->
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <!-- Hadoop ships with a history server through which finished MapReduce jobs can be inspected: how many maps and reduces ran, submission time, start time, completion time, and so on. It is not started by default; it is started with a separate command (see below).
The parameters are configured in mapred-site.xml: mapreduce.jobhistory.address and mapreduce.jobhistory.webapp.address default to 0.0.0.0:10020 and 0.0.0.0:19888 and take host:port values. After changing them, restart the job history server; job history can then be viewed on the host configured in mapreduce.jobhistory.webapp.address. -->
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>192.168.181.66:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>192.168.88.21:19888</value>
    </property>
</configuration>
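     The job history server mentioned in the comment above is started separately; in Hadoop 2.7.x the script below ships in sbin (run it on the host configured for mapreduce.jobhistory.address):

cd /opt/hadoop-2.7.1/sbin
./mr-jobhistory-daemon.sh start historyserver
# stop it again with: ./mr-jobhistory-daemon.sh stop historyserver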
7. Configure yarn-site.xml under /opt/hadoop-2.7.1/etc/hadoop as follows:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <!-- The auxiliary shuffle service that MapReduce applications require. -->
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.auxservices.mapreduce.shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <property>
        <!-- Address on which NodeManagers contact the ResourceManager in the new framework. -->
        <name>yarn.resourcemanager.address</name>
        <value>192.168.181.66:8032</value>
    </property>
    <property>
        <!-- Address of the ResourceManager's scheduler service, which NodeManagers need to know. -->
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>192.168.181.66:8030</value>
    </property>
    <property>
        <!-- In the new framework NodeManagers report task status to the ResourceManager for resource tracking, so they need the RM's resource-tracker address. -->
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>192.168.181.66:8031</value>
    </property>
    <property>
        <!-- Address the ResourceManager exposes to administrators, used for sending admin commands. Default: ${yarn.resourcemanager.hostname}:8033 -->
        <name>yarn.resourcemanager.admin.address</name>
        <value>192.168.181.66:8033</value>
    </property>
    <property>
        <!-- Web UI address of the ResourceManager, where cluster information can be viewed in a browser. Default: ${yarn.resourcemanager.hostname}:8088 -->
        <name>yarn.resourcemanager.webapp.address</name>
        <value>192.168.181.66:8088</value>
    </property>
    <property>
        <!-- Total physical memory available to this NodeManager. Note that it cannot be changed dynamically once the NodeManager is running. The default is 8192 MB (8 GB), and YARN will assume that much memory even if the machine has less, so it must be configured explicitly (Apache is working on making it dynamically adjustable). If the value is below 1024 (1 GB) the NodeManager refuses to start and reports:
NodeManager from  slavenode2 doesn't satisfy minimum allocations, Sending SHUTDOWN signal to the NodeManager. -->
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>2048</value>
    </property>
</configuration>
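     After YARN has been started (step 11 below), a quick sanity check that the NodeManagers registered with the memory configured here:

cd /opt/hadoop-2.7.1/bin
./yarn node -list        # each slave should appear as RUNNING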
     8. Set JAVA_HOME in hadoop-env.sh and yarn-env.sh under /opt/hadoop-2.7.1/etc/hadoop; without it the daemons will not start:
export JAVA_HOME=/opt/java/jdk1.7.0_65
     9. Edit the slaves file under /opt/hadoop-2.7.1/etc/hadoop: remove the default localhost and add the slave nodes:
#localhost
192.168.88.22
192.168.88.21
     10. Copy the configured Hadoop to the corresponding locations on the other nodes via scp:
chmod -R 777 /home/hadoop
chmod -R 777 /opt/hadoop-2.7.1
scp -r /opt/hadoop-2.7.1 192.168.88.22:/opt/
scp -r /home/hadoop 192.168.88.22:/home
scp -r /opt/hadoop-2.7.1 192.168.88.21:/opt/
scp -r /home/hadoop 192.168.88.21:/home
     11. Start Hadoop on the master; the daemons on the slaves are started automatically. Go to the /opt/hadoop-2.7.1 directory.
        (1) Initialize HDFS: ./hdfs namenode -format

 

[hadoop@nmsc2 bin]# cd /opt/hadoop-2.7.1/bin/
[hadoop@nmsc2 bin]# ls
container-executor  hadoop  hadoop.cmd  hdfs  hdfs.cmd  mapred  mapred.cmd  rcc  test-container-executor  yarn  yarn.cmd
[hadoop@nmsc2 bin]# ./hdfs namenode -format
15/09/23 16:03:17 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = nmsc2/127.0.0.1
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 2.7.1
STARTUP_MSG:   classpath = /opt/hadoop-2.7.1/etc/hadoop:/opt/hadoop-2.7.1/share/hadoop/common/lib/httpclient-4.2.5.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/slf4j-api-1.7.10.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/jersey-json-1.9.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/xz-1.0.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/commons-logging-1.1.3.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/commons-cli-1.2.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/commons-compress-1.4.1.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/jackson-mapper-asl-1.9.13.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/htrace-core-3.1.0-incubating.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/curator-framework-2.7.1.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/jersey-server-1.9.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/curator-client-2.7.1.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/avro-1.7.4.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/gson-2.2.4.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/curator-recipes-2.7.1.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/mockito-all-1.8.5.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/jsr305-3.0.0.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/paranamer-2.3.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/guava-11.0.2.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/jsch-0.1.42.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/api-asn1-api-1.0.0-M20.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/jsp-api-2.1.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/servlet-api-2.5.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/commons-lang-2.6.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/hadoop-annotations-2.7.1.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/activation-1.1.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/commons-codec-1.4.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/commons-io-2.4.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/httpcore-4.2.5.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/xmlenc-0.52.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/commons-collections-3.2.1.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/commons-beanutils-1.7.0.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/commons-httpclient-3.1.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/netty-3.6.2.Final.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/commons-net-3.1.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/commons-math3-3.1.1.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/jersey-core-1.9.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/apacheds-i18n-2.0.0-M15.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/jaxb-api-2.2.2.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/java-xmlbuilder-0.4.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/zookeeper-3.4.6.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/commons-digester-1.8.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/jettison-1.1.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/apacheds-kerberos-codec-2.0.0-M15.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/jackson-core-asl-1.9.13.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/junit-4.11.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/jetty-util-6.1.26.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/stax-api-1.0-2.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/jaxb-impl-2.2.3-1.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/commons-configuration-1.6.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/api-util-1.0.0-M20.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/hadoop-auth-2.7.1.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/log4j-1.2.17.jar:/opt/hadoop-2.7.1/share/hado
op/common/lib/protobuf-java-2.5.0.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/snappy-java-1.0.4.1.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/jackson-xc-1.9.13.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/jackson-jaxrs-1.9.13.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/jets3t-0.9.0.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/asm-3.2.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/commons-beanutils-core-1.8.0.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/hamcrest-core-1.3.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/jetty-6.1.26.jar:/opt/hadoop-2.7.1/share/hadoop/common/hadoop-nfs-2.7.1.jar:/opt/hadoop-2.7.1/share/hadoop/common/hadoop-common-2.7.1.jar:/opt/hadoop-2.7.1/share/hadoop/common/hadoop-common-2.7.1-tests.jar:/opt/hadoop-2.7.1/share/hadoop/hdfs:/opt/hadoop-2.7.1/share/hadoop/hdfs/lib/commons-logging-1.1.3.jar:/opt/hadoop-2.7.1/share/hadoop/hdfs/lib/xml-apis-1.3.04.jar:/opt/hadoop-2.7.1/share/hadoop/hdfs/lib/commons-cli-1.2.jar:/opt/hadoop-2.7.1/share/hadoop/hdfs/lib/jackson-mapper-asl-1.9.13.jar:/opt/hadoop-2.7.1/share/hadoop/hdfs/lib/htrace-core-3.1.0-incubating.jar:/opt/hadoop-2.7.1/share/hadoop/hdfs/lib/jersey-server-1.9.jar:/opt/hadoop-2.7.1/share/hadoop/hdfs/lib/jsr305-3.0.0.jar:/opt/hadoop-2.7.1/share/hadoop/hdfs/lib/guava-11.0.2.jar:/opt/hadoop-2.7.1/share/hadoop/hdfs/lib/servlet-api-2.5.jar:/opt/hadoop-2.7.1/share/hadoop/hdfs/lib/commons-lang-2.6.jar:/opt/hadoop-2.7.1/share/hadoop/hdfs/lib/commons-codec-1.4.jar:/opt/hadoop-2.7.1/share/hadoop/hdfs/lib/commons-io-2.4.jar:/opt/hadoop-2.7.1/share/hadoop/hdfs/lib/xmlenc-0.52.jar:/opt/hadoop-2.7.1/share/hadoop/hdfs/lib/netty-3.6.2.Final.jar:/opt/hadoop-2.7.1/share/hadoop/hdfs/lib/netty-all-4.0.23.Final.jar:/opt/hadoop-2.7.1/share/hadoop/hdfs/lib/jersey-core-1.9.jar:/opt/hadoop-2.7.1/share/hadoop/hdfs/lib/jackson-core-asl-1.9.13.jar:/opt/hadoop-2.7.1/share/hadoop/hdfs/lib/jetty-util-6.1.26.jar:/opt/hadoop-2.7.1/share/hadoop/hdfs/lib/log4j-1.2.17.jar:/opt/hadoop-2.7.1/share/hadoop/hdfs/lib/protobuf-java-2.5.0.jar:/opt/hadoop-2.7.1/share/hadoop/hdfs/lib/asm-3.2.jar:/opt/hadoop-2.7.1/share/hadoop/hdfs/lib/xercesImpl-2.9.1.jar:/opt/hadoop-2.7.1/share/hadoop/hdfs/lib/jetty-6.1.26.jar:/opt/hadoop-2.7.1/share/hadoop/hdfs/lib/leveldbjni-all-1.8.jar:/opt/hadoop-2.7.1/share/hadoop/hdfs/lib/commons-daemon-1.0.13.jar:/opt/hadoop-2.7.1/share/hadoop/hdfs/hadoop-hdfs-2.7.1.jar:/opt/hadoop-2.7.1/share/hadoop/hdfs/hadoop-hdfs-2.7.1-tests.jar:/opt/hadoop-2.7.1/share/hadoop/hdfs/hadoop-hdfs-nfs-2.7.1.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/lib/jersey-json-1.9.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/lib/guice-servlet-3.0.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/lib/xz-1.0.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/lib/commons-logging-1.1.3.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/lib/jersey-guice-1.9.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/lib/commons-cli-1.2.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/lib/commons-compress-1.4.1.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/lib/jackson-mapper-asl-1.9.13.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/lib/jersey-server-1.9.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/lib/guice-3.0.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/lib/jsr305-3.0.0.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/lib/guava-11.0.2.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/lib/zookeeper-3.4.6-tests.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/lib/aopalliance-1.0.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/lib/servlet-api-2.5.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/lib/commons-lang-2.6.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/lib/javax.inject-1.jar:/opt/hadoop-2.7.1/share/h
adoop/yarn/lib/activation-1.1.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/lib/commons-codec-1.4.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/lib/commons-io-2.4.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/lib/commons-collections-3.2.1.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/lib/netty-3.6.2.Final.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/lib/jersey-core-1.9.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/lib/jaxb-api-2.2.2.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/lib/zookeeper-3.4.6.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/lib/jettison-1.1.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/lib/jackson-core-asl-1.9.13.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/lib/jetty-util-6.1.26.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/lib/jersey-client-1.9.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/lib/stax-api-1.0-2.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/lib/jaxb-impl-2.2.3-1.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/lib/log4j-1.2.17.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/lib/protobuf-java-2.5.0.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/lib/jackson-xc-1.9.13.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/lib/jackson-jaxrs-1.9.13.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/lib/asm-3.2.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/lib/jetty-6.1.26.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/lib/leveldbjni-all-1.8.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/hadoop-yarn-common-2.7.1.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/hadoop-yarn-server-tests-2.7.1.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/hadoop-yarn-server-web-proxy-2.7.1.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/hadoop-yarn-server-common-2.7.1.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-2.7.1.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/hadoop-yarn-registry-2.7.1.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/hadoop-yarn-applications-unmanaged-am-launcher-2.7.1.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/hadoop-yarn-server-sharedcachemanager-2.7.1.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/hadoop-yarn-api-2.7.1.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/hadoop-yarn-server-nodemanager-2.7.1.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/hadoop-yarn-server-resourcemanager-2.7.1.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/hadoop-yarn-client-2.7.1.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/hadoop-yarn-server-applicationhistoryservice-2.7.1.jar:/opt/hadoop-2.7.1/share/hadoop/mapreduce/lib/guice-servlet-3.0.jar:/opt/hadoop-2.7.1/share/hadoop/mapreduce/lib/xz-1.0.jar:/opt/hadoop-2.7.1/share/hadoop/mapreduce/lib/jersey-guice-1.9.jar:/opt/hadoop-2.7.1/share/hadoop/mapreduce/lib/commons-compress-1.4.1.jar:/opt/hadoop-2.7.1/share/hadoop/mapreduce/lib/jackson-mapper-asl-1.9.13.jar:/opt/hadoop-2.7.1/share/hadoop/mapreduce/lib/jersey-server-1.9.jar:/opt/hadoop-2.7.1/share/hadoop/mapreduce/lib/avro-1.7.4.jar:/opt/hadoop-2.7.1/share/hadoop/mapreduce/lib/guice-3.0.jar:/opt/hadoop-2.7.1/share/hadoop/mapreduce/lib/paranamer-2.3.jar:/opt/hadoop-2.7.1/share/hadoop/mapreduce/lib/aopalliance-1.0.jar:/opt/hadoop-2.7.1/share/hadoop/mapreduce/lib/hadoop-annotations-2.7.1.jar:/opt/hadoop-2.7.1/share/hadoop/mapreduce/lib/javax.inject-1.jar:/opt/hadoop-2.7.1/share/hadoop/mapreduce/lib/commons-io-2.4.jar:/opt/hadoop-2.7.1/share/hadoop/mapreduce/lib/netty-3.6.2.Final.jar:/opt/hadoop-2.7.1/share/hadoop/mapreduce/lib/jersey-core-1.9.jar:/opt/hadoop-2.7.1/share/hadoop/mapreduce/lib/jackson-core-asl-1.9.13.jar:/opt/hadoop-2.7.1/share/hadoop/mapreduce/lib/junit-4.11.jar:/opt/hadoop-2.7.1/share/hadoop/mapreduce/lib/log4j-1.2.17.jar:/opt/hadoop-2.7.1/share/hadoop/mapreduce/lib/protobuf-java-2.5.0.jar:/opt/hadoop-2.7.1/share/hadoop/mapreduce/lib/snappy-java-1.0
.4.1.jar:/opt/hadoop-2.7.1/share/hadoop/mapreduce/lib/asm-3.2.jar:/opt/hadoop-2.7.1/share/hadoop/mapreduce/lib/hamcrest-core-1.3.jar:/opt/hadoop-2.7.1/share/hadoop/mapreduce/lib/leveldbjni-all-1.8.jar:/opt/hadoop-2.7.1/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.7.1-tests.jar:/opt/hadoop-2.7.1/share/hadoop/mapreduce/hadoop-mapreduce-client-shuffle-2.7.1.jar:/opt/hadoop-2.7.1/share/hadoop/mapreduce/hadoop-mapreduce-client-common-2.7.1.jar:/opt/hadoop-2.7.1/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.7.1.jar:/opt/hadoop-2.7.1/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar:/opt/hadoop-2.7.1/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.7.1.jar:/opt/hadoop-2.7.1/share/hadoop/mapreduce/hadoop-mapreduce-client-hs-plugins-2.7.1.jar:/opt/hadoop-2.7.1/share/hadoop/mapreduce/hadoop-mapreduce-client-hs-2.7.1.jar:/opt/hadoop-2.7.1/share/hadoop/mapreduce/hadoop-mapreduce-client-app-2.7.1.jar:/contrib/capacity-scheduler/*.jar
STARTUP_MSG:   build = https://git-wip-us.apache.org/repos/asf/hadoop.git -r 15ecc87ccf4a0228f35af08fc56de536e6ce657a; compiled by 'jenkins' on 2015-06-29T06:04Z
STARTUP_MSG:   java = 1.7.0_65
************************************************************/
15/09/23 16:03:17 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
15/09/23 16:03:17 INFO namenode.NameNode: createNameNode [-format]
15/09/23 16:03:18 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Formatting using clusterid: CID-695216ce-3c4e-47e4-a31f-24a7e40d8791
15/09/23 16:03:18 INFO namenode.FSNamesystem: No KeyProvider found.
15/09/23 16:03:18 INFO namenode.FSNamesystem: fsLock is fair:true
15/09/23 16:03:18 INFO blockmanagement.DatanodeManager: dfs.block.invalidate.limit=1000
15/09/23 16:03:18 INFO blockmanagement.DatanodeManager: dfs.namenode.datanode.registration.ip-hostname-check=true
15/09/23 16:03:18 INFO blockmanagement.BlockManager: dfs.namenode.startup.delay.block.deletion.sec is set to 000:00:00:00.000
15/09/23 16:03:18 INFO blockmanagement.BlockManager: The block deletion will start around 2015 Sep 23 16:03:18
15/09/23 16:03:18 INFO util.GSet: Computing capacity for map BlocksMap
15/09/23 16:03:18 INFO util.GSet: VM type       = 64-bit
15/09/23 16:03:18 INFO util.GSet: 2.0% max memory 889 MB = 17.8 MB
15/09/23 16:03:18 INFO util.GSet: capacity      = 2^21 = 2097152 entries
15/09/23 16:03:18 INFO blockmanagement.BlockManager: dfs.block.access.token.enable=false
15/09/23 16:03:18 INFO blockmanagement.BlockManager: defaultReplication         = 1
15/09/23 16:03:18 INFO blockmanagement.BlockManager: maxReplication             = 512
15/09/23 16:03:18 INFO blockmanagement.BlockManager: minReplication             = 1
15/09/23 16:03:18 INFO blockmanagement.BlockManager: maxReplicationStreams      = 2
15/09/23 16:03:18 INFO blockmanagement.BlockManager: shouldCheckForEnoughRacks  = false
15/09/23 16:03:18 INFO blockmanagement.BlockManager: replicationRecheckInterval = 3000
15/09/23 16:03:18 INFO blockmanagement.BlockManager: encryptDataTransfer        = false
15/09/23 16:03:18 INFO blockmanagement.BlockManager: maxNumBlocksToLog          = 1000
15/09/23 16:03:18 INFO namenode.FSNamesystem: fsOwner             = hadoop(auth:SIMPLE)
15/09/23 16:03:18 INFO namenode.FSNamesystem: supergroup          = supergroup
15/09/23 16:03:18 INFO namenode.FSNamesystem: isPermissionEnabled = true
15/09/23 16:03:18 INFO namenode.FSNamesystem: HA Enabled: false
15/09/23 16:03:18 INFO namenode.FSNamesystem: Append Enabled: true
15/09/23 16:03:18 INFO util.GSet: Computing capacity for map INodeMap
15/09/23 16:03:18 INFO util.GSet: VM type       = 64-bit
15/09/23 16:03:18 INFO util.GSet: 1.0% max memory 889 MB = 8.9 MB
15/09/23 16:03:18 INFO util.GSet: capacity      = 2^20 = 1048576 entries
15/09/23 16:03:18 INFO namenode.FSDirectory: ACLs enabled? false
15/09/23 16:03:18 INFO namenode.FSDirectory: XAttrs enabled? true
15/09/23 16:03:18 INFO namenode.FSDirectory: Maximum size of an xattr: 16384
15/09/23 16:03:18 INFO namenode.NameNode: Caching file names occuring more than 10 times
15/09/23 16:03:18 INFO util.GSet: Computing capacity for map cachedBlocks
15/09/23 16:03:18 INFO util.GSet: VM type       = 64-bit
15/09/23 16:03:18 INFO util.GSet: 0.25% max memory 889 MB = 2.2 MB
15/09/23 16:03:18 INFO util.GSet: capacity      = 2^18 = 262144 entries
15/09/23 16:03:18 INFO namenode.FSNamesystem: dfs.namenode.safemode.threshold-pct = 0.9990000128746033
15/09/23 16:03:18 INFO namenode.FSNamesystem: dfs.namenode.safemode.min.datanodes = 0
15/09/23 16:03:18 INFO namenode.FSNamesystem: dfs.namenode.safemode.extension     = 30000
15/09/23 16:03:18 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.window.num.buckets = 10
15/09/23 16:03:18 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.num.users = 10
15/09/23 16:03:18 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.windows.minutes = 1,5,25
15/09/23 16:03:18 INFO namenode.FSNamesystem: Retry cache on namenode is enabled
15/09/23 16:03:18 INFO namenode.FSNamesystem: Retry cache will use 0.03 of total heap and retry cache entry expiry time is 600000 millis
15/09/23 16:03:18 INFO util.GSet: Computing capacity for map NameNodeRetryCache
15/09/23 16:03:18 INFO util.GSet: VM type       = 64-bit
15/09/23 16:03:18 INFO util.GSet: 0.029999999329447746% max memory 889 MB = 273.1 KB
15/09/23 16:03:18 INFO util.GSet: capacity      = 2^15 = 32768 entries
15/09/23 16:03:18 INFO namenode.FSImage: Allocated new BlockPoolId: BP-1469452028-127.0.0.1-1442995398776
15/09/23 16:03:18 INFO common.Storage: Storage directory /home/hadoop/dfs/name has been successfully formatted.
15/09/23 16:03:19 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
15/09/23 16:03:19 INFO util.ExitUtil: Exiting with status 0
15/09/23 16:03:19 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at nmsc2/127.0.0.1
************************************************************/
[hadoop@nmsc2 bin]#
         (2) Start everything with ./start-all.sh, or start the pieces separately with ./start-dfs.sh and ./start-yarn.sh
[hadoop@nmsc1 bin]# cd /opt/hadoop-2.7.1/sbin/
[hadoop@nmsc1 sbin]# ./start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
15/09/23 16:48:22 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [192.168.88.21]
192.168.88.21: starting namenode, logging to /opt/hadoop-2.7.1/logs/hadoop-hadoop-namenode-nmsc1.out
192.168.88.22: starting datanode, logging to /opt/hadoop-2.7.1/logs/hadoop-hadoop-datanode-nmsc2.out
Starting secondary namenodes [192.168.88.21]
192.168.88.21: starting secondarynamenode, logging to /opt/hadoop-2.7.1/logs/hadoop-hadoop-secondarynamenode-nmsc1.out
15/09/23 16:48:39 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
starting yarn daemons
resourcemanager running as process 5881. Stop it first.
192.168.88.22: starting nodemanager, logging to /opt/hadoop-2.7.1/logs/yarn-hadoop-nodemanager-nmsc2.out
[hadoop@nmsc1 sbin]#

         (3) To stop, run ./stop-all.sh

         The call relationships between the Hadoop shell scripts are shown in the figure below:



 

         (4) Run jps to see the running daemons:

[hadoop@nmsc1 sbin]# jps
14201 Jps
5881 ResourceManager
13707 NameNode
13924 SecondaryNameNode
[hadoop@nmsc1 sbin]#
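         On the slave nodes the worker daemons should show up in the same way; a quick check from the master (a sketch, assuming the passwordless SSH set up in 3.3):

ssh 192.168.88.21 jps    # expect DataNode and NodeManager in the output
ssh 192.168.88.22 jps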
     12. Web access. First open the relevant ports, or simply disable the firewall.

        (1) Commands:

             systemctl stop firewalld.service (CentOS: stop the firewall) 

             chkconfig iptables on (Red Hat: enable the firewall)

             chkconfig iptables off (Red Hat: disable the firewall)

        (2) Open http://192.168.181.66:8088/ in a browser (the ResourceManager web UI, where cluster information can be viewed)

       

     (3) Open http://192.168.181.66:50070/ in a browser (NameNode web UI)


      (4) Open http://192.168.181.66:9001 in a browser (SecondaryNameNode)

         

 

    13. Installation complete. This is only the beginning of a big-data application; the next step is to write programs against the Hadoop APIs, according to your own needs, to make real use of HDFS and MapReduce.

 

 3.6 Problems encountered

    1) On startup the DataNodes report "Problem connecting to server"

     Fix: edit /etc/hosts; see http://blog.csdn.net/renfengjun/article/details/25320043 for details.

 

    2) When starting YARN, the NodeManager fails with "doesn't satisfy minimum allocations, Sending SHUTDOWN signal". The cause is that yarn.nodemanager.resource.memory-mb in yarn-site.xml gives the NodeManager too little memory; it must not be less than 1024 MB.

 

    3) HBase reports: util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

     Fix:

    sudo rm -r /opt/hbase-1.2.1/lib/native

    sudo mkdir /opt/hbase-1.2.1/lib/native

    sudo mkdir /opt/hbase-1.2.1/lib/native/Linux-amd64-64

    sudo cp -r /opt/hadoop-2.7.1/lib/native/* /opt/hbase-1.2.1/lib/native/Linux-amd64-64/

 

    4) Creating an HBase table with a compression codec fails with "Compression algorithm 'snappy' previously failed test. Set hbase.table.sanity.checks to false at conf"

      The create statement was: create 'signal1', { NAME => 'info', COMPRESSION => 'SNAPPY' }, SPLITS => ['00000000000000000000000000000000','10000000000000000000000000000000','20000000000000000000000000000000','30000000000000000000000000000000','40000000000000000000000000000000','50000000000000000000000000000000','60000000000000000000000000000000','70000000000000000000000000000000','80000000000000000000000000000000','90000000000000000000000000000000']

       The fix is to add the following to hbase-site.xml:

<property>
    <name>hbase.table.sanity.checks</name>
    <value>false</value>
</property>
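       Before relaxing the sanity check it is usually worth confirming that snappy actually works on every node; HBase ships a small test tool for this (a sketch; the test file path is arbitrary):

cd /opt/hbase-1.2.1/bin
./hbase org.apache.hadoop.hbase.util.CompressionTest file:///tmp/compression-test.txt snappy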

 

        5) The NodeManager fails to start with the following error:

2016-01-26 18:45:10,891 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Registering with RM using containers :[]
2016-01-26 18:45:11,778 ERROR org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Unexpected error starting NodeStatusUpdater
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Recieved SHUTDOWN signal from Resourcemanager ,Registration of NodeManager failed, Message from ResourceManager: NodeManager from  slavery01 doesn't satisfy minimum allocations, Sending SHUTDOWN signal to the NodeManager.
        at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.registerWithRM(NodeStatusUpdaterImpl.java:270)
        at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.serviceStart(NodeStatusUpdaterImpl.java:196)
        at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
        at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120)
        at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceStart(NodeManager.java:271)
        at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
        at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:486)
        at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:533)
2016-01-26 18:45:11,781 INFO org.apache.hadoop.service.AbstractService: Service org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl failed in state STARTED; cause: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Recieved SHUTDOWN signal from Resourcemanager ,Registration of NodeManager failed, Message from ResourceManager: NodeManager from  slavery01 doesn't satisfy minimum allocations, Sending SHUTDOWN signal to the NodeManager.

    Fix (in yarn-site.xml):

    <property>
        <!-- must not be less than 1024 -->
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>2048</value>
    </property>

   

      6) WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

       Fix: http://zhidao.baidu.com/link?url=_cOK3qt3yzgWwifuMGuZhSOTUyKTiYZfyHr3Xd1id345B9SvSIGsJ-mGLDsk4QseWmBnY5LjxgwHwjKQ4UTFtm8IV6J2im4QfSRh__MhzpW

      7) Many people running single-node Hadoop hit the error "Hadoop hostname: Unknown host"

      Fix: use ifconfig to find the machine's IP and hostname to find its host name, say 192.168.55.128 and hadoop. Add the line 192.168.55.128 hadoop to /etc/hosts, change localhost to hadoop in core-site.xml and mapred-site.xml, then run ./hdfs namenode -format followed by sbin/start-all.sh.
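      A minimal sketch of that fix, using the example IP/hostname pair above (run the echo as root):

echo "192.168.55.128 hadoop" >> /etc/hosts
# replace localhost with the hostname in core-site.xml and mapred-site.xml, then re-format and restart
bin/hdfs namenode -format
sbin/start-all.sh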

3.7 A fellow user's write-up on installing hadoop2.7 + hbase1.0 + hive1.2 can be found in the attachment "我学大数据技术(hadoop2.7+hbase1.0+hive1.2).pdf"

     Other well-written articles:

      Hadoop 2.7.1 distributed installation - preparation      http://my.oschina.net/allman90/blog/485352

      Hadoop 2.7.1 distributed installation - installation      http://my.oschina.net/allman90/blog/486117

3.8 Common shell commands

 

#List files and directories under the HDFS path /user/
bin/hdfs dfs -ls  /user/
#Upload the local file /opt/smsmessage.txt to the HDFS directory /user/
bin/hdfs dfs -put /opt/smsmessage.txt  /user/
#Download the HDFS file /user/smsmessage.txt to the local directory /opt/
bin/hdfs dfs -get /user/smsmessage.txt /opt/
#Print the content of the HDFS text file /opt/smsmessage.txt
bin/hdfs dfs -cat /opt/smsmessage.txt
#Print the content of the HDFS file /user/smsmessage.txt
bin/hdfs dfs -text /user/smsmessage.txt
#Delete the HDFS file /user/smsmessage.txt
bin/hdfs dfs -rm /user/smsmessage.txt
#Before running the balancer, the network bandwidth it may use can be limited, e.g. to 10MB/s (10*1024*1024)
bin/hdfs dfsadmin -setBalancerBandwidth <bandwidth in bytes per second>
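
#For example, cap balancer traffic at 10 MB/s and then run the balancer (a sketch; the threshold is only an example)
bin/hdfs dfsadmin -setBalancerBandwidth 10485760
bin/hdfs balancer -threshold 10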



#Run the Hadoop WordCount example. The /input directory must already exist on HDFS and contain files; /output is the output directory and is created by MapReduce automatically
bin/hadoop jar /opt/hadoop-2.7.1/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar wordcount /input /output

#Check the health of the whole file system. Note that it does not actively restore missing block replicas; that is handled asynchronously by a separate NameNode thread.
cd /opt/hadoop-2.7.1/bin
./hdfs  fsck /
#Set the replication factor for everything under the root directory /
cd /opt/hadoop-2.7.1/bin
./hadoop fs -setrep -R 2 /
#The same can be done with
./hdfs dfs -setrep -R 2 /
#Print detailed information about every block of a file, including DataNode rack information
cd /opt/hadoop-2.7.1/bin
./hadoop fsck /user/distribute-hadoop-boss/tmp/pgv/20090813/1000000103/input/JIFEN.QQ.COM.2009-08-13-18.30 -files -blocks -locations  -racks
#Read the values of dfs.client.block.write.replace-datanode-on-failure.enable and dfs.client.block.write.replace-datanode-on-failure.policy from hdfs-site.xml
cd /opt/hadoop-2.7.1/bin
./hdfs getconf -confKey dfs.client.block.write.replace-datanode-on-failure.enable
./hdfs getconf -confKey dfs.client.block.write.replace-datanode-on-failure.policy

#Start HDFS. The script reads slaves and the configuration files and starts all HDFS-related services on every node
cd /opt/hadoop-2.7.1/sbin
./start-dfs.sh
#Start YARN. The script reads slaves and the configuration files and starts all YARN-related services on every node
cd /opt/hadoop-2.7.1/sbin
./start-yarn.sh
#Start or stop a single daemon (namenode, secondarynamenode, journalnode, datanode) on the local machine only
./hadoop-daemon.sh start/stop namenode
./hadoop-daemon.sh start/stop secondarynamenode
./hadoop-daemon.sh start/stop journalnode
./hadoop-daemon.sh start/stop datanode

#Check whether HDFS is in safe mode
[hadoop@nmsc2 bin]$ cd /opt/hadoop-2.7.1/bin  
[hadoop@nmsc2 bin]$ ./hdfs dfsadmin -safemode get  
Safe mode is OFF  
[hadoop@nmsc2 bin]$   
#Leave safe mode
[hadoop@nmsc2 bin]$ cd /opt/hadoop-2.7.1/bin  
[hadoop@nmsc2 bin]$ ./hdfs dfsadmin -safemode leave  
Safe mode is OFF  
[hadoop@nmsc2 bin]$
#Read some configuration values
[hadoop@nmsc1 bin]$ cd /opt/hadoop-2.7.1/bin
[hadoop@nmsc1 bin]$ ./hdfs getconf -confKey dfs.datanode.handler.count
100
[hadoop@nmsc1 bin]$ ./hdfs getconf -confKey dfs.namenode.handler.count
200
[hadoop@nmsc1 bin]$ ./hdfs getconf -confKey dfs.namenode.avoid.read.stale.datanode
false
[hadoop@nmsc1 bin]$ ./hdfs getconf -confKey dfs.namenode.avoid.write.stale.datanode
false
[hadoop@nmsc1 bin]$ 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 
