
Hadoop 2.5.2 Installation Walkthrough


1. Prepare the virtual environment for the Hadoop cluster.
You can choose VirtualBox or VMware. I had some issues with VirtualBox on my laptop, so I chose VMware 10.0.

Then you need the following software tools. It is best to have them ready before you start installing the environment.

maven linux version 3.11
jdk   1.7.72
protoc 2.5.0 (needed to recompile hadoop 2.5.2)

tar -xvf protobuf-2.5.0.tar.bz2 
cd protobuf-2.5.0 
./configure --prefix=/opt/xxxxx/protoc/ 
make && make install
yum install gcc 
yum install gcc-c++
yum install make

yum install cmake 
yum install openssl-devel 
yum install ncurses-devel
If you don't have these tools, you will almost certainly run into compile problems. gcc, gcc-c++ and make have to be installed before protobuf itself can be built.
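
Before starting the long build, it can help to confirm that every prerequisite is actually on the PATH. The sketch below is only an illustration: the helper name missing_tools is made up here, and protoc will only be found if the bin directory under the --prefix you chose has been added to PATH.

```shell
#!/bin/sh
# Print the name of every listed command that is not found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Tools this walkthrough relies on for the native build.
# No output means everything was found.
missing_tools gcc g++ make cmake protoc mvn java
```

Any name it prints still needs to be installed or added to PATH.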

2. Install the JDK and Maven, and configure Maven.
I am in China, so the foreign Maven Central repository is sometimes unavailable or too slow for me. So I configured a Maven mirror server in China.

           <name>local private nexus</name> 
          <name>local private nexus</name> 
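
For reference, a Maven mirror is declared in ~/.m2/settings.xml. The sketch below writes a minimal file of that shape. The mirror id and the URL are placeholders invented for this example (substitute the address of your own mirror), and the helper name write_mirror_settings is hypothetical; only the name "local private nexus" comes from the configuration above.

```shell
#!/bin/sh
# Write a minimal Maven settings file that routes requests for Maven Central
# to a mirror. $1 = output file, $2 = mirror URL (a placeholder, an assumption).
write_mirror_settings() {
    cat > "$1" <<EOF
<settings>
  <mirrors>
    <mirror>
      <id>nexus</id>
      <name>local private nexus</name>
      <url>$2</url>
      <mirrorOf>central</mirrorOf>
    </mirror>
  </mirrors>
</settings>
EOF
}

# Example (hypothetical URL; back up any existing settings.xml first):
# write_mirror_settings "$HOME/.m2/settings.xml" "http://nexus.example.com/repository/public/"
```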

When everything above is ready, download Hadoop from the official Apache site. Note: download the source (src) release. I used version 2.5.2.

mvn clean package -Pdist,native -DskipTests -Dtar

The build takes 30 to 60 minutes, depending on your PC.

If one of the Maven tasks fails, build that part manually instead of restarting the whole build, to save time.
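
One way to avoid repeating the whole build after a failure is Maven's resume option, -rf (--resume-from), which restarts the reactor at a named module. This is a sketch of that approach and not necessarily what I did at the time: the helper resume_cmd is hypothetical, and hadoop-hdfs is only an example module. Resuming assumes that the modules which already built can still be resolved, from the local repository or a remote one.

```shell
#!/bin/sh
# Print the build command that resumes the reactor at the given module.
# $1 = artifactId of the module that failed (example value below).
resume_cmd() {
    echo "mvn clean package -Pdist,native -DskipTests -Dtar -rf :$1"
}

# Example: resume from the HDFS module after fixing the cause of its failure.
resume_cmd hadoop-hdfs
```

Run the printed command from the top of the source tree, the same place as the original build.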

At the end, you will see a success screen like the one below.

     [exec] $ tar cf hadoop-2.5.2.tar hadoop-2.5.2
     [exec] $ gzip -f hadoop-2.5.2.tar
     [exec] Hadoop dist tar available at: /root/hadoopsrc/srcdir/hadoop-2.5.2-src/hadoop-dist/target/hadoop-2.5.2.tar.gz
[INFO] Executed tasks
[INFO] --- maven-javadoc-plugin:2.8.1:jar (module-javadocs) @ hadoop-dist ---
[INFO] Building jar: /root/hadoopsrc/srcdir/hadoop-2.5.2-src/hadoop-dist/target/hadoop-dist-2.5.2-javadoc.jar
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO] Apache Hadoop Main ................................ SUCCESS [2.414s]
[INFO] Apache Hadoop Project POM ......................... SUCCESS [1.719s]
[INFO] Apache Hadoop Annotations ......................... SUCCESS [5.243s]
[INFO] Apache Hadoop Assemblies .......................... SUCCESS [0.433s]
[INFO] Apache Hadoop Project Dist POM .................... SUCCESS [3.172s]
[INFO] Apache Hadoop Maven Plugins ....................... SUCCESS [6.075s]
[INFO] Apache Hadoop MiniKDC ............................. SUCCESS [5.361s]
[INFO] Apache Hadoop Auth ................................ SUCCESS [6.530s]
[INFO] Apache Hadoop Auth Examples ....................... SUCCESS [5.012s]
[INFO] Apache Hadoop Common .............................. SUCCESS [4:47.964s]
[INFO] Apache Hadoop NFS ................................. SUCCESS [12.655s]
[INFO] Apache Hadoop Common Project ...................... SUCCESS [0.097s]
[INFO] Apache Hadoop HDFS ................................ SUCCESS [8:59.599s]
[INFO] Apache Hadoop HttpFS .............................. SUCCESS [53.998s]
[INFO] Apache Hadoop HDFS BookKeeper Journal ............. SUCCESS [11.246s]
[INFO] Apache Hadoop HDFS-NFS ............................ SUCCESS [7.457s]
[INFO] Apache Hadoop HDFS Project ........................ SUCCESS [0.161s]
[INFO] hadoop-yarn ....................................... SUCCESS [0.140s]
[INFO] hadoop-yarn-api ................................... SUCCESS [3:22.369s]
[INFO] hadoop-yarn-common ................................ SUCCESS [53.995s]
[INFO] hadoop-yarn-server ................................ SUCCESS [0.176s]
[INFO] hadoop-yarn-server-common ......................... SUCCESS [13.378s]
[INFO] hadoop-yarn-server-nodemanager .................... SUCCESS [31.324s]
[INFO] hadoop-yarn-server-web-proxy ...................... SUCCESS [4.596s]
[INFO] hadoop-yarn-server-applicationhistoryservice ...... SUCCESS [7.033s]
[INFO] hadoop-yarn-server-resourcemanager ................ SUCCESS [24.992s]
[INFO] hadoop-yarn-server-tests .......................... SUCCESS [1.576s]
[INFO] hadoop-yarn-client ................................ SUCCESS [6.709s]
[INFO] hadoop-yarn-applications .......................... SUCCESS [0.213s]
[INFO] hadoop-yarn-applications-distributedshell ......... SUCCESS [3.840s]
[INFO] hadoop-yarn-applications-unmanaged-am-launcher .... SUCCESS [3.157s]
[INFO] hadoop-yarn-site .................................. SUCCESS [0.153s]
[INFO] hadoop-yarn-project ............................... SUCCESS [15.632s]
[INFO] hadoop-mapreduce-client ........................... SUCCESS [0.152s]
[INFO] hadoop-mapreduce-client-core ...................... SUCCESS [38.670s]
[INFO] hadoop-mapreduce-client-common .................... SUCCESS [33.585s]
[INFO] hadoop-mapreduce-client-shuffle ................... SUCCESS [6.307s]
[INFO] hadoop-mapreduce-client-app ....................... SUCCESS [15.549s]
[INFO] hadoop-mapreduce-client-hs ........................ SUCCESS [11.430s]
[INFO] hadoop-mapreduce-client-jobclient ................. SUCCESS [34.442s]
[INFO] hadoop-mapreduce-client-hs-plugins ................ SUCCESS [3.081s]
[INFO] Apache Hadoop MapReduce Examples .................. SUCCESS [8.559s]
[INFO] hadoop-mapreduce .................................. SUCCESS [11.834s]
[INFO] Apache Hadoop MapReduce Streaming ................. SUCCESS [1:07.545s]
[INFO] Apache Hadoop Distributed Copy .................... SUCCESS [1:19.210s]
[INFO] Apache Hadoop Archives ............................ SUCCESS [4.697s]
[INFO] Apache Hadoop Rumen ............................... SUCCESS [8.833s]
[INFO] Apache Hadoop Gridmix ............................. SUCCESS [7.416s]
[INFO] Apache Hadoop Data Join ........................... SUCCESS [4.417s]
[INFO] Apache Hadoop Extras .............................. SUCCESS [4.287s]
[INFO] Apache Hadoop Pipes ............................... SUCCESS [24.609s]
[INFO] Apache Hadoop OpenStack support ................... SUCCESS [8.762s]
[INFO] Apache Hadoop Client .............................. SUCCESS [19.307s]
[INFO] Apache Hadoop Mini-Cluster ........................ SUCCESS [0.386s]
[INFO] Apache Hadoop Scheduler Load Simulator ............ SUCCESS [11.350s]
[INFO] Apache Hadoop Tools Dist .......................... SUCCESS [14.692s]
[INFO] Apache Hadoop Tools ............................... SUCCESS [0.130s]
[INFO] Apache Hadoop Distribution ........................ SUCCESS [1:48.893s]
[INFO] ------------------------------------------------------------------------
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 30:50.549s
[INFO] Finished at: Tue Dec 09 07:31:56 PST 2014
[INFO] Final Memory: 81M/243M
[INFO] ------------------------------------------------------------------------
[root@master hadoop-2.5.2-src]#

Install Hadoop.
tar -xzf hadoopxxxxx -C /opt/hadoop
Create the hadoop user.

You must give the hadoop user ownership of the following folders:
chown -R hadoop:hadoop /hadoop /opt/hadoop

Switch to the hadoop user.
Configure the following 7 files for the Hadoop cluster.

Create the folders.
Note:
the three folders must match the paths set in the config files.
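
As an illustration of what "matching" means, here is a sketch that creates three directories under one base path. The layout (tmp, dfs/name, dfs/data) is an assumption based on the tmp and dfs folders that show up in the Hadoop directory later in this post; in that layout they would correspond to hadoop.tmp.dir in core-site.xml and to dfs.namenode.name.dir and dfs.datanode.data.dir in hdfs-site.xml. The helper name make_hadoop_dirs is hypothetical.

```shell
#!/bin/sh
# Create the working directories under a base path.
# $1 = base directory (assumed layout: tmp, dfs/name, dfs/data).
make_hadoop_dirs() {
    base=$1
    mkdir -p "$base/tmp" "$base/dfs/name" "$base/dfs/data"
}

# Example, run as the hadoop user:
# make_hadoop_dirs /opt/hadoop/hadoop-2.5.2
```

Whatever paths you pick, the values in the config files have to point at exactly these directories.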

Shut down the master server and clone it to slave1 and slave2.

Next, start the three servers.

Configure the host name and network on each of them.
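
Each machine has to resolve the names master, slave1 and slave2, which is usually done with entries in /etc/hosts. The addresses below are invented for illustration (use the ones your VMs actually have), and hosts_entries is a hypothetical helper that only prints the lines; appending them to /etc/hosts requires root.

```shell
#!/bin/sh
# Print /etc/hosts lines for the three nodes.
# $1, $2, $3 = IP addresses of master, slave1 and slave2.
hosts_entries() {
    printf '%s\tmaster\n' "$1"
    printf '%s\tslave1\n' "$2"
    printf '%s\tslave2\n' "$3"
}

# Example with made-up addresses; review the output before adding it to /etc/hosts.
hosts_entries 192.168.56.10 192.168.56.11 192.168.56.12
```

The same three lines go on all three machines.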

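Configure SSH login from the master to the two slaves.

Next, make sure iptables is shut down.

For my test environment, run the command below as root:
chkconfig iptables off
This keeps iptables switched off permanently, across reboots. It does not stop an instance that is already running; "service iptables stop" does that.
chkconfig iptables on  (turns it back on)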

./hdfs namenode -format

Then test the installation.
[hadoop@master sbin]$ ./start-dfs.sh
Starting namenodes on [master]
master: starting namenode, logging to /opt/hadoop/hadoop-2.5.2/logs/hadoop-hadoop-namenode-master.out
slave1: starting datanode, logging to /opt/hadoop/hadoop-2.5.2/logs/hadoop-hadoop-datanode-slave1.out
slave2: starting datanode, logging to /opt/hadoop/hadoop-2.5.2/logs/hadoop-hadoop-datanode-slave2.out
Starting secondary namenodes [master]
master: starting secondarynamenode, logging to /opt/hadoop/hadoop-2.5.2/logs/hadoop-hadoop-secondarynamenode-master.out
[hadoop@master sbin]$ jps
2440 SecondaryNameNode
2539 Jps
2274 NameNode
[hadoop@master sbin]$ ./start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /opt/hadoop/hadoop-2.5.2/logs/yarn-hadoop-resourcemanager-master.out
slave1: starting nodemanager, logging to /opt/hadoop/hadoop-2.5.2/logs/yarn-hadoop-nodemanager-slave1.out
slave2: starting nodemanager, logging to /opt/hadoop/hadoop-2.5.2/logs/yarn-hadoop-nodemanager-slave2.out
[hadoop@master sbin]$ jps
2440 SecondaryNameNode
2660 Jps
2274 NameNode
2584 ResourceManager
[hadoop@master sbin]$ pwd
[hadoop@master sbin]$ cd ..
[hadoop@master hadoop-2.5.2]$

[hadoop@slave1 hadoop-2.5.2]$ ls
bin  dfs  etc  include  lib  libexec  LICENSE.txt  NOTICE.txt  README.txt  sbin  share  tmp
[hadoop@slave1 hadoop-2.5.2]$ rm -rf tmp/
[hadoop@slave1 hadoop-2.5.2]$ rm -rf dfs/
[hadoop@slave1 hadoop-2.5.2]$ ls
bin  etc  include  lib  libexec  LICENSE.txt  NOTICE.txt  README.txt  sbin  share
[hadoop@slave1 hadoop-2.5.2]$ jps
2146 Jps
2079 DataNode
[hadoop@slave1 hadoop-2.5.2]$ jps
2213 Jps
2079 DataNode
2182 NodeManager
[hadoop@slave1 hadoop-2.5.2]$

[hadoop@slave2 hadoop-2.5.2]$ jps
2080 DataNode
2147 Jps
[hadoop@slave2 hadoop-2.5.2]$ jps
2270 Jps
2080 DataNode
2183 NodeManager
[hadoop@slave2 hadoop-2.5.2]$
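
The jps listings above can also be checked mechanically. The helper below (missing_daemons, a hypothetical name) reads jps output on standard input and prints any expected daemon that is absent; the expected sets are taken from the listings in this post.

```shell
#!/bin/sh
# Print each expected daemon name that does not appear in the jps output on stdin.
# jps prints "<pid> <name>" per line, so the name is compared against field 2.
missing_daemons() {
    jps_output=$(cat)
    for daemon in "$@"; do
        printf '%s\n' "$jps_output" |
            awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }' ||
            echo "$daemon"
    done
}

# Example using the master listing shown above; no output means nothing is missing.
printf '2440 SecondaryNameNode\n2660 Jps\n2274 NameNode\n2584 ResourceManager\n' |
    missing_daemons NameNode SecondaryNameNode ResourceManager
```

On a live node the input would come from jps itself, for example "jps | missing_daemons DataNode NodeManager" on a slave.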

check cluster nodes:


check the status of every node
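
Besides jps, Hadoop's own command-line tools can report the cluster state: hdfs dfsadmin -report lists the DataNodes known to the NameNode, and yarn node -list lists the NodeManagers registered with the ResourceManager. A minimal wrapper, assuming the Hadoop bin directory is on PATH (cluster_status is a hypothetical name):

```shell
#!/bin/sh
# Print the HDFS and YARN view of the cluster.
# Returns 1 without running anything if the Hadoop commands are not on PATH.
cluster_status() {
    for cmd in hdfs yarn; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "$cmd not found on PATH" >&2
            return 1
        fi
    done
    hdfs dfsadmin -report   # DataNodes as seen by the NameNode
    yarn node -list         # NodeManagers as seen by the ResourceManager
}

# Example, run on the master as the hadoop user after start-dfs.sh and start-yarn.sh:
# cluster_status
```

With the two slaves from this post running, both reports should show slave1 and slave2.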


[root@localhost ~]# groupadd hadoop
[root@localhost ~]# useradd -g hadoop hadoop
[root@localhost ~]# passwd hadoop                     

cat id_rsa.pub > authorized_keys
chmod go-rw ~/.ssh/authorized_keys
scp * hadoop@slave1:/opt/hadoop/xxxxx
[hadoop@slave1 ~]$ chmod 700 .ssh
[hadoop@slave2 ~]$ mkdir ~/.ssh
[hadoop@slave2 ~]$ chmod 700 .ssh


My QQ: 735028566




run jar file

[hadoop@hadoopmaster sbin]$ hadoop jar /opt/jack.jar org.apache.hadoop.t1.WordCount  /jackdemodir/wordcount/input /jackdemodir/wordcount/output1
15/08/01 22:44:35 INFO client.RMProxy: Connecting to ResourceManager at hadoopmaster/
15/08/01 22:44:37 INFO input.FileInputFormat: Total input paths to process : 1
15/08/01 22:44:37 INFO mapreduce.JobSubmitter: number of splits:1
15/08/01 22:44:37 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1438494222950_0001
15/08/01 22:44:38 INFO impl.YarnClientImpl: Submitted application application_1438494222950_0001
15/08/01 22:44:38 INFO mapreduce.Job: The url to track the job: http://hadoopmaster:8088/proxy/application_1438494222950_0001/
15/08/01 22:44:38 INFO mapreduce.Job: Running job: job_1438494222950_0001
15/08/01 22:44:47 INFO mapreduce.Job: Job job_1438494222950_0001 running in uber mode : false
15/08/01 22:44:47 INFO mapreduce.Job:  map 0% reduce 0%
15/08/01 22:44:55 INFO mapreduce.Job:  map 100% reduce 0%
15/08/01 22:45:01 INFO mapreduce.Job:  map 100% reduce 100%
15/08/01 22:45:02 INFO mapreduce.Job: Job job_1438494222950_0001 completed successfully
15/08/01 22:45:02 INFO mapreduce.Job: Counters: 49
        File System Counters
                FILE: Number of bytes read=571
                FILE: Number of bytes written=212507
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=463
                HDFS: Number of bytes written=385
                HDFS: Number of read operations=6
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=2
        Job Counters
                Launched map tasks=1
                Launched reduce tasks=1
                Data-local map tasks=1
                Total time spent by all maps in occupied slots (ms)=5427
                Total time spent by all reduces in occupied slots (ms)=4297
                Total time spent by all map tasks (ms)=5427
                Total time spent by all reduce tasks (ms)=4297
                Total vcore-seconds taken by all map tasks=5427
                Total vcore-seconds taken by all reduce tasks=4297
                Total megabyte-seconds taken by all map tasks=5557248
                Total megabyte-seconds taken by all reduce tasks=4400128
        Map-Reduce Framework
                Map input records=1
                Map output records=55
                Map output bytes=556
                Map output materialized bytes=571
                Input split bytes=128
                Combine input records=55
                Combine output records=45
                Reduce input groups=45
                Reduce shuffle bytes=571
                Reduce input records=45
                Reduce output records=45
                Spilled Records=90
                Shuffled Maps =1
                Failed Shuffles=0
                Merged Map outputs=1
                GC time elapsed (ms)=131
                CPU time spent (ms)=1910
                Physical memory (bytes) snapshot=462319616
                Virtual memory (bytes) snapshot=1765044224
                Total committed heap usage (bytes)=275251200
        Shuffle Errors
        File Input Format Counters
                Bytes Read=335
        File Output Format Counters
                Bytes Written=385



check the result

[hadoop@hadoopmaster sbin]$ hadoop fs -cat /jackdemodir/wordcount/output1/part-r-00000
350     1
ASF     1
Abdera  1
Apache? 1
Are     1
From    1
Open    2
Source  2
The     1
Zookeeper,      1
a       2
all-volunteer   1
and     3
are     3
by      1
chances 1
cover   1
develops,       1
experience      1
find    1
for     1
going   1
here.   1
if      1
in      1
incubates       1
industry        1
initiatives     1
it      1
leading 1
looking 1
more    1
of      1
powered 1
projects        1
range   1
rewarding       1
software,       1
stewards,       1
technologies.   1
than    1
that    1
to      2
wide    1
you     3
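
To cross-check a small result like this one, the same count can be reproduced locally with standard text tools. This sketch assumes the job splits words on whitespace only, which is consistent with the output above (punctuation stays attached to the words) but is not confirmed by the job's source; local_wordcount is a name chosen here.

```shell
#!/bin/sh
# Count whitespace-separated words on stdin; print "word<TAB>count" sorted by word.
local_wordcount() {
    tr -s '[:space:]' '\n' | grep -v '^$' | LC_ALL=C sort | uniq -c |
        awk '{ print $2 "\t" $1 }'
}

# Example with inline input; for a real check, pipe a local copy of the job's input in.
printf 'Open Source and Open data\n' | local_wordcount
```

LC_ALL=C makes sort use byte order, so capitalized words come before lowercase ones, as they do in the listing above.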
