liyonghui160com
Hadoop 2.2.0 Distributed Cluster Configuration


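Hadoop 2.x differs substantially from 1.x and is more general for both storage and computation. Hadoop 2.x introduces the YARN framework for managing cluster resources, which can serve any workload that computes over data stored in HDFS. MapReduce is now just one pluggable computation framework on top of YARN, and you can develop or choose whichever framework fits your needs. For now, MapReduce seems to be the best supported, since it is already fairly mature; other YARN-based frameworks are still under development.
At its core, YARN handles resource management, allocation, and scheduling. Its allocation granularity is finer and more flexible than in Hadoop 1.x, and its prospects look good. The price of that flexibility is more configuration, which can make it somewhat harder to use.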

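The operating system is CentOS 6.4 (64-bit). One machine serves as the master node and the other three as slave nodes; the steps below walk through installing and configuring the cluster.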

Host configuration plan

On every node, edit /etc/hosts and add the following address mappings:

10.95.3.48     m1
10.95.3.54     s1
10.95.3.59     s2
10.95.3.66     s3

Set the hostname on each machine by editing /etc/sysconfig/network. For example, on node s1:

NETWORKING=yes
HOSTNAME=s1

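m1 is the cluster's master node; s1, s2, and s3 are the slave nodes.
For the hosts themselves, we created four virtual machines with VMware, configured as follows:

    The master node has 1 core
    The master node has 1 GB of memory
    Each slave node has 1 core
    Each slave node has 2 GB of memory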
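
A quick way to confirm that the address mappings are present on a node is to scan its hosts file for each expected hostname. The sketch below is only an illustration: the function name missing_hosts is made up for this example, and it assumes the four hostnames used in this article.

```shell
#!/bin/sh
# Print every expected hostname that has no entry in the given hosts file.
# Usage: missing_hosts <hosts-file> <hostname>...
# (missing_hosts is a hypothetical helper, not part of Hadoop or CentOS.)
missing_hosts() {
    hosts_file=$1
    shift
    for name in "$@"; do
        # Skip comment lines, then look for the name in any column after the IP address.
        if ! awk -v n="$name" '
                /^[ \t]*#/ { next }
                { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
                END { exit found ? 0 : 1 }' "$hosts_file"; then
            echo "$name"
        fi
    done
}

# Example: missing_hosts /etc/hosts m1 s1 s2 s3
```

If the function prints nothing, every hostname was found; any name it prints still needs a line in /etc/hosts on that node.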

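Directory plan

The Hadoop program directory is /home/shirdrn/cloud/programs/hadoop-2.2.0, and the related data directories (logs, storage, and so on) live under /home/shirdrn/cloud/storage/hadoop-2.2.0. Keeping programs and data separate makes it easier to synchronize configuration across nodes.
The directories are prepared as follows:

    On every node, create the program directory /home/shirdrn/cloud/programs/hadoop-2.2.0 to hold the Hadoop program files
    On every node, create the data directory /home/shirdrn/cloud/storage/hadoop-2.2.0/hdfs to hold cluster data
    On the master node m1, create /home/shirdrn/cloud/storage/hadoop-2.2.0/hdfs/name to hold file system metadata
    On every slave node, create /home/shirdrn/cloud/storage/hadoop-2.2.0/hdfs/data to hold the actual data blocks
    The log directory on all nodes is /home/shirdrn/cloud/storage/hadoop-2.2.0/logs
    The temporary directory on all nodes is /home/shirdrn/cloud/storage/hadoop-2.2.0/tmp

All directories referenced in the configuration below follow this plan.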
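
The storage part of this plan can be created with a few mkdir -p calls. The following is a minimal sketch rather than a required step: the function name create_storage_dirs and its "master"/"slave" role argument are invented for this example, and the base directory is passed in so the same function can be run on any node.

```shell
#!/bin/sh
# Create the storage directories from the directory plan under a given base directory.
# role is "master" (adds hdfs/name) or "slave" (adds hdfs/data).
# (create_storage_dirs is a hypothetical helper written for this article.)
create_storage_dirs() {
    base=$1
    role=$2
    # Directories that exist on every node.
    mkdir -p "$base/hdfs" "$base/logs" "$base/tmp" || return 1
    case $role in
        master) mkdir -p "$base/hdfs/name" ;;
        slave)  mkdir -p "$base/hdfs/data" ;;
        *)      echo "unknown role: $role" >&2; return 1 ;;
    esac
}

# Example on m1:  create_storage_dirs /home/shirdrn/cloud/storage/hadoop-2.2.0 master
# Example on s1:  create_storage_dirs /home/shirdrn/cloud/storage/hadoop-2.2.0 slave
```

The program directory /home/shirdrn/cloud/programs is not handled here; it only needs a plain mkdir -p on each node before the Hadoop files are copied in.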

Environment variables

First, using Sun's JDK, edit ~/.bashrc and add the following:

export JAVA_HOME=/usr/java/jdk1.6.0_45/
export PATH=$PATH:$JAVA_HOME/bin
export CLASSPATH=$JAVA_HOME/lib/*.jar:$JAVA_HOME/jre/lib/*.jar

Then set the Hadoop installation directory and the related environment variables:

# Hadoop variable settings
export HADOOP_HOME="/home/shirdrn/cloud/programs/hadoop-2.2.0"
export HADOOP_PREFIX="$HADOOP_HOME/"
export YARN_HOME=$HADOOP_HOME
export HADOOP_MAPRED_HOME="$HADOOP_HOME"
export HADOOP_COMMON_HOME="$HADOOP_HOME"
export HADOOP_HDFS_HOME="$HADOOP_HOME"
export HADOOP_CONF_DIR="$HADOOP_HOME/etc/hadoop/"
export YARN_CONF_DIR=$HADOOP_CONF_DIR
export HADOOP_LOG_DIR=/home/shirdrn/cloud/storage/hadoop-2.2.0/logs
export YARN_LOG_DIR=$HADOOP_LOG_DIR
export PATH="$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH"
export JAVA_LIBRARY_PATH=$HADOOP_HOME/lib/native


Passwordless SSH login

On every node, run:

ssh-keygen

Then keep pressing Enter to accept the defaults.
On the master node m1, run:

ssh m1

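Make sure this logs in to m1 itself without a password (m1's own public key must be in m1's ~/.ssh/authorized_keys for this to work).
Append m1's public key to ~/.ssh/authorized_keys on s1, s2, and s3, and check the permissions on ~/.ssh/authorized_keys: the file must not be writable by the group. If it is, run: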

chmod g-w ~/.ssh/authorized_keys

After that, the following commands, run on m1, should not prompt for a password:

ssh s1
ssh s2
ssh s3
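
If one of these logins still asks for a password, the permissions on authorized_keys are a common culprit. The sketch below shows one way to test for the group-write bit mentioned above (and, as an extra assumption of this example, the other-write bit as well) using find; the function name check_key_file_perms is made up for illustration.

```shell
#!/bin/sh
# Report whether a file is writable by its group or by others.
# Prints "unsafe" and returns 1 if so; otherwise prints "ok" and returns 0.
# (check_key_file_perms is a hypothetical helper, not an OpenSSH tool.)
check_key_file_perms() {
    file=$1
    # -perm -020 matches when the group-write bit is set, -002 when the other-write bit is set.
    if [ -n "$(find "$file" -maxdepth 0 \( -perm -020 -o -perm -002 \))" ]; then
        echo "unsafe"
        return 1
    fi
    echo "ok"
}

# Example: check_key_file_perms ~/.ssh/authorized_keys
```

When it reports "unsafe" because of the group bit, the chmod g-w command shown earlier is the fix. With sshd's StrictModes option enabled, overly permissive modes on the home directory or ~/.ssh can cause the same symptom.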

Hadoop configuration files

The configuration files are in /home/shirdrn/cloud/programs/hadoop-2.2.0/etc/hadoop; edit the relevant files there.

    Contents of core-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
        <property>
                <name>fs.defaultFS</name>
                <value>hdfs://m1:9000/</value>
                <description>The name of the default file system. A URI whose scheme
                        and authority determine the FileSystem implementation. The uri's
                        scheme determines the config property (fs.SCHEME.impl) naming the
                        FileSystem implementation class. The uri's authority is used to
                        determine the host, port, etc. for a filesystem.</description>
        </property>
        <property>
                <name>hadoop.tmp.dir</name>
                <value>/home/shirdrn/cloud/storage/hadoop-2.2.0/tmp/hadoop-${user.name}</value>
                <description>A base for other temporary directories.</description>
        </property>
</configuration>

    Contents of hdfs-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
        <property>
                <name>dfs.namenode.name.dir</name>
                <value>/home/shirdrn/cloud/storage/hadoop-2.2.0/hdfs/name</value>
                <description>Path on the local filesystem where the NameNode stores
                        the namespace and transactions logs persistently.</description>
        </property>
        <property>
                <name>dfs.datanode.data.dir</name>
                <value>/home/shirdrn/cloud/storage/hadoop-2.2.0/hdfs/data</value>
                <description>Comma separated list of paths on the local filesystem of a DataNode where it should store its blocks.</description>
        </property>
        <property>
                <name>dfs.permissions</name>
                <value>false</value>
        </property>

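        <property>
                <name>dfs.datanode.max.xcievers</name>
                <value>4096</value>
                <description>An HDFS DataNode has an upper limit on the number of files it serves at the same time. The parameter is called xcievers (the Hadoop authors misspelled the word). Before loading data, make sure this parameter is set in hdfs-site.xml to at least 4096. In Hadoop 2.x this name is a deprecated alias of dfs.datanode.max.transfer.threads.</description>
        </property>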

        <property>
                <name>dfs.replication</name>
                <value>3</value>
        </property>

        <property>
                <name>fs.hdfs.impl.disable.cache</name>
                <value>true</value>
        </property>
</configuration>

 

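Setting fs.hdfs.impl.disable.cache to true allows multiple independent HDFS connections: FileSystem.get() then returns a new instance on each call instead of a shared cached one. In client code the same setting can be applied with conf.setBoolean("fs.hdfs.impl.disable.cache", true).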


    Contents of yarn-site.xml

<?xml version="1.0"?>
<configuration>
        <property>
                <name>yarn.resourcemanager.resource-tracker.address</name>
                <value>m1:8031</value>
                <description>host is the hostname of the resource manager and
                        port is the port on which the NodeManagers contact the Resource Manager.
                </description>
        </property>
        <property>
                <name>yarn.resourcemanager.scheduler.address</name>
                <value>m1:8030</value>
                <description>host is the hostname of the resourcemanager and port is
                        the port
                        on which the Applications in the cluster talk to the Resource Manager.
                </description>
        </property>
        <property>
                <name>yarn.resourcemanager.scheduler.class</name>
                <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
                <description>In case you do not want to use the default scheduler</description>
        </property>
        <property>
                <name>yarn.resourcemanager.address</name>
                <value>m1:8032</value>
                <description>the host is the hostname of the ResourceManager and the
                        port is the port on
                        which the clients can talk to the Resource Manager.
                </description>
        </property>
        <property>
                <name>yarn.nodemanager.local-dirs</name>
                <value>${hadoop.tmp.dir}/nodemanager/local</value>
                <description>the local directories used by the nodemanager</description>
        </property>
        <property>
                <name>yarn.nodemanager.address</name>
                <value>0.0.0.0:8034</value>
                <description>the nodemanagers bind to this port</description>
        </property>
        <property>
                <name>yarn.nodemanager.resource.cpu-vcores</name>
                <value>1</value>
                <description></description>
        </property>
        <property>
                <name>yarn.nodemanager.resource.memory-mb</name>
                <value>2048</value>
                <description>Defines total available resources on the NodeManager to be made available to running containers</description>
        </property>
        <property>
                <name>yarn.nodemanager.remote-app-log-dir</name>
                <value>${hadoop.tmp.dir}/nodemanager/remote</value>
                <description>directory on hdfs where the application logs are moved to </description>
        </property>
        <property>
                <name>yarn.nodemanager.log-dirs</name>
                <value>${hadoop.tmp.dir}/nodemanager/logs</value>
                <description>the directories used by Nodemanagers as log directories</description>
        </property>
        <property>
                <name>yarn.application.classpath</name>
                <value>$HADOOP_HOME,$HADOOP_HOME/share/hadoop/common/*,
               $HADOOP_HOME/share/hadoop/common/lib/*,
               $HADOOP_HOME/share/hadoop/hdfs/*,$HADOOP_HOME/share/hadoop/hdfs/lib/*,
               $HADOOP_HOME/share/hadoop/yarn/*,$HADOOP_HOME/share/hadoop/yarn/lib/*,
               $HADOOP_HOME/share/hadoop/mapreduce/*,$HADOOP_HOME/share/hadoop/mapreduce/lib/*</value>
                <description>Classpath for typical applications.</description>
        </property>
        <!-- Use mapreduce_shuffle instead of mapreduce.shuffle (YARN-1229)-->
        <property>
                <name>yarn.nodemanager.aux-services</name>
                <value>mapreduce_shuffle</value>
                <description>shuffle service that needs to be set for Map Reduce to run </description>
        </property>
        <property>
                <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
                <value>org.apache.hadoop.mapred.ShuffleHandler</value>
        </property>
     <property>
            <name>yarn.scheduler.minimum-allocation-mb</name>
            <value>256</value>
     </property>
     <property>
            <name>yarn.scheduler.maximum-allocation-mb</name>
            <value>6144</value>
     </property>
     <property>
            <name>yarn.scheduler.minimum-allocation-vcores</name>
            <value>1</value>
     </property>
     <property>
            <name>yarn.scheduler.maximum-allocation-vcores</name>
            <value>3</value>
     </property>
</configuration>

    Contents of mapred-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
     <property>
          <name>mapreduce.framework.name</name>
          <value>yarn</value>
          <description>Execution framework set to Hadoop YARN.</description>
     </property>
     <property>
          <name>mapreduce.map.memory.mb</name>
          <value>512</value>
          <description>Larger resource limit for maps. default 1024M</description>
     </property>
     <property>
          <name>mapreduce.map.cpu.vcores</name>
          <value>1</value>
          <description></description>
     </property>
     <property>
          <name>mapreduce.reduce.memory.mb</name>
          <value>512</value>
          <description>Larger resource limit for reduces.</description>
     </property>
     <property>
          <name>mapreduce.reduce.shuffle.parallelcopies</name>
          <value>5</value>
          <description>Higher number of parallel copies run by reduces to fetch outputs from very large number of maps.</description>
     </property>
     <property>
          <name>mapreduce.jobhistory.address</name>
          <value>m1:10020</value>
          <description>MapReduce JobHistory Server host:port, default port is 10020.</description>
     </property>
     <property>
          <name>mapreduce.jobhistory.webapp.address</name>
          <value>m1:19888</value>
          <description>MapReduce JobHistory Server Web UI host:port, default port is 19888.</description>
     </property>
</configuration>
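
To see how the memory settings in yarn-site.xml and mapred-site.xml fit together, it helps to work through the arithmetic. The sketch below assumes the scheduler rounds each request up to a multiple of yarn.scheduler.minimum-allocation-mb, which is my understanding of the usual behaviour; it is a simplification that ignores vcores and the memory used by the MapReduce ApplicationMaster. Both function names are invented for this example.

```shell
#!/bin/sh
# Round a memory request up to the next multiple of the scheduler minimum (values in MB).
# (Hypothetical helper; models an assumed rounding rule, not Hadoop code.)
normalize_request_mb() {
    request=$1
    minimum=$2
    echo $(( (request + minimum - 1) / minimum * minimum ))
}

# How many containers of a given size fit into one NodeManager's memory.
containers_per_node() {
    node_mb=$1
    container_mb=$2
    echo $(( node_mb / container_mb ))
}

# Values from the configuration files above:
#   mapreduce.map.memory.mb = 512, yarn.scheduler.minimum-allocation-mb = 256,
#   yarn.nodemanager.resource.memory-mb = 2048
map_mb=$(normalize_request_mb 512 256)
echo "map container size: ${map_mb} MB"
echo "map containers per node: $(containers_per_node 2048 "$map_mb")"
```

With these values a map container stays at 512 MB and at most four of them fit on one slave. Note also that yarn.scheduler.maximum-allocation-mb is set to 6144 while each NodeManager offers only 2048 MB, so a single request above 2048 MB could not be placed on any of these nodes.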

    Configuring the hadoop-env.sh, yarn-env.sh, and mapred-env.sh scripts

Set the JAVA_HOME variable in each script, as shown below:

export JAVA_HOME=/usr/java/jdk1.6.0_45/

 

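Configure the masters and slaves files (under etc/hadoop in Hadoop 2.x) to define the master and slave nodes. It is best to use hostnames, one per line, and to make sure the machines can reach each other by hostname. The masters file is a Hadoop 1.x convention; the 2.x start scripts appear to rely only on slaves, so it is likely optional here.

vi masters:
Enter:

m1

vi slaves:
Enter:

s1
s2
s3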


Distributing the program files
On the master node m1, copy the configured program files to each slave node:

scp -r /home/shirdrn/cloud/programs/hadoop-2.2.0 shirdrn@s1:/home/shirdrn/cloud/programs/
scp -r /home/shirdrn/cloud/programs/hadoop-2.2.0 shirdrn@s2:/home/shirdrn/cloud/programs/
scp -r /home/shirdrn/cloud/programs/hadoop-2.2.0 shirdrn@s3:/home/shirdrn/cloud/programs/
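
Since the three commands differ only in the hostname, they can also be written as a loop. This is a sketch under the same paths and user name as above; the function name distribute_hadoop and the RUN dry-run variable are conventions invented for this example.

```shell
#!/bin/sh
# Copy the program directory to each slave in turn.
# Set RUN=echo to print the commands instead of executing them (dry run).
# (distribute_hadoop and RUN are hypothetical names used only in this sketch.)
distribute_hadoop() {
    src=/home/shirdrn/cloud/programs/hadoop-2.2.0
    dest=/home/shirdrn/cloud/programs/
    for host in "$@"; do
        # Stop at the first host that fails so the problem is noticed immediately.
        ${RUN:-} scp -r "$src" "shirdrn@$host:$dest" || return 1
    done
}

# Dry run:   RUN=echo distribute_hadoop s1 s2 s3
# Real copy: distribute_hadoop s1 s2 s3
```

The dry run is handy for checking the generated commands before copying anything. Re-running the real copy after a configuration change pushes the updated files to every slave again.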

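Starting the HDFS cluster

With the configuration above in place, the HDFS cluster can be started.
To avoid problems during startup, turn off the firewall on every node by hand: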

sudo service iptables stop

To keep the firewall disabled across reboots as well, run:

sudo chkconfig iptables off
sudo chkconfig ip6tables off

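On the master node m1, first format the file system (in Hadoop 2.x the older hadoop namenode -format form still works but prints a deprecation warning):

hdfs namenode -format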

Then start the HDFS cluster:

start-dfs.sh

Check the startup logs on each node to confirm that the HDFS cluster started successfully:

tail -100f /home/shirdrn/cloud/storage/hadoop-2.2.0/logs/hadoop-shirdrn-namenode-m1.log
tail -100f /home/shirdrn/cloud/storage/hadoop-2.2.0/logs/hadoop-shirdrn-secondarynamenode-m1.log
tail -100f /home/shirdrn/cloud/storage/hadoop-2.2.0/logs/hadoop-shirdrn-datanode-s1.log
tail -100f /home/shirdrn/cloud/storage/hadoop-2.2.0/logs/hadoop-shirdrn-datanode-s2.log
tail -100f /home/shirdrn/cloud/storage/hadoop-2.2.0/logs/hadoop-shirdrn-datanode-s3.log

Alternatively, check the running processes on each node:

jps
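
Going by the log file names above, m1 should list NameNode and SecondaryNameNode, and each slave should list DataNode. The following sketch automates that comparison by reading jps output from standard input; the function name missing_daemons is made up for this example.

```shell
#!/bin/sh
# Read `jps` output on stdin and print each expected daemon that is not listed.
# Usage: jps | missing_daemons NameNode SecondaryNameNode
# (missing_daemons is a hypothetical helper, not a Hadoop command.)
missing_daemons() {
    listing=$(cat)
    for daemon in "$@"; do
        # jps prints "<pid> <main class name>"; compare the second column exactly,
        # so that NameNode is not satisfied by a SecondaryNameNode line.
        if ! printf '%s\n' "$listing" |
                awk -v d="$daemon" '$2 == d { found = 1 } END { exit found ? 0 : 1 }'; then
            echo "$daemon"
        fi
    done
}

# On m1:      jps | missing_daemons NameNode SecondaryNameNode
# On a slave: jps | missing_daemons DataNode
```

No output means every expected daemon is running on that node; a printed name points to the log file worth reading first.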

The HDFS cluster status can also be viewed in the web console at:


http://m1:50070/

Starting the YARN cluster

On the master node m1, run:

start-yarn.sh

Check the startup logs on each node to confirm that the YARN cluster started successfully:

tail -100f /home/shirdrn/cloud/storage/hadoop-2.2.0/logs/yarn-shirdrn-resourcemanager-m1.log
tail -100f /home/shirdrn/cloud/storage/hadoop-2.2.0/logs/yarn-shirdrn-nodemanager-s1.log
tail -100f /home/shirdrn/cloud/storage/hadoop-2.2.0/logs/yarn-shirdrn-nodemanager-s2.log
tail -100f /home/shirdrn/cloud/storage/hadoop-2.2.0/logs/yarn-shirdrn-nodemanager-s3.log

Alternatively, check the running processes on each node:

jps

The ResourceManager runs on the master node m1, and its status can be viewed in the web console:


http://m1:8088/

NodeManagers run on the slave nodes; each node's resource status can be viewed in its own web console, for example for node s1:


http://s1:8042/

Managing the JobHistory Server

Starting the JobHistory Server lets you view information about the jobs run on the cluster through the web console. Start it with:

mr-jobhistory-daemon.sh start historyserver

The web console uses port 19888 by default.
Visit http://m1:19888/ to view the job execution history.
To stop the JobHistory Server, run:

mr-jobhistory-daemon.sh stop historyserver

Verifying the cluster

We use the WordCount example that ships with Hadoop for verification.
First create a few data directories in HDFS:

hadoop fs -mkdir -p /data/wordcount
hadoop fs -mkdir -p /output/

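The /data/wordcount directory holds the input files for the WordCount example, and the results of the MapReduce job are written to /output/wordcount (the job creates that directory itself and fails if it already exists).
Upload local files to HDFS: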

hadoop fs -put /home/shirdrn/cloud/programs/hadoop-2.2.0/etc/hadoop/*.xml /data/wordcount/

List the uploaded files with:

hadoop fs -ls /data/wordcount

The files uploaded to HDFS should be listed.
Now run the WordCount example:

hadoop jar /home/shirdrn/cloud/programs/hadoop-2.2.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar wordcount /data/wordcount /output/wordcount

The console shows the job's progress:

[shirdrn@m1 hadoop-2.2.0]$ hadoop jar /home/shirdrn/cloud/programs/hadoop-2.2.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar wordcount /data/wordcount /output/wordcount
13/12/25 22:38:02 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
13/12/25 22:38:03 INFO client.RMProxy: Connecting to ResourceManager at m1/10.95.3.48:8032
13/12/25 22:38:04 INFO input.FileInputFormat: Total input paths to process : 7
13/12/25 22:38:04 INFO mapreduce.JobSubmitter: number of splits:7
13/12/25 22:38:04 INFO Configuration.deprecation: user.name is deprecated. Instead, use mapreduce.job.user.name
13/12/25 22:38:04 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
13/12/25 22:38:04 INFO Configuration.deprecation: mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
13/12/25 22:38:04 INFO Configuration.deprecation: mapreduce.combine.class is deprecated. Instead, use mapreduce.job.combine.class
13/12/25 22:38:04 INFO Configuration.deprecation: mapreduce.map.class is deprecated. Instead, use mapreduce.job.map.class
13/12/25 22:38:04 INFO Configuration.deprecation: mapred.job.name is deprecated. Instead, use mapreduce.job.name
13/12/25 22:38:04 INFO Configuration.deprecation: mapreduce.reduce.class is deprecated. Instead, use mapreduce.job.reduce.class
13/12/25 22:38:04 INFO Configuration.deprecation: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
13/12/25 22:38:04 INFO Configuration.deprecation: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
13/12/25 22:38:04 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
13/12/25 22:38:04 INFO Configuration.deprecation: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
13/12/25 22:38:04 INFO Configuration.deprecation: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
13/12/25 22:38:04 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1388039619930_0002
13/12/25 22:38:05 INFO impl.YarnClientImpl: Submitted application application_1388039619930_0002 to ResourceManager at m1/10.95.3.48:8032
13/12/25 22:38:05 INFO mapreduce.Job: The url to track the job: http://m1:8088/proxy/application_1388039619930_0002/
13/12/25 22:38:05 INFO mapreduce.Job: Running job: job_1388039619930_0002
13/12/25 22:38:14 INFO mapreduce.Job: Job job_1388039619930_0002 running in uber mode : false
13/12/25 22:38:14 INFO mapreduce.Job:  map 0% reduce 0%
13/12/25 22:38:22 INFO mapreduce.Job:  map 14% reduce 0%
13/12/25 22:38:42 INFO mapreduce.Job:  map 29% reduce 5%
13/12/25 22:38:43 INFO mapreduce.Job:  map 43% reduce 5%
13/12/25 22:38:45 INFO mapreduce.Job:  map 43% reduce 14%
13/12/25 22:38:54 INFO mapreduce.Job:  map 57% reduce 14%
13/12/25 22:38:55 INFO mapreduce.Job:  map 71% reduce 19%
13/12/25 22:38:56 INFO mapreduce.Job:  map 100% reduce 19%
13/12/25 22:38:57 INFO mapreduce.Job:  map 100% reduce 100%
13/12/25 22:38:58 INFO mapreduce.Job: Job job_1388039619930_0002 completed successfully
13/12/25 22:38:58 INFO mapreduce.Job: Counters: 44
     File System Counters
          FILE: Number of bytes read=15339
          FILE: Number of bytes written=667303
          FILE: Number of read operations=0
          FILE: Number of large read operations=0
          FILE: Number of write operations=0
          HDFS: Number of bytes read=21904
          HDFS: Number of bytes written=9717
          HDFS: Number of read operations=24
          HDFS: Number of large read operations=0
          HDFS: Number of write operations=2
     Job Counters
          Killed map tasks=2
          Launched map tasks=9
          Launched reduce tasks=1
          Data-local map tasks=9
          Total time spent by all maps in occupied slots (ms)=457338
          Total time spent by all reduces in occupied slots (ms)=65832
     Map-Reduce Framework
          Map input records=532
          Map output records=1923
          Map output bytes=26222
          Map output materialized bytes=15375
          Input split bytes=773
          Combine input records=1923
          Combine output records=770
          Reduce input groups=511
          Reduce shuffle bytes=15375
          Reduce input records=770
          Reduce output records=511
          Spilled Records=1540
          Shuffled Maps =7
          Failed Shuffles=0
          Merged Map outputs=7
          GC time elapsed (ms)=3951
          CPU time spent (ms)=22610
          Physical memory (bytes) snapshot=1598832640
          Virtual memory (bytes) snapshot=6564274176
          Total committed heap usage (bytes)=971993088
     Shuffle Errors
          BAD_ID=0
          CONNECTION=0
          IO_ERROR=0
          WRONG_LENGTH=0
          WRONG_MAP=0
          WRONG_REDUCE=0
     File Input Format Counters
          Bytes Read=21131
     File Output Format Counters
          Bytes Written=9717

To view the results, run:

hadoop fs -cat /output/wordcount/part-r-00000 | head
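
The output file lists one word and its count per line, separated by a tab and ordered by word, so head alone shows whichever words sort first. To see the most frequent words instead, the output can be sorted by the count column. The function name top_words below is made up for this sketch.

```shell
#!/bin/sh
# Read WordCount output (word<TAB>count per line) on stdin
# and print the N most frequent words (default 10).
# (top_words is a hypothetical helper written for this article.)
top_words() {
    n=${1:-10}
    # Sort on the second tab-separated field, numerically, largest first.
    sort -t "$(printf '\t')" -k2,2nr | head -n "$n"
}

# Example: hadoop fs -cat /output/wordcount/part-r-00000 | top_words 5
```

Because the input here is a set of XML configuration files, the top entries are likely to be markup tokens rather than natural-language words.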

 

 

 

 
