huxiaojun_198213

Hadoop Configuration Explained

The $HADOOP_INSTALL/hadoop/conf directory contains Hadoop's configuration files. They are:

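hadoop-env.sh - This file contains the environment variables used when running Hadoop. You can use them to change some aspects of Hadoop daemon behavior, such as where log files are stored and the maximum amount of heap Hadoop may use. The only variable you should need to change in this file is JAVA_HOME, which specifies the JDK installation directory.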

slaves - This file lists the hosts, one per line, on which the Hadoop slave daemons (datanodes and tasktrackers) run. By default it contains the single entry localhost.

hadoop-default.xml - This file contains the default settings for the Hadoop daemons and Map/Reduce jobs. Do not modify this file.

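mapred-default.xml - This file contains site-specific settings for the Hadoop Map/Reduce daemons and jobs. It is empty by default. Settings placed in this file override the Map/Reduce settings in hadoop-default.xml. Use this file to tailor Map/Reduce behavior for your site.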

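hadoop-site.xml - This file contains site-specific settings for all Hadoop daemons and Map/Reduce jobs. It is empty by default. Settings in this file override those in hadoop-default.xml and mapred-default.xml. This file should contain the settings that every server and client in a Hadoop installation must respect, for instance the locations of the namenode and the jobtracker.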
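
The override order among the three XML files can be pictured as a layered lookup in which later files win. The Python sketch below only illustrates that precedence; it is not how Hadoop itself loads configuration, and the host names and values in it are placeholder assumptions.

```python
# Illustrative only: models the precedence of the three XML files as plain
# dictionaries. Hadoop loads these files itself; the values below are
# placeholder assumptions, not real defaults.

def effective_config(hadoop_default, mapred_default, hadoop_site):
    """Merge the layers so that later files override earlier ones."""
    merged = {}
    for layer in (hadoop_default, mapred_default, hadoop_site):
        merged.update(layer)
    return merged


hadoop_default = {"fs.default.name": "local", "mapred.job.tracker": "local"}
mapred_default = {"mapred.job.tracker": "jobtracker.example.com:54311"}
hadoop_site = {"fs.default.name": "hdfs://namenode.example.com:54310"}

config = effective_config(hadoop_default, mapred_default, hadoop_site)
for name in sorted(config):
    print(name, "=", config[name])
```

A setting that appears in hadoop-site.xml therefore hides the same setting in either default file, which is why cluster-wide values such as the namenode address belong there.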

Basic Configuration

Take a pass at putting together basic configuration settings for your cluster. Some of the settings that follow are required, others are recommended for more straightforward and predictable operation.

Hadoop Environment Settings - Ensure that JAVA_HOME is set in hadoop-env.sh and points to the Java installation you intend to use. You can set other environment variables in hadoop-env.sh to suit your requirements. Some of the default settings refer to the variable HADOOP_HOME. The value of HADOOP_HOME is automatically inferred from the location of the startup scripts: it is the parent directory of the bin directory that holds the Hadoop scripts. In this instance it is $HADOOP_INSTALL/hadoop.
Jobtracker and Namenode settings - Figure out where to run your namenode and jobtracker. Set the variable fs.default.name to the namenode's intended host:port. Set the variable mapred.job.tracker to the jobtracker's intended host:port. These settings should be in hadoop-site.xml. You may also want to set one or more of the following ports (also in hadoop-site.xml):
dfs.datanode.port
dfs.info.port
mapred.job.tracker.info.port
mapred.task.tracker.output.port
mapred.task.tracker.report.port
Data Path Settings - Figure out where your data goes. This includes settings for where the namenode stores the namespace checkpoint and the edits log, where the datanodes store filesystem blocks, storage locations for Map/Reduce intermediate output and temporary storage for the HDFS client. The default values for these paths point to various locations in /tmp. While this might be ok for a single node installation, for larger clusters storing data in /tmp is not an option. These settings must also be in hadoop-site.xml. It is important for these settings to be present in hadoop-site.xml because they can otherwise be overridden by client configuration settings in Map/Reduce jobs. Set the following variables to appropriate values:
dfs.name.dir
dfs.data.dir
dfs.client.buffer.dir
mapred.local.dir
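
As a rough illustration of the data path settings, the Python sketch below renders the four variables as hadoop-site.xml property entries and flags any value that still points into /tmp. The /data/hadoop/... locations and both function names are assumptions made for this example; choose directories that actually exist on your machines.

```python
# Sketch: render data-path properties as hadoop-site.xml <property> entries.
# The /data/hadoop/... paths are hypothetical placeholders.
import xml.etree.ElementTree as ET


def build_site_xml(properties):
    """Return a <configuration> document as a string for the given mapping."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


def paths_under_tmp(properties):
    """List the property names whose value still points into /tmp."""
    return sorted(
        name for name, value in properties.items()
        if value == "/tmp" or value.startswith("/tmp/")
    )


data_paths = {
    "dfs.name.dir": "/data/hadoop/dfs/name",
    "dfs.data.dir": "/data/hadoop/dfs/data",
    "dfs.client.buffer.dir": "/data/hadoop/dfs/tmp",
    "mapred.local.dir": "/data/hadoop/mapred/local",
}

site_xml = build_site_xml(data_paths)
print(site_xml)
print("still in /tmp:", paths_under_tmp(data_paths))
```

The generated text is unindented; it is meant to show which entries are needed rather than to replace a hand-edited file.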
An example of a hadoop-site.xml file:


<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
  <name>hadoop.tmp.dir</name>
  <value>/tmp/hadoop-${user.name}</value>
</property>
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:54310</value>
</property>
<property>
  <name>mapred.job.tracker</name>
  <value>localhost:54311</value>
</property>
<property>
  <name>dfs.replication</name>
  <value>8</value>
</property>
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx512m</value>
</property>
</configuration>
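
To check a file like this one, it can help to read the properties back and look at the host and port they name. The following Python sketch does that for a shortened variant of the example; the jobtracker address is written as plain host:port, matching how the text describes mapred.job.tracker. The function names are illustrative and not part of Hadoop.

```python
# Sketch: read <property> entries from hadoop-site.xml text and split the two
# address settings into host and port. SITE_XML is a shortened variant of the
# example file, kept inline so the sketch is self-contained.
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

SITE_XML = """<?xml version="1.0"?>
<configuration>
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:54310</value>
</property>
<property>
  <name>mapred.job.tracker</name>
  <value>localhost:54311</value>
</property>
</configuration>
"""


def load_properties(xml_text):
    """Return {name: value} for every <property> in the document."""
    root = ET.fromstring(xml_text)
    return {
        prop.findtext("name"): prop.findtext("value")
        for prop in root.findall("property")
    }


def host_and_port(address):
    """Split 'host:port' or 'scheme://host:port' into (host, port)."""
    if "://" not in address:
        address = "//" + address  # lets urlsplit treat it as a network location
    parts = urlsplit(address)
    return parts.hostname, parts.port


props = load_properties(SITE_XML)
for key in ("fs.default.name", "mapred.job.tracker"):
    print(key, "->", host_and_port(props[key]))
```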
Formatting the Namenode

The first step in starting up your Hadoop installation is formatting the Hadoop filesystem, which is implemented on top of the local filesystems of your cluster. You need to do this the first time you set up a Hadoop installation. Do not format a running Hadoop filesystem: doing so will erase all your data. Before formatting, ensure that the dfs.name.dir directory exists. If you just used the default, then mkdir -p /tmp/hadoop-username/dfs/name will create the directory. To format the filesystem (which simply initializes the directory specified by the dfs.name.dir variable), run the command:
% $HADOOP_INSTALL/hadoop/bin/hadoop namenode -format

If asked to [re]format, you must reply Y (not just y) if you want to reformat, else Hadoop will abort the format.
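
The two preparation steps, creating the name directory and then invoking the format command, can be sketched in Python as follows. This is a minimal sketch: it only builds the command as an argument list and does not run it, the "/opt" install location stands in for $HADOOP_INSTALL, and a temporary directory stands in for the real dfs.name.dir so that the example leaves nothing behind.

```python
# Sketch: make sure the dfs.name.dir directory exists, then build (but do not
# run) the namenode format command. Paths used here are placeholders.
import os
import tempfile


def prepare_format(hadoop_install, name_dir):
    """Create name_dir if needed and return the namenode format command."""
    os.makedirs(name_dir, exist_ok=True)
    hadoop_bin = os.path.join(hadoop_install, "hadoop", "bin", "hadoop")
    return [hadoop_bin, "namenode", "-format"]


# A temporary directory stands in for /tmp/hadoop-username/dfs/name.
with tempfile.TemporaryDirectory() as scratch:
    name_dir = os.path.join(scratch, "hadoop-username", "dfs", "name")
    command = prepare_format("/opt", name_dir)
    created = os.path.isdir(name_dir)

print(" ".join(command))
print("name directory created:", created)
```

Keeping the command as a list makes it easy to hand to a process launcher later, while the confirmation prompt still has to be answered by the operator.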

Starting a Single node cluster

Run the command:
% $HADOOP_INSTALL/hadoop/bin/start-all.sh
This will start up a Namenode, Datanode, Jobtracker and a Tasktracker on your machine.

Stopping a Single node cluster

Run the command
% $HADOOP_INSTALL/hadoop/bin/stop-all.sh
to stop all the daemons running on your machine.

Separating Configuration from Installation

In the example described above, the configuration files used by the Hadoop cluster all lie in the Hadoop installation. This can become cumbersome when upgrading to a new release, since all custom config has to be re-created in the new installation. It is possible to separate the config from the install. To do so, select a directory to house the Hadoop configuration (let's say /foo/bar/hadoop-config). Copy all conf files to this directory. You can either set the HADOOP_CONF_DIR environment variable to refer to this directory or pass it directly to the Hadoop scripts with the --config option. In this case, the cluster start and stop commands specified in the above two sub-sections become
% $HADOOP_INSTALL/hadoop/bin/start-all.sh --config /foo/bar/hadoop-config
and
% $HADOOP_INSTALL/hadoop/bin/stop-all.sh --config /foo/bar/hadoop-config
Only the absolute path to the config directory should be passed to the scripts.
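
A small wrapper can guard against passing a relative path by accident. The Python sketch below builds the start and stop command lines with --config and rejects anything that is not an absolute path; the function name and the "/opt" fallback for $HADOOP_INSTALL are assumptions made for illustration.

```python
# Sketch: build start/stop command lines with an explicit config directory.
# Rejecting relative paths mirrors the note that only an absolute path should
# be passed to the scripts.
import os


def with_config(script, conf_dir):
    """Return [script, '--config', conf_dir], insisting on an absolute path."""
    if not os.path.isabs(conf_dir):
        raise ValueError(f"config directory must be an absolute path: {conf_dir!r}")
    return [script, "--config", conf_dir]


install = os.environ.get("HADOOP_INSTALL", "/opt")  # "/opt" is a placeholder
bin_dir = os.path.join(install, "hadoop", "bin")
start = with_config(os.path.join(bin_dir, "start-all.sh"), "/foo/bar/hadoop-config")
stop = with_config(os.path.join(bin_dir, "stop-all.sh"), "/foo/bar/hadoop-config")
print(" ".join(start))
print(" ".join(stop))
```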

Starting up a larger cluster

Ensure that the Hadoop package is accessible from the same path on all nodes that are to be included in the cluster. If you have separated configuration from the install then ensure that the config directory is also accessible the same way.
Populate the slaves file with the nodes to be included in the cluster. One node per line.
Follow the steps in the Basic Configuration section above.
Format the Namenode
Run the command % $HADOOP_INSTALL/hadoop/bin/start-dfs.sh on the node you want the Namenode to run on. This will bring up HDFS with the Namenode running on the machine you ran the command on and Datanodes on the machines listed in the slaves file mentioned above.
Run the command % $HADOOP_INSTALL/hadoop/bin/start-mapred.sh on the machine you plan to run the Jobtracker on. This will bring up the Map/Reduce cluster with Jobtracker running on the machine you ran the command on and Tasktrackers running on machines listed in the slaves file.
The above two commands can also be executed with a --config option.
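
To see which daemons the steps above would place on which machines, the slaves file can be read and expanded into a simple plan. The Python sketch below assumes made-up host names; skipping blank lines and lines starting with # is an assumption of the sketch rather than something this text specifies.

```python
# Sketch: read a slaves file (one host per line) and list which daemon would
# run where after start-dfs.sh and start-mapred.sh. Host names are made up.

def read_slaves(text):
    """Return the host names in a slaves file, ignoring blanks and # lines."""
    hosts = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            hosts.append(line)
    return hosts


def startup_plan(namenode, jobtracker, slaves):
    """Pair each host with the daemon it is expected to run."""
    plan = [(namenode, "Namenode"), (jobtracker, "Jobtracker")]
    for host in slaves:
        plan.append((host, "Datanode"))
        plan.append((host, "Tasktracker"))
    return plan


slaves_text = "node1.example.com\nnode2.example.com\n"
slaves = read_slaves(slaves_text)
plan = startup_plan("master1.example.com", "master2.example.com", slaves)
for host, daemon in plan:
    print(f"{host}: {daemon}")
```

The namenode and jobtracker hosts are passed in separately because they are determined by where the start scripts are run, not by the slaves file.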
Stopping the cluster

The cluster can be stopped by running % $HADOOP_INSTALL/hadoop/bin/stop-mapred.sh and then % $HADOOP_INSTALL/hadoop/bin/stop-dfs.sh on your Jobtracker and Namenode respectively. These commands also accept the --config option.