



[node1 conf]$ cat core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->


<description>URI of NameNode.</description>
-- fs.default.name: the default file system URI.
<description>Temp dir.</description>
-- hadoop.tmp.dir: the temporary directory.
<description>List of excluded DataNodes.</description>
-- dfs.hosts.exclude: the DataNode blacklist.
-- fs.default.name0: Avatar Hadoop setting, the URI of the primary NameNode; currently left empty.
-- fs.default.name1: URI of the standby NameNode.
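The `<property>` wrappers did not survive in the listing above, so here is a minimal sketch of how the three named properties would normally be laid out in core-site.xml. The host name, port, and paths are placeholders chosen for illustration; they are not values from this cluster.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Every value below is a hypothetical placeholder. -->
  <property>
    <name>fs.default.name</name>
    <value>hdfs://namenode.example.com:9000</value>
    <description>URI of NameNode.</description>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/path/to/hadoop/tmp</value>
    <description>Temp dir.</description>
  </property>
  <property>
    <!-- Points at a file that lists the hosts to exclude, one per line. -->
    <name>dfs.hosts.exclude</name>
    <value>/path/to/conf/excludes</value>
    <description>List of excluded DataNodes.</description>
  </property>
</configuration>
```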


[node1 conf]$ cat hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->


<!-- The necessary parameters, please modify value -->
<description>File fsimage location.
If this is a comma-delimited list of directories then the name
table is replicated in all of the directories, for redundancy.</description>
<description>File edits location.
If this is a comma-delimited list of directories then the name
table is replicated in all of the directories, for redundancy.</description>
<description>Determines where on the local filesystem an DFS data node
should store its blocks. If this is a comma-delimited
list of directories, then data will be stored in all named
directories, typically on different devices.
Directories that do not exist are ignored.</description>

[hadoop@node1 ~]$ df -h
Filesystem            Size   Used  Avail  Use% Mounted on
                       25G   3.3G    20G   15% /
tmpfs                  32G    88K    32G    1% /dev/shm
/dev/cciss/c0d0p1     485M    38M   422M    9% /boot
                      917G  1023M   870G    1% /data1
                      917G   3.1G   868G    1% /data2
                       80G    13G    63G   17% /home
/dev/mapper/mpathe    474G   381G    69G   85% /data02
/dev/mapper/mpathf    474G   379G    71G   85% /data03
tmpfs                  16G   3.4G    13G   22% /dev/flare
/dev/mapper/mpathd    474G   378G    72G   85% /data01
fuse_dfs               12T   8.9T   2.8T   77% /home/ocdc/fuse-dfs
/dev/mapper/mpathg    474G   373G    77G   83% /data04
/dev/mapper/mpathh    474G   373G    77G   83% /data05
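
The three storage descriptions above lost their property names. As a sketch, assuming the Hadoop 0.20/1.x names `dfs.name.dir`, `dfs.name.edits.dir`, and `dfs.data.dir`, comma-delimited directory lists spread across separate mount points would look roughly like this. The directory paths are hypothetical and only modeled on the mount points in the `df -h` output.

```xml
<configuration>
  <!-- Property names assume Hadoop 0.20/1.x; all paths are hypothetical. -->
  <property>
    <name>dfs.name.dir</name>
    <!-- The fsimage is replicated into every listed directory. -->
    <value>/data1/hadoop/name,/data2/hadoop/name</value>
  </property>
  <property>
    <name>dfs.name.edits.dir</name>
    <!-- The edits log is replicated the same way. -->
    <value>/data1/hadoop/edits,/data2/hadoop/edits</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <!-- Blocks are distributed over the listed directories, ideally one per device. -->
    <value>/data1/hadoop/data,/data2/hadoop/data</value>
  </property>
</configuration>
```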

The address where the datanode server will listen to.
If the port is 0 then the server will start on a free port.
[hadoop@node1 ~]$ netstat -an|grep 50010|grep LISTEN
tcp 0 0 ::ffff: :::* LISTEN

The datanode http server address and port.
If the port is 0 then the server will start on a free port.


[hadoop@node1 ~]$ netstat -an|grep 50075
tcp 0 0 ::ffff: :::* LISTEN

The datanode ipc server address and port.
If the port is 0 then the server will start on a free port.
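
For reference, the three DataNode listener settings described above could be written as follows. This is a sketch that assumes the Hadoop 0.20/1.x property names. Ports 50010 and 50075 match the `netstat` checks above; the bind address `0.0.0.0` and the IPC port 50020 are assumed to be the stock defaults, not values read from this cluster.

```xml
<configuration>
  <!-- Property names and the 0.0.0.0 / 50020 values are assumptions. -->
  <property>
    <name>dfs.datanode.address</name>
    <value>0.0.0.0:50010</value>
  </property>
  <property>
    <name>dfs.datanode.http.address</name>
    <value>0.0.0.0:50075</value>
  </property>
  <property>
    <name>dfs.datanode.ipc.address</name>
    <value>0.0.0.0:50020</value>
  </property>
</configuration>
```

Setting a port to 0 in any of these values would let the server pick a free port at startup, as the descriptions note.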

<description>The number of server threads for the datanode.</description>

-- Number of DataNode server threads.

<description>The number of server threads for the namenode.</description>
<description>The user account used by the web interface.</description>
<description>Default block replication.
The actual number of replications can be specified when the file is created.
The default is used if replication is not specified in create time.</description>
<description>Minimal block replication.</description>
<description>set if hadoop support append</description>
<description>set if hadoop support append</description>
<description>Determines where on the local filesystem the DFS secondary
name node should store the temporary images to merge.
If this is a comma-delimited list of directories then the image is
replicated in all of the directories for redundancy.</description>
-- Directory for the secondary (backup) DFS NameNode.
Specifies the maximum bandwidth that each datanode can utilize for the balancing purpose in term of the number of bytes per second.
-- Upper limit on the number of files an HDFS DataNode handles at the same time, somewhat like nfile on Linux. (This note does not match the bandwidth description above; it appears to belong to a separate property whose description is missing here, probably dfs.datanode.max.xcievers.)
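
A minimal sketch of a few of the settings described in this section, assuming the Hadoop 0.20/1.x property names. The values shown are what I understand to be the stock defaults, used here only for illustration; the values actually set on this cluster are not shown in the listing.

```xml
<configuration>
  <!-- Property names assume Hadoop 0.20/1.x; values are illustrative defaults. -->
  <property>
    <name>dfs.replication</name>
    <!-- Used when a file is created without an explicit replication factor. -->
    <value>3</value>
  </property>
  <property>
    <name>dfs.replication.min</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.handler.count</name>
    <value>10</value>
  </property>
  <property>
    <name>dfs.balance.bandwidthPerSec</name>
    <!-- Bytes per second each DataNode may use for balancing: 1048576 = 1 MB/s. -->
    <value>1048576</value>
  </property>
</configuration>
```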






[node1 conf]$ cat mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->


<description>The host and port that the MapReduce job tracker runs
at. If "local", then jobs are run in-process as a single map
and reduce task.</description>
---- Host (or IP) and port of the JobTracker.
The job tracker http server address and port the server will listen on.
If the port is 0 then the server will start on a free port.
--- Address and port of the JobTracker HTTP server.
The number of server threads for the JobTracker. This should be roughly
4% of the number of tasktracker nodes.
<description>The maximum number of reduce tasks that will be run
simultaneously by a task tracker.</description>
-- Maximum number of concurrent reduce tasks.

<description>The interface and port that task tracker server listens on.
Since it is only connected to by the tasks, it uses the local interface.
EXPERT ONLY. Should only be changed if your host does not have the loopback
interface.</description>
<description>The local directory where MapReduce stores intermediate
data files. May be a comma-separated list of
directories on different devices in order to spread disk i/o.
Directories that do not exist are ignored.</description>
<description>The shared directory where MapReduce stores control files.</description>
-- Directory where MapReduce stores its control files, i.e. the HDFS path where the MapReduce framework keeps its system files.
<description>A shared directory for temporary files.</description>
<description>The default number of map tasks per job.
Ignored when mapred.job.tracker is "local".</description>
<description>The default number of reduce tasks per job. Typically set to 99%
of the cluster's reduce capacity, so that if a node fails the reduces can
still be executed in a single wave.
Ignored when mapred.job.tracker is "local".</description>
<description> User can specify a location to store the history files of
a particular job. If nothing is specified, the logs are stored in
output directory. The files are stored in "_logs/history/" in the directory.
User can stop logging by giving the value "none".</description>
<description>How many tasks to run per jvm. If set to -1, there is
no limit.</description>
<description>The number of milliseconds before a task will be
terminated if it neither reads an input, writes an output, nor
updates its status string.</description>
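
As with the other files, the property names were lost from this listing. The sketch below pairs a few of the descriptions above with what are assumed to be their Hadoop 0.20/1.x names. The JobTracker host, the port 9001, and the local directories are hypothetical; the remaining values are what I understand to be the stock defaults.

```xml
<configuration>
  <!-- Property names assume Hadoop 0.20/1.x; hosts and paths are hypothetical. -->
  <property>
    <name>mapred.job.tracker</name>
    <!-- "local" would run jobs in-process as a single map and reduce task. -->
    <value>jobtracker.example.com:9001</value>
  </property>
  <property>
    <name>mapred.job.tracker.http.address</name>
    <value>0.0.0.0:50030</value>
  </property>
  <property>
    <name>mapred.local.dir</name>
    <!-- Comma-separated list on different devices to spread disk I/O. -->
    <value>/data1/hadoop/mapred/local,/data2/hadoop/mapred/local</value>
  </property>
  <property>
    <name>mapred.job.reuse.jvm.num.tasks</name>
    <!-- -1 means no limit on tasks per JVM. -->
    <value>1</value>
  </property>
  <property>
    <name>mapred.task.timeout</name>
    <!-- Milliseconds; 600000 = 10 minutes. -->
    <value>600000</value>
  </property>
</configuration>
```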


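<!-- FAIR Scheduler -->

--- Sets the class that implements task scheduling. The default is org.apache.hadoop.mapred.JobQueueTaskScheduler; a non-default value here means the Fair Scheduler algorithm is used in place of FIFO.

--- Fair Scheduler settings.

<!-- FAST Scheduler -->
--- The default scheduling algorithm is FIFO. The alternatives are the Capacity Scheduler (developed by Yahoo!) and the Fair Scheduler (developed by Facebook).
<!-- Average Schedule -->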
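
To make the scheduler switch concrete, here is a sketch of how the Fair Scheduler is typically enabled in mapred-site.xml on Hadoop 0.20/1.x. The property names `mapred.jobtracker.taskScheduler` and `mapred.fairscheduler.allocation.file` are assumed, since the listing above no longer shows them, and the allocation file path is a placeholder.

```xml
<configuration>
  <!-- Property names assume Hadoop 0.20/1.x; the file path is hypothetical. -->
  <property>
    <name>mapred.jobtracker.taskScheduler</name>
    <!-- Replaces the default org.apache.hadoop.mapred.JobQueueTaskScheduler (FIFO). -->
    <value>org.apache.hadoop.mapred.FairScheduler</value>
  </property>
  <property>
    <name>mapred.fairscheduler.allocation.file</name>
    <!-- Pool definitions for the Fair Scheduler. -->
    <value>/path/to/conf/fair-scheduler.xml</value>
  </property>
</configuration>
```

The Fair Scheduler jar also has to be on the JobTracker's classpath for this class to load.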



