Chapter 2: Xiaozhu's Hadoop Notes on Source Code Analysis - Script Analysis

Section 1: start-all.sh
Section 2: hadoop-config.sh
Section 3: hadoop-env.sh
Section 4: start-dfs.sh
Section 5: hadoop-daemon.sh
Section 6: hadoop-daemons.sh
Section 7: slaves.sh
Section 8: start-mapred.sh
Section 9: hadoop

 

Section 1: start-all.sh

       (1) Contents:

# Start all hadoop daemons.  Run this on master node.

bin=`dirname "$0"`
bin=`cd "$bin"; pwd`

if [ -e "$bin/../libexec/hadoop-config.sh" ]; then
  . "$bin"/../libexec/hadoop-config.sh
else
  . "$bin/hadoop-config.sh"
fi

# start dfs daemons
"$bin"/start-dfs.sh --config $HADOOP_CONF_DIR

# start mapred daemons
"$bin"/start-mapred.sh --config $HADOOP_CONF_DIR

 (2) Analysis
  Starting from the directory in which the script is run, it locates the bin directory of the Hadoop installation, sources hadoop-config.sh to assign a number of variables, and then runs the startup scripts for HDFS and MapReduce.
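The first two lines use a common shell idiom to resolve the directory the script lives in, no matter where it is invoked from. A minimal sketch of that idiom (the /opt/hadoop path is only an assumed example):

# Illustrative sketch: resolve the directory containing the running script.
bin=`dirname "$0"`     # e.g. ./bin, or /opt/hadoop/bin if called by absolute path (assumed install dir)
bin=`cd "$bin"; pwd`   # normalize to an absolute path
echo "scripts live in: $bin"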

 

 

 

Section 2: hadoop-config.sh

(1) Contents

# Allow alternate conf dir location.
if [ -e "${HADOOP_PREFIX}/conf/hadoop-env.sh" ]; then
  DEFAULT_CONF_DIR="conf"
else
  DEFAULT_CONF_DIR="etc/hadoop"
fi
HADOOP_CONF_DIR="${HADOOP_CONF_DIR:-$HADOOP_PREFIX/$DEFAULT_CONF_DIR}"

#check to see it is specified whether to use the slaves or the
# masters file
if [ $# -gt 1 ]
then
    if [ "--hosts" = "$1" ]
    then
        shift
        slavesfile=$1
        shift
        export HADOOP_SLAVES="${HADOOP_CONF_DIR}/$slavesfile"
    fi
fi

# If hadoop-env.sh exists in the configuration directory, source it
if [ -f "${HADOOP_CONF_DIR}/hadoop-env.sh" ]; then
  . "${HADOOP_CONF_DIR}/hadoop-env.sh"
fi

 

 

(2) Analysis

    Assigns a number of variables: HADOOP_HOME (the Hadoop installation directory), HADOOP_CONF_DIR (the Hadoop configuration directory), and HADOOP_SLAVES (the path of the host list file specified with --hosts). The script then sources hadoop-env.sh to pick up the user-configured environment variables.
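The key idiom here is the ${VAR:-default} expansion, which falls back to the default only when HADOOP_CONF_DIR is unset or empty. A small standalone sketch (the /opt/hadoop prefix is an assumed path):

#!/bin/bash
# Sketch of the default expansion used for HADOOP_CONF_DIR.
HADOOP_PREFIX=/opt/hadoop                        # assumed install location
unset HADOOP_CONF_DIR
echo "${HADOOP_CONF_DIR:-$HADOOP_PREFIX/conf}"   # prints /opt/hadoop/conf
HADOOP_CONF_DIR=/etc/hadoop
echo "${HADOOP_CONF_DIR:-$HADOOP_PREFIX/conf}"   # prints /etc/hadoop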

 

Section 3: hadoop-env.sh

(1) Contents

# The only required environment variable is JAVA_HOME.  All others are
# optional.  When running a distributed configuration it is best to
# set JAVA_HOME in this file, so that it is correctly defined on
# remote nodes.

# The java implementation to use.  Required.
export JAVA_HOME=/opt/java

# Extra Java CLASSPATH elements.  Optional.
# export HADOOP_CLASSPATH=

# The maximum amount of heap to use, in MB. Default is 1000.
export HADOOP_HEAPSIZE=2000

# Extra Java runtime options.  Empty by default.
# export HADOOP_OPTS=-server

# Command specific options appended to HADOOP_OPTS when specified
export HADOOP_NAMENODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_NAMENODE_OPTS"
export HADOOP_SECONDARYNAMENODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_SECONDARYNAMENODE_OPTS"
export HADOOP_DATANODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_DATANODE_OPTS"
export HADOOP_BALANCER_OPTS="-Dcom.sun.management.jmxremote $HADOOP_BALANCER_OPTS"
export HADOOP_JOBTRACKER_OPTS="-Dcom.sun.management.jmxremote $HADOOP_JOBTRACKER_OPTS"

 

 

(2) Analysis

    Sets the parameters for running Java, such as the JAVA_HOME variable (which must be set) and the maximum JVM heap size.
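A minimal hadoop-env.sh therefore only has to define JAVA_HOME; everything else is optional. The sketch below uses assumed example values, not the settings of the cluster shown above:

# hadoop-env.sh, minimal sketch with assumed values
export JAVA_HOME=/usr/lib/jvm/java-6-sun   # assumed JDK location, adjust for your system
export HADOOP_HEAPSIZE=1000                # maximum daemon heap in MB (1000 is the default)
# export HADOOP_OPTS=-server               # extra JVM options, optional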


Section 4: start-dfs.sh

(1) Contents

usage="Usage: start-dfs.sh [-upgrade|-rollback]"

.....

# start dfs daemons
# start namenode after datanodes, to minimize time namenode is up w/o data
# note: datanodes will log connection errors until namenode starts
"$bin"/hadoop-daemon.sh --config $HADOOP_CONF_DIR start namenode $nameStartOpt
"$bin"/hadoop-daemons.sh --config $HADOOP_CONF_DIR start datanode $dataStartOpt
"$bin"/hadoop-daemons.sh --config $HADOOP_CONF_DIR --hosts masters start secondarynamenode

 

 

(2) Analysis

 This script accepts only two option parameters, -upgrade and -rollback: the first upgrades the file system, the second rolls it back. It then starts the namenode, datanode, and secondarynamenode daemons.
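For example, after upgrading the Hadoop software you would typically start HDFS once with -upgrade, and use -rollback if the upgrade has to be undone. A usage sketch, assuming HADOOP_HOME=/opt/hadoop:

/opt/hadoop/bin/start-dfs.sh             # normal start
/opt/hadoop/bin/start-dfs.sh -upgrade    # start and upgrade the on-disk layout
/opt/hadoop/bin/start-dfs.sh -rollback   # start and roll back to the previous layout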

 

Section 5: hadoop-daemon.sh

(1) Analysis

 

The header of the script documents the environment variables it honours:

HADOOP_CONF_DIR       Alternate conf dir. Default is ${HADOOP_HOME}/conf.
HADOOP_LOG_DIR        Where log files are stored. Defaults to the current working directory (PWD).
HADOOP_MASTER         host:path where hadoop code should be rsync'd from.
HADOOP_PID_DIR        Where the pid files are stored. /tmp by default.
HADOOP_IDENT_STRING   A string representing this instance of hadoop. $USER by default.
HADOOP_NICENESS       The scheduling priority for daemons. Defaults to 0.

 
Step 1: If the script is invoked with one argument or none, print the usage message and exit.

# if no args specified, show usage
if [ $# -le 1 ]; then
  echo $usage
  exit 1
fi

 Step 2: Source hadoop-config.sh to set variables, then record whether to start or stop and which command (daemon) to run, together with its arguments.

if [ -e "$bin/../libexec/hadoop-config.sh" ]; then
  . "$bin"/../libexec/hadoop-config.sh
else
  . "$bin/hadoop-config.sh"
fi

# get arguments
startStop=$1
shift
command=$1
shift

 
Step 3: Define the log rotation function hadoop_rotate_log.

hadoop_rotate_log ()
{
    log=$1;
    num=5;
    if [ -n "$2" ]; then
        num=$2
    fi
    if [ -f "$log" ]; then # rotate logs
        while [ $num -gt 1 ]; do
            prev=`expr $num - 1`
            [ -f "$log.$prev" ] && mv "$log.$prev" "$log.$num"
            num=$prev
        done
        mv "$log" "$log.$num";
    fi
}
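To see what the rotation does, the sketch below calls the function on a throw-away file (the file name is an assumed example): the current log becomes .1, an existing .1 becomes .2, and so on, keeping five generations by default.

# Sketch: exercise hadoop_rotate_log on a temporary file.
log=/tmp/hadoop-demo.log
echo "old contents" > "$log"
hadoop_rotate_log "$log"       # /tmp/hadoop-demo.log is renamed to /tmp/hadoop-demo.log.1
echo "new contents" > "$log"   # the daemon would now write a fresh log here
ls "$log"*                     # hadoop-demo.log  hadoop-demo.log.1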

 


Step 4: Source hadoop-env.sh to set the user-configured environment variables.

if [ -f "${HADOOP_CONF_DIR}/hadoop-env.sh" ]; then
  . "${HADOOP_CONF_DIR}/hadoop-env.sh"
fi

 


Step 5: Execute start or stop; we analyze the two branches separately.

start:
Taking the namenode as an example: the script first creates the directory that holds the pid file. If the pid file already exists, a namenode is taken to be running already, so it has to be stopped before a new one is started. The log rotation function then rolls the log files, and finally nice launches the namenode at the configured scheduling priority. The actual launch is delegated to yet another script, hadoop, the final script through which every daemon is started: it picks a class with a main method and launches it with java, which is what really starts the Java daemon. That script is the heart of the startup procedure and the entry point for our analysis of the Hadoop source code.
(1) Verify the pid file
 

 mkdir -p "$HADOOP_PID_DIR"

    if [ -f $pid ]; then
      if kill -0 `cat $pid` > /dev/null 2>&1; then
        echo $command running as process `cat $pid`.  Stop it first.
        exit 1
      fi
    fi
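The kill -0 trick sends no signal at all; it merely checks whether the process recorded in the pid file still exists (and whether we are allowed to signal it). A minimal sketch, with an assumed pid file path:

# Sketch: kill -0 as a liveness test.
pid=/tmp/hadoop-demo.pid
if kill -0 `cat $pid` > /dev/null 2>&1; then
  echo "process is still running"
else
  echo "stale or missing pid file, safe to start"
fi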

 
(2) Roll the log files with the rotation function
 

 if [ "$HADOOP_MASTER" != "" ]; then
      echo rsync from $HADOOP_MASTER
      rsync -a -e ssh --delete --exclude=.svn --exclude='logs/*' --exclude='contrib/hod/logs/*' $HADOOP_MASTER/ "$HADOOP_HOME"
    fi

    hadoop_rotate_log $log
    echo starting $command, logging to $log

 
(3) Launch with nice at the configured scheduling priority
  

    cd "$HADOOP_PREFIX"
    nohup nice -n $HADOOP_NICENESS "$HADOOP_PREFIX"/bin/hadoop --config $HADOOP_CONF_DIR $command "$@" > "$log" 2>&1 < /dev/null &
    echo $! > $pid

 
 Note: after startup, the following files appear in the /tmp directory. Each one stores the process ID of the corresponding daemon and is used by the kill operation when stopping it.

-rw-rw-r-- 1 zhuhui zhuhui 6 Mar  6 17:25 hadoop-zhuhui-datanode.pid
-rw-rw-r-- 1 zhuhui zhuhui 6 Mar  6 17:25 hadoop-zhuhui-jobtracker.pid
-rw-rw-r-- 1 zhuhui zhuhui 6 Mar  6 17:25 hadoop-zhuhui-namenode.pid
-rw-rw-r-- 1 zhuhui zhuhui 6 Mar  6 17:25 hadoop-zhuhui-secondarynamenode.pid
-rw-rw-r-- 1 zhuhui zhuhui 6 Mar  6 17:25 hadoop-zhuhui-tasktracker.pid

 
stop:
   The stop branch simply executes kill `cat $pid` to terminate the daemon.

    if [ -f $pid ]; then
      if kill -0 `cat $pid` > /dev/null 2>&1; then
        echo stopping $command
        kill `cat $pid`
      else
        echo no $command to stop
      fi
    else
      echo no $command to stop
    fi
    ;;

 

 

Section 6: hadoop-daemons.sh

(1) Contents

usage="Usage: hadoop-daemons.sh [--config confdir] [--hosts hostlistfile] [start|stop] command args..."

# if no args specified, show usage
if [ $# -le 1 ]; then
  echo $usage
  exit 1
fi

bin=`dirname "$0"`
bin=`cd "$bin"; pwd`

if [ -e "$bin/../libexec/hadoop-config.sh" ]; then
  . "$bin"/../libexec/hadoop-config.sh
else
  . "$bin/hadoop-config.sh"
fi

exec "$bin/slaves.sh" --config $HADOOP_CONF_DIR cd "$HADOOP_HOME" \; "$bin/hadoop-daemon.sh" --config $HADOOP_CONF_DIR "$@"

 

(2) Analysis

   The daemons are still started through the hadoop-daemon.sh script; the only difference is that, beforehand, the call is handed to another script, slaves.sh, which runs it on every slave node:

exec "$bin/slaves.sh" --config $HADOOP_CONF_DIR cd "$HADOOP_HOME" \; "$bin/hadoop-daemon.sh" --config $HADOOP_CONF_DIR "$@"

 

 

Section 7: slaves.sh

(1) Contents

# If the slaves file is specified in the command line,
# then it takes precedence over the definition in 
# hadoop-env.sh. Save it here.
HOSTLIST=$HADOOP_SLAVES

if [ -f "${HADOOP_CONF_DIR}/hadoop-env.sh" ]; then
  . "${HADOOP_CONF_DIR}/hadoop-env.sh"
fi

if [ "$HOSTLIST" = "" ]; then
  if [ "$HADOOP_SLAVES" = "" ]; then
    export HOSTLIST="${HADOOP_CONF_DIR}/slaves"
  else
    export HOSTLIST="${HADOOP_SLAVES}"
  fi
fi

for slave in `cat "$HOSTLIST"|sed  "s/#.*$//;/^$/d"`; do
 ssh $HADOOP_SSH_OPTS $slave $"${@// /\\ }" \
   2>&1 | sed "s/^/$slave: /" &
 if [ "$HADOOP_SLAVE_SLEEP" != "" ]; then
   sleep $HADOOP_SLAVE_SLEEP
 fi
done

wait

 

 

 

(2) Analysis

    The script first determines the host names of all slave nodes (from the slaves file, or from the file configured via HADOOP_SLAVES), then loops over them, running the given command on each host in the background over ssh, and finally waits for all of those background jobs to finish before the shell script exits.
    So the job of this script is to run the appropriate startup command on every slave node. When starting the datanodes, the hosts are read from the slaves file; when starting the secondarynamenode, they are read from the masters file (selected with --hosts masters in start-dfs.sh; without that option the slaves file is used by default, which is how the datanode hosts are resolved).
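The sed expression in the for loop strips comments and blank lines, so the host list file can be annotated freely. A small sketch with assumed host names:

# Sketch: what the sed filter does to a slaves file.
cat > /tmp/slaves-demo <<'EOF'
# data nodes
slave1
slave2    # rack 2

EOF
sed "s/#.*$//;/^$/d" /tmp/slaves-demo
# output:
# slave1
# slave2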

 

Section 8: start-mapred.sh

(1) Contents

 

"$bin"/hadoop-daemon.sh --config $HADOOP_CONF_DIR start jobtracker
"$bin"/hadoop-daemons.sh --config $HADOOP_CONF_DIR start tasktracker

(2) Analysis

  Starts the jobtracker (through hadoop-daemon.sh, on the local node) and the tasktrackers (through hadoop-daemons.sh, on the slave nodes).

 

Section 9: hadoop

  This script can start the daemons on each node and can also run a large number of commands and tools. It decides what to do from the arguments passed to it (including which daemon service to start).

 

Its usage message lists the supported commands:
echo "Usage: hadoop [--config confdir] COMMAND"
echo "where COMMAND is one of:"
echo " namenode -format format the DFS filesystem"
echo " secondarynamenode run the DFS secondary namenode"
echo " namenode run the DFS namenode"
echo " datanode run a DFS datanode"
echo " dfsadmin run a DFS admin client"
echo " mradmin run a Map-Reduce admin client"
echo " fsck run a DFS filesystem checking utility"
echo " fs run a generic filesystem user client"
echo " balancer run a cluster balancing utility"
echo " fetchdt fetch a delegation token from the NameNode"
echo " jobtracker run the MapReduce job Tracker node"
echo " pipes run a Pipes job"
echo " tasktracker run a MapReduce task Tracker node"
echo " historyserver run job history servers as a standalone daemon"
echo " job manipulate MapReduce jobs"
echo " queue get information regarding JobQueues"
echo " version print the version"
echo " jar <jar> run a jar file"
echo " distcp <srcurl> <desturl> copy file or directories recursively"
echo " archive -archiveName NAME -p <parent path> <src>* <dest> create a hadoop archive"
echo " classpath prints the class path needed to get the"
echo " Hadoop jar and the required libraries"
echo " daemonlog get/set the log level for each daemon"
echo " or"
echo " CLASSNAME run the class named CLASSNAME"
echo "Most commands print help when invoked w/o parameters."
exit 1

 

 

 Step 1: Source hadoop-config.sh to set the environment variables

 

if [ -e "$bin"/../libexec/hadoop-config.sh ]; then
  . "$bin"/../libexec/hadoop-config.sh
else
  . "$bin"/hadoop-config.sh
fi

 

 

 Step 2: Detect whether the script is running under Cygwin

cygwin=false
case "`uname`" in
CYGWIN*) cygwin=true;;
esac

 

 Step 3: Validate the command argument

Check that the first argument is one of the supported commands: namenode -format, secondarynamenode, namenode, datanode, dfsadmin, mradmin, fsck, fs, balancer, fetchdt, jobtracker, pipes, tasktracker, historyserver, job, queue, version, jar <jar>, distcp, archive, classpath, daemonlog, and so on.
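The usage text quoted above is exactly what gets printed when no command is supplied; otherwise the first argument is saved as the command and shifted off the argument list. A sketch of that pattern (not a verbatim copy of the script):

# Sketch: capture the sub-command, leave its own arguments in "$@".
if [ $# = 0 ]; then
  echo "Usage: hadoop [--config confdir] COMMAND"
  exit 1
fi
COMMAND=$1
shift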

 

 Step 4: Set the Java-related variables JAVA, JAVA_HEAP_MAX, and CLASSPATH

 

JAVA=$JAVA_HOME/bin/java
JAVA_HEAP_MAX=-Xmx1000m 

# CLASSPATH initially contains $HADOOP_CONF_DIR
CLASSPATH="${HADOOP_CONF_DIR}"
if [ "$HADOOP_USER_CLASSPATH_FIRST" != "" ] && [ "$HADOOP_CLASSPATH" != "" ] ; then
  CLASSPATH=${CLASSPATH}:${HADOOP_CLASSPATH}
fi
CLASSPATH=${CLASSPATH}:$JAVA_HOME/lib/tools.jar

# for developers, add Hadoop classes to CLASSPATH
if [ -d "$HADOOP_HOME/build/classes" ]; then
  CLASSPATH=${CLASSPATH}:$HADOOP_HOME/build/classes
fi
if [ -d "$HADOOP_HOME/build/webapps" ]; then
  CLASSPATH=${CLASSPATH}:$HADOOP_HOME/build
fi
if [ -d "$HADOOP_HOME/build/test/classes" ]; then
  CLASSPATH=${CLASSPATH}:$HADOOP_HOME/build/test/classes
fi
if [ -d "$HADOOP_HOME/build/tools" ]; then
  CLASSPATH=${CLASSPATH}:$HADOOP_HOME/build/tools
fi
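The excerpt stops here, but the script goes on to add the Hadoop jars and every jar under the lib directory to CLASSPATH in the same accumulating style. A hedged sketch of that pattern (not a verbatim copy):

# Sketch: append each library jar to CLASSPATH (pattern only).
for f in $HADOOP_HOME/lib/*.jar; do
  CLASSPATH=${CLASSPATH}:$f
done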

 

 

 

 Step 5: Select the Java class to run (CLASS), and the matching JVM options, according to the saved command

# figure out which class to run
if [ "$COMMAND" = "classpath" ] ; then
  if $cygwin; then
    CLASSPATH=`cygpath -p -w "$CLASSPATH"`
  fi
  echo $CLASSPATH
  exit
elif [ "$COMMAND" = "namenode" ] ; then
  CLASS='org.apache.hadoop.hdfs.server.namenode.NameNode'
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_NAMENODE_OPTS"
elif [ "$COMMAND" = "secondarynamenode" ] ; then
  CLASS='org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode'
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_SECONDARYNAMENODE_OPTS"
elif [ "$COMMAND" = "datanode" ] ; then
  CLASS='org.apache.hadoop.hdfs.server.datanode.DataNode'
  if [ "$starting_secure_dn" = "true" ]; then
    HADOOP_OPTS="$HADOOP_OPTS -jvm server $HADOOP_DATANODE_OPTS"
  else
    HADOOP_OPTS="$HADOOP_OPTS -server $HADOOP_DATANODE_OPTS"
  fi
elif [ "$COMMAND" = "fs" ] ; then
  CLASS=org.apache.hadoop.fs.FsShell
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_CLIENT_OPTS"
elif [ "$COMMAND" = "dfs" ] ; then
  CLASS=org.apache.hadoop.fs.FsShell
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_CLIENT_OPTS"
elif [ "$COMMAND" = "dfsadmin" ] ; then
  CLASS=org.apache.hadoop.hdfs.tools.DFSAdmin
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_CLIENT_OPTS"
elif [ "$COMMAND" = "mradmin" ] ; then
  CLASS=org.apache.hadoop.mapred.tools.MRAdmin
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_CLIENT_OPTS"
elif [ "$COMMAND" = "fsck" ] ; then
  CLASS=org.apache.hadoop.hdfs.tools.DFSck
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_CLIENT_OPTS"
elif [ "$COMMAND" = "balancer" ] ; then
  CLASS=org.apache.hadoop.hdfs.server.balancer.Balancer
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_BALANCER_OPTS"
elif [ "$COMMAND" = "fetchdt" ] ; then
  CLASS=org.apache.hadoop.hdfs.tools.DelegationTokenFetcher
elif [ "$COMMAND" = "jobtracker" ] ; then
  CLASS=org.apache.hadoop.mapred.JobTracker
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_JOBTRACKER_OPTS"
elif [ "$COMMAND" = "historyserver" ] ; then
  CLASS=org.apache.hadoop.mapred.JobHistoryServer
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_JOB_HISTORYSERVER_OPTS"
elif [ "$COMMAND" = "tasktracker" ] ; then
  CLASS=org.apache.hadoop.mapred.TaskTracker
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_TASKTRACKER_OPTS"
elif [ "$COMMAND" = "job" ] ; then
  CLASS=org.apache.hadoop.mapred.JobClient
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_CLIENT_OPTS"
elif [ "$COMMAND" = "queue" ] ; then
  CLASS=org.apache.hadoop.mapred.JobQueueClient
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_CLIENT_OPTS"
elif [ "$COMMAND" = "pipes" ] ; then
  CLASS=org.apache.hadoop.mapred.pipes.Submitter
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_CLIENT_OPTS"
elif [ "$COMMAND" = "version" ] ; then
  CLASS=org.apache.hadoop.util.VersionInfo
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_CLIENT_OPTS"
elif [ "$COMMAND" = "jar" ] ; then
  CLASS=org.apache.hadoop.util.RunJar
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_CLIENT_OPTS"
elif [ "$COMMAND" = "distcp" ] ; then
  CLASS=org.apache.hadoop.tools.DistCp
  CLASSPATH=${CLASSPATH}:${TOOL_PATH}
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_CLIENT_OPTS"
elif [ "$COMMAND" = "daemonlog" ] ; then
  CLASS=org.apache.hadoop.log.LogLevel
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_CLIENT_OPTS"
elif [ "$COMMAND" = "archive" ] ; then
  CLASS=org.apache.hadoop.tools.HadoopArchives
  CLASSPATH=${CLASSPATH}:${TOOL_PATH}
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_CLIENT_OPTS"
elif [ "$COMMAND" = "sampler" ] ; then
  CLASS=org.apache.hadoop.mapred.lib.InputSampler
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_CLIENT_OPTS"
else
  CLASS=$COMMAND
fi
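Note the final else branch: any unrecognized command is treated as a fully qualified class name, so the hadoop script can also launch arbitrary user code that is already on the classpath. For example (org.example.MyTool is a hypothetical class):

hadoop fs -ls /                            # built-in command, resolves to org.apache.hadoop.fs.FsShell
hadoop org.example.MyTool --input /data    # unknown command, falls through to CLASS=$COMMAND (hypothetical class and flag)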

 

 

 

 Step 6: Finally, execute the command: through jsvc when starting a secure datanode, otherwise with plain java

 

 
  exec "$HADOOP_HOME/libexec/jsvc.${JSVC_ARCH}" -Dproc_$COMMAND -outfile "$HADOOP_LOG_DIR/jsvc.out" \
                                                -errfile "$HADOOP_LOG_DIR/jsvc.err" \
                                                -pidfile "$HADOOP_SECURE_DN_PID" \
                                                -nodetach \
                                                -user "$HADOOP_SECURE_DN_USER" \
                                                -cp "$CLASSPATH" \
                                                $JAVA_HEAP_MAX $HADOOP_OPTS \
  
  echo COMMAND=$COMMAND  
  echo HADOOP_OPTS=$HADOOP_OPTS 
  echo CLASS=$CLASS 
  exec "$JAVA" -Dproc_$COMMAND $JAVA_HEAP_MAX $HADOOP_OPTS -classpath "$CLASSPATH" $CLASS "$@"
fi
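After all the substitutions, the non-secure exec line for, say, the namenode expands to an ordinary java invocation roughly like the one below (paths and the classpath are assumed, abbreviated examples):

# Sketch: what "hadoop namenode" effectively executes
/opt/java/bin/java -Dproc_namenode -Xmx1000m \
    -Dcom.sun.management.jmxremote \
    -classpath "/opt/hadoop/conf:..." \
    org.apache.hadoop.hdfs.server.namenode.NameNode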

 

     
 

 
