
Hadoop source code reading - the shell startup flow

 

Open the bin/hadoop file and you will see that a config file is loaded first:

  either libexec/hadoop-config.sh or bin/hadoop-config.sh

The former is loaded if it exists; otherwise the latter is loaded.
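A minimal sketch of that selection logic, assuming the standard Hadoop 1.x layout (the real script resolves a few more things, but the shape is this):

  # sketch of bin/hadoop's config loading (hadoop 1.x layout assumed)
  bin=`dirname "$0"`
  bin=`cd "$bin"; pwd`

  if [ -e "$bin"/../libexec/hadoop-config.sh ]; then
    . "$bin"/../libexec/hadoop-config.sh    # preferred: the libexec copy
  else
    . "$bin"/hadoop-config.sh               # fallback: the copy next to bin/hadoop
  fi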

 

At the end you will see that HADOOP_HOME is the same as HADOOP_PREFIX:

export HADOOP_HOME=${HADOOP_PREFIX}
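A minimal sketch of how hadoop-config.sh typically derives that prefix from its own location (the exact lines vary between releases, so treat this as an assumption, not a quote):

  # sketch: hadoop-config.sh derives the install root from its own path
  this="${BASH_SOURCE-$0}"
  config_dir=`dirname "$this"`
  export HADOOP_PREFIX=`cd "$config_dir/.."; pwd`   # one level above bin/ or libexec/
  export HADOOP_HOME=${HADOOP_PREFIX}               # kept as an alias for older scripts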

 

OK, now let's take a glance at the shell startup flow in distributed mode:

  namenode format -> start-dfs -> start-mapred
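In terms of concrete commands, run from HADOOP_HOME on Hadoop 1.x:

  $ bin/hadoop namenode -format
  $ bin/start-dfs.sh
  $ bin/start-mapred.sh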

 

step 1 - namenode format

 The corresponding command is "hadoop namenode -format", and the related entry class is:

 org.apache.hadoop.hdfs.server.namenode.NameNode
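That mapping from sub-command to class lives in bin/hadoop itself. Roughly sketched (the real script handles many sub-commands and assembles the classpath first; this is just the shape of the branch):

  # in bin/hadoop: map the sub-command to its Java entry class, then start the JVM
  if [ "$COMMAND" = "namenode" ] ; then
    CLASS='org.apache.hadoop.hdfs.server.namenode.NameNode'
    HADOOP_OPTS="$HADOOP_OPTS $HADOOP_NAMENODE_OPTS"
  fi
  # ... classpath assembly elided ...
  exec "$JAVA" $JAVA_HEAP_MAX $HADOOP_OPTS -classpath "$CLASSPATH" $CLASS "$@"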

 

Well, what is the NameNode (NN) responsible for? Description copied from the code:

 * NameNode serves as both directory namespace manager and
 * "inode table" for the Hadoop DFS.  There is a single NameNode
 * running in any DFS deployment.  (Well, except when there
 * is a second backup/failover NameNode.)
 *
 * The NameNode controls two critical tables:
 *   1)  filename->blocksequence (namespace)
 *   2)  block->machinelist ("inodes")
 *
 * The first table is stored on disk and is very precious.
 * The second table is rebuilt every time the NameNode comes
 * up.
 *
 * 'NameNode' refers to both this class as well as the 'NameNode server'.
 * The 'FSNamesystem' class actually performs most of the filesystem
 * management.  The majority of the 'NameNode' class itself is concerned
 * with exposing the IPC interface and the http server to the outside world,
 * plus some configuration management.
 *
 * NameNode implements the ClientProtocol interface, which allows
 * clients to ask for DFS services.  ClientProtocol is not
 * designed for direct use by authors of DFS client code.  End-users
 * should instead use the org.apache.nutch.hadoop.fs.FileSystem class.
 *
 * NameNode also implements the DatanodeProtocol interface, used by
 * DataNode programs that actually store DFS data blocks.  These
 * methods are invoked repeatedly and automatically by all the
 * DataNodes in a DFS deployment.
 *
 * NameNode also implements the NamenodeProtocol interface, used by
 * secondary namenodes or rebalancing processes to get partial namenode's
 * state, for example partial blocksMap etc.

 

 

The files created by the format are listed here:

hadoop@leibnitz-laptop:/cc$ ll data/hadoop/hadoop-1.0.1/cluster-hadoop/mapred/local/


hadoop@leibnitz-laptop:/cc$ ll data/hadoop/hadoop-1.0.1/cluster-hadoop/dfs/name/current/
-rw-r--r-- 1 hadoop hadoop    4 2012-05-01 15:41 edits
-rw-r--r-- 1 hadoop hadoop 2474 2012-05-01 15:41 fsimage
-rw-r--r-- 1 hadoop hadoop    8 2012-05-01 15:41 fstime
-rw-r--r-- 1 hadoop hadoop  100 2012-05-01 15:41 VERSION

 

 

hadoop@leibnitz-laptop:/cc$ ll data/hadoop/hadoop-1.0.1/cluster-hadoop/dfs/name/image/
-rw-r--r-- 1 hadoop hadoop  157 2012-05-01 15:41 fsimage

 

 

OK, let's see what these files keep.

edits: FSEditLog maintains a log of the namespace modifications (much like transaction logs in a database).

(these files are all managed by the FSImage class described below)

fsimage: FSImage handles checkpointing and logging of the namespace edits.

fstime: keeps the time of the last checkpoint.

VERSION: File VERSION contains the following fields:

  1. node type
  2. layout version
  3. namespaceID
  4. fs state creation time
  5. other fields specific for this node type

   The version file is always written last during storage directory updates. The existence of the version file indicates that all other files have been successfully written in the storage directory, the storage is valid and does not need to be recovered.
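As an example, the VERSION file written by the format above looks roughly like this (the timestamp and namespaceID below are made-up illustrative values; the namespaceID is randomly chosen at format time, and -32 is the layout version I would expect for the 1.0.x line):

  #Tue May 01 15:41:23 CST 2012
  namespaceID=1004527487
  cTime=0
  storageType=NAME_NODE
  layoutVersion=-32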

 

A directory named 'previous.checkpoint' will also show up at some point; the code describes it as:

     * previous.checkpoint is a directory which holds the previous
     * (before the last save) state of the storage directory.
     * The directory is created as a reference only; it does not play a role
     * in state recovery procedures, and is recycled automatically,
     * but it may be useful for manual recovery of a stale state of the system.

Its content looks like this:
hadoop@leibnitz-laptop:/cc$ ll data/hadoop/hadoop-1.0.1/cluster-hadoop/dfs/name/previous.checkpoint/
-rw-r--r-- 1 hadoop hadoop  293 2012-04-25 02:26 edits
-rw-r--r-- 1 hadoop hadoop 2934 2012-04-25 02:26 fsimage
-rw-r--r-- 1 hadoop hadoop    8 2012-04-25 02:26 fstime
-rw-r--r-- 1 hadoop hadoop  100 2012-04-25 02:26 VERSION
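previous.checkpoint is left behind when the namenode saves a fresh image over an existing one. If you want to trigger a save by hand, something like the following should work on 1.x (an assumption on my part: -saveNamespace requires safe mode, and its availability may differ by release, so check hadoop dfsadmin -help on your build):

  $ bin/hadoop dfsadmin -safemode enter
  $ bin/hadoop dfsadmin -saveNamespace     # writes a new fsimage and resets edits
  $ bin/hadoop dfsadmin -safemode leave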

 

 

Incidentally, I found an important class named "Lease", which works as follows:

A Lease governs all the locks held by a single client.
   * For each client there is a corresponding lease, whose
   * timestamp is updated when the client periodically
   * checks in.  If the client dies and allows its lease to
   * expire, all the corresponding locks can be released.

 

 

 
