Big data has been a hot topic for the past couple of years. Although my day-to-day work has little overlap with it, I have used some company resources to get familiar with the field, including deploying hadoop and spark and running simple examples. The content below was carried out on company servers. Deploying hadoop roughly breaks down into the following steps:
Setup passphraseless ssh
Now check that you can ssh to the localhost without a passphrase:
$ ssh localhost
If you cannot ssh to localhost without a passphrase, execute the following commands:
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
Next, generate a key pair on the master and configure passwordless SSH login.
The steps are as follows (a sketch of distributing the key to the slaves follows the list):
1. Change into the .ssh directory.
2. Run ssh-keygen -t rsa and press Enter through all the prompts (this generates the key pair).
3. Append id_rsa.pub to the authorized keys (cat id_rsa.pub >> authorized_keys).
4. Restart the SSH service so the change takes effect.
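On a multi-node cluster the master's public key also has to reach every slave. A minimal sketch, assuming ssh-copy-id is available and using slave1/slave2 as placeholder hostnames:

# Copy the master's public key to each slave so the master can
# log in to them without a password (hostnames are placeholders).
for host in slave1 slave2; do
    ssh-copy-id -i ~/.ssh/id_rsa.pub "$host"
done
# Verify that passwordless login now works:
ssh slave1 hostname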
Hadoop configuration is driven by two types of important configuration files:
- Read-only default configuration - core-default.xml, hdfs-default.xml, yarn-default.xml and mapred-default.xml.
- Site-specific configuration - conf/core-site.xml, conf/hdfs-site.xml, conf/yarn-site.xml and conf/mapred-site.xml.
Configuration
Use the following:
etc/hadoop/core-site.xml:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
etc/hadoop/hdfs-site.xml:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>
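The yarn-site.xml and mapred-site.xml files mentioned earlier are not shown above. A minimal sketch of creating both from the shell, assuming MapReduce should run on YARN (these are the standard Hadoop 2.x property names; adjust the path to your install). Note also that a dfs.replication of 3 cannot actually be satisfied on a single DataNode; the official pseudo-distributed example uses 1.

cat > etc/hadoop/mapred-site.xml <<'EOF'
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
EOF

cat > etc/hadoop/yarn-site.xml <<'EOF'
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
EOF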
Hadoop Startup
To start a Hadoop cluster you will need to start both the HDFS and YARN cluster.
Format a new distributed filesystem:
$ $HADOOP_PREFIX/bin/hdfs namenode -format <cluster_name>
Start the HDFS with the following command, run on the designated NameNode:
$ $HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script hdfs start namenode
Run a script to start DataNodes on all slaves:
$ $HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script hdfs start datanode
Start the YARN with the following command, run on the designated ResourceManager:
$ $HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR start resourcemanager
Run a script to start NodeManagers on all slaves:
$ $HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR start nodemanager
Start a standalone WebAppProxy server. If multiple servers are used with load balancing it should be run on each of them:
$ $HADOOP_YARN_HOME/sbin/yarn-daemon.sh start proxyserver --config $HADOOP_CONF_DIR
Start the MapReduce JobHistory Server with the following command, run on the designated server:
$ $HADOOP_PREFIX/sbin/mr-jobhistory-daemon.sh start historyserver --config $HADOOP_CONF_DIR
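With everything started, a quick sanity check is jps (shipped with the JDK), which lists the running Java daemons, plus an HDFS report to confirm the DataNodes have registered:

$ jps
$ $HADOOP_PREFIX/bin/hdfs dfsadmin -report

On a single node running everything, jps should show NameNode, DataNode, ResourceManager, NodeManager and JobHistoryServer among its output.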
Hadoop Shutdown
Stop the NameNode with the following command, run on the designated NameNode:
$ $HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script hdfs stop namenode
Run a script to stop DataNodes on all slaves:
$ $HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script hdfs stop datanode
Stop the ResourceManager with the following command, run on the designated ResourceManager:
$ $HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR stop resourcemanager
Run a script to stop NodeManagers on all slaves:
$ $HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR stop nodemanager
Stop the WebAppProxy server. If multiple servers are used with load balancing it should be run on each of them:
$ $HADOOP_YARN_HOME/sbin/yarn-daemon.sh stop proxyserver --config $HADOOP_CONF_DIR
Stop the MapReduce JobHistory Server with the following command, run on the designated server:
$ $HADOOP_PREFIX/sbin/mr-jobhistory-daemon.sh stop historyserver --config $HADOOP_CONF_DIR
Operating the Hadoop Cluster
Once all the necessary configuration is complete, distribute the files to the HADOOP_CONF_DIR directory on all the machines.
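The docs leave the distribution step itself to the operator. A minimal sketch using rsync, assuming the slave hostnames are listed one per line in a hypothetical cluster-hosts file:

# Push the configuration directory to every node listed in
# 'cluster-hosts' (one hostname per line; the file name is a placeholder).
while read -r host; do
    rsync -a "$HADOOP_CONF_DIR/" "$host:$HADOOP_CONF_DIR/"
done < cluster-hosts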
The upstream docs also describe which dedicated Unix account (hdfs, yarn or mapred) should start each component; the startup and shutdown sequences below repeat the commands above, run as those accounts:
To start HDFS across the whole cluster in one step:
hadoop-2.2.0$ sbin/start-dfs.sh
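The matching helper scripts in sbin/ cover YARN and shutdown as well; they rely on the passwordless SSH configured earlier and on the slaves file in the configuration directory. A sketch:

hadoop-2.2.0$ sbin/start-yarn.sh   # starts the ResourceManager and all NodeManagers
hadoop-2.2.0$ sbin/stop-yarn.sh
hadoop-2.2.0$ sbin/stop-dfs.sh     # stops the NameNode and all DataNodes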
Hadoop Startup
To start a Hadoop cluster you will need to start both the HDFS and YARN cluster.
Format a new distributed filesystem as hdfs:
[hdfs]$ $HADOOP_PREFIX/bin/hdfs namenode -format <cluster_name>
Start the HDFS with the following command, run on the designated NameNode as hdfs:
[hdfs]$ $HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script hdfs start namenode
Run a script to start DataNodes on all slaves as root, with the special environment variable HADOOP_SECURE_DN_USER set to hdfs:
[root]$ HADOOP_SECURE_DN_USER=hdfs $HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script hdfs start datanode
Start the YARN with the following command, run on the designated ResourceManager as yarn:
[yarn]$ $HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR start resourcemanager
Run a script to start NodeManagers on all slaves as yarn:
[yarn]$ $HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR start nodemanager
Start a standalone WebAppProxy server. Run on the WebAppProxy server as yarn. If multiple servers are used with load balancing it should be run on each of them:
[yarn]$ $HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR start proxyserver
Start the MapReduce JobHistory Server with the following command, run on the designated server as mapred:
[mapred]$ $HADOOP_PREFIX/sbin/mr-jobhistory-daemon.sh start historyserver --config $HADOOP_CONF_DIR
Hadoop Shutdown
Stop the NameNode with the following command, run on the designated NameNode as hdfs:
[hdfs]$ $HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script hdfs stop namenode
Run a script to stop DataNodes on all slaves as root:
[root]$ $HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script hdfs stop datanode
Stop the ResourceManager with the following command, run on the designated ResourceManager as yarn:
[yarn]$ $HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR stop resourcemanager
Run a script to stop NodeManagers on all slaves as yarn:
[yarn]$ $HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR stop nodemanager
Stop the WebAppProxy server. Run on the WebAppProxy server as yarn. If multiple servers are used with load balancing it should be run on each of them:
[yarn]$ $HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR stop proxyserver
Stop the MapReduce JobHistory Server with the following command, run on the designated server as mapred:
[mapred]$ $HADOOP_PREFIX/sbin/mr-jobhistory-daemon.sh stop historyserver --config $HADOOP_CONF_DIR
Web Interfaces
Once the Hadoop cluster is up and running check the web-ui of the components as described below:
| Daemon | Web Interface | Notes |
| --- | --- | --- |
| NameNode | http://nn_host:port/ | Default HTTP port is 50070. |
| ResourceManager | http://rm_host:port/ | Default HTTP port is 8088. |
| MapReduce JobHistory Server | http://jhs_host:port/ | Default HTTP port is 19888. |
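A quick liveness check from the shell, assuming the daemons run on localhost with the default ports (substitute the real hostnames on a cluster):

$ curl -s -o /dev/null -w '%{http_code}\n' http://localhost:50070/   # NameNode UI
$ curl -s -o /dev/null -w '%{http_code}\n' http://localhost:8088/    # ResourceManager UI
$ curl -s -o /dev/null -w '%{http_code}\n' http://localhost:19888/   # JobHistory UI

Each should print 200 once the corresponding daemon is up.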
If the NameNode fails to start after a re-format (typically because of metadata left over from an earlier format), delete the directory specified by hadoop.tmp.dir, then re-run the command: hadoop namenode -format.
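Put together as a sketch, assuming hadoop.tmp.dir still points at the default /tmp/hadoop-${user.name} and that any data in it is disposable:

# WARNING: this destroys all HDFS data stored on this node.
$ sbin/stop-dfs.sh
$ rm -rf /tmp/hadoop-$(whoami)   # default hadoop.tmp.dir; adjust if overridden
$ bin/hdfs namenode -format <cluster_name>
$ sbin/start-dfs.sh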