==================================================================================
I. Basic Environment
==================================================================================
1、Server layout
192.168.10.84    primary NameNode
192.168.10.84    secondary (backup) NameNode
192.168.10.83    DataNode 1
192.168.10.85    DataNode 2
----------------------------------------------------------------------------------
2、/etc/hosts configuration
Add the following to the /etc/hosts file on every server:
192.168.10.83 192-168-10-83
192.168.10.84 192-168-10-84
192.168.10.85 192-168-10-85
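The hosts entries above can be pushed out with a small idempotent script. This is only a sketch using the IPs and names listed above; to keep it safe to dry-run it writes to a local demo file by default — point HOSTS_FILE at /etc/hosts (as root) on the real servers.

```shell
# Append each cluster entry to the hosts file only if its IP is not present yet.
# HOSTS_FILE defaults to a local demo file; set HOSTS_FILE=/etc/hosts (as root)
# to apply it for real.
HOSTS_FILE="${HOSTS_FILE:-./hosts.demo}"
touch "$HOSTS_FILE"
while read -r ip name; do
  grep -q "^$ip[[:space:]]" "$HOSTS_FILE" || printf '%s %s\n' "$ip" "$name" >> "$HOSTS_FILE"
done <<'EOF'
192.168.10.83 192-168-10-83
192.168.10.84 192-168-10-84
192.168.10.85 192-168-10-85
EOF
```

Because of the grep guard, running the script twice leaves the file unchanged.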
----------------------------------------------------------------------------------
3、Passwordless SSH login
Reference:
http://blog.csdn.net/codepeak/article/details/14447627
==================================================================================
II. Building and Installing Hadoop 2.6.2 [the official binaries ship 32-bit native libraries; rebuild from source on 64-bit systems]
==================================================================================
1、JDK installation
http://download.oracle.com/otn-pub/java/jdk/7u45-b18/jdk-7u45-linux-x64.tar.gz
# tar xvzf jdk-7u45-linux-x64.tar.gz -C /usr/local
# cd /usr/local
# ln -s jdk1.7.0_45 java
# vim /etc/profile
export JAVA_HOME=/usr/local/java
export CLASSPATH=$JAVA_HOME/lib:$JAVA_HOME/jre/lib
export PATH=$PATH:$JAVA_HOME/bin
# source /etc/profile
----------------------------------------------------------------------------------
2、Maven installation
http://mirror.bit.edu.cn/apache/maven/maven-3/3.1.1/binaries/apache-maven-3.1.1-bin.tar.gz
# tar xvzf apache-maven-3.1.1-bin.tar.gz -C /usr/local
# cd /usr/local
# ln -s apache-maven-3.1.1 maven
# vim /etc/profile
export MAVEN_HOME=/usr/local/maven
export PATH=$PATH:$MAVEN_HOME/bin
# source /etc/profile
# mvn -v
3、Protobuf installation
https://protobuf.googlecode.com/files/protobuf-2.5.0.tar.gz
# tar xvzf protobuf-2.5.0.tar.gz
# cd protobuf-2.5.0
# ./configure --prefix=/usr/local/protobuf
# make && make install
# vim /etc/profile
export PROTO_HOME=/usr/local/protobuf
export PATH=$PATH:$PROTO_HOME/bin
# source /etc/profile
# vim /etc/ld.so.conf
/usr/local/protobuf/lib
# /sbin/ldconfig
4、Other dependencies (cmake, ncurses, openssl)
http://www.cmake.org/files/v2.8/cmake-2.8.12.1.tar.gz
http://ftp.gnu.org/pub/gnu/ncurses/ncurses-5.9.tar.gz
http://www.openssl.org/source/openssl-1.0.1e.tar.gz
# tar xvzf cmake-2.8.12.1.tar.gz
# cd cmake-2.8.12.1
# ./bootstrap --prefix=/usr/local
# gmake && gmake install
# tar xvzf ncurses-5.9.tar.gz
# cd ncurses-5.9
# ./configure --prefix=/usr/local
# make && make install
# tar xvzf openssl-1.0.1e.tar.gz
# cd openssl-1.0.1e
# ./config shared --prefix=/usr/local
# make && make install
# /sbin/ldconfig
----------------------------------------------------------------------------------
5、Building Hadoop
http://mirrors.hust.edu.cn/apache/hadoop/common/hadoop-2.6.2/hadoop-2.6.2-src.tar.gz
(1) Maven mirror settings [add inside the <mirrors></mirrors> element]
# vim /usr/local/maven/conf/settings.xml
<mirror>
<id>nexus-osc</id>
<mirrorOf>*</mirrorOf>
<name>Nexusosc</name>
<url>http://maven.oschina.net/content/groups/public/</url>
</mirror>
(2) Build Hadoop
# tar xvzf hadoop-2.6.2-src.tar.gz
# cd hadoop-2.6.2-src
# mvn clean install -DskipTests
# mvn package -Pdist,native -DskipTests -Dtar
## After a successful build, the binaries are generated under:
hadoop-dist/target/hadoop-2.6.2
## Create the hadoop user
# useradd hadoop
# cp -a hadoop-dist/target/hadoop-2.6.2 /home/hadoop/source
# cd /home/hadoop
# ln -s /home/hadoop/source/hadoop-2.6.2 ./hadoop
[Note: the build can fail intermittently; retry a few times if it does.]
========================================================================================
III. Hadoop YARN Distributed Cluster Configuration [note: apply the same configuration on every node]
========================================================================================
1、Environment variables
# vim /etc/profile   (or ~/.bash_profile)
export HADOOP_DEV_HOME=/home/hadoop/hadoop
export PATH=$PATH:$HADOOP_DEV_HOME/bin
export PATH=$PATH:$HADOOP_DEV_HOME/sbin
export HADOOP_MAPRED_HOME=${HADOOP_DEV_HOME}
export HADOOP_COMMON_HOME=${HADOOP_DEV_HOME}
export HADOOP_HDFS_HOME=${HADOOP_DEV_HOME}
export YARN_HOME=${HADOOP_DEV_HOME}
export HADOOP_CONF_DIR=${HADOOP_DEV_HOME}/etc/hadoop
export HDFS_CONF_DIR=${HADOOP_DEV_HOME}/etc/hadoop
export YARN_CONF_DIR=${HADOOP_DEV_HOME}/etc/hadoop
export SPARK_HOME=/home/hadoop/spark
export PATH=$PATH:$SPARK_HOME/bin
export PATH
# source /etc/profile
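A quick sanity check after sourcing the profile: the helper below (a hypothetical `check_env`, not part of Hadoop) verifies that each required variable is non-empty before you move on to the XML configuration.

```shell
# check_env VAR...: report each variable that is unset or empty; exit status 0
# only when all of them are present.
check_env() {
  missing=0
  for v in "$@"; do
    eval "val=\${$v:-}"
    if [ -z "$val" ]; then
      echo "missing: $v" >&2
      missing=1
    fi
  done
  return $missing
}

# Example: check the variables exported above.
# check_env HADOOP_DEV_HOME HADOOP_CONF_DIR YARN_CONF_DIR SPARK_HOME
```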
3、Configuring core-site.xml
# vim /home/hadoop/hadoop/etc/hadoop/core-site.xml
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/hadoop/tmp</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://192.168.10.84:9000</value>
</property>
<property>
<name>hadoop.proxyuser.root.hosts</name>
<value>192.168.10.84</value>
</property>
<property>
<name>hadoop.proxyuser.root.groups</name>
<value>*</value>
</property>
</configuration>
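To double-check a value without opening the file, a tiny extractor helps. `get_prop` below is an illustrative helper (not a Hadoop tool) and assumes the one-tag-per-line layout used in these snippets; on a running installation, `hdfs getconf` is the authoritative way.

```shell
# get_prop FILE NAME: print the <value> of the <property> whose <name> is NAME.
# Assumes one <name>/<value> tag per line, as in the snippets in this guide.
get_prop() {
  awk -v want="$2" '
    /<name>/  { line=$0; gsub(/.*<name>|<\/name>.*/, "", line);  name=line }
    /<value>/ { line=$0; gsub(/.*<value>|<\/value>.*/, "", line); if (name == want) print line }
  ' "$1"
}

# Example:
# get_prop /home/hadoop/hadoop/etc/hadoop/core-site.xml fs.default.name
```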
4、Configuring hdfs-site.xml
# vim /home/hadoop/hadoop/etc/hadoop/hdfs-site.xml
<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/hadoop/hdfs/name</value>
<final>true</final>
</property>
<property>
<name>dfs.federation.nameservice.id</name>
<value>ns1</value>
</property>
<property>
<name>dfs.namenode.backup.address.ns1</name>
<value>192.168.10.84:50100</value>
</property>
<property>
<name>dfs.namenode.backup.http-address.ns1</name>
<value>192.168.10.84:50105</value>
</property>
<property>
<name>dfs.federation.nameservices</name>
<value>ns1</value>
</property>
<property>
<name>dfs.namenode.rpc-address.ns1</name>
<value>192.168.10.84:9000</value>
</property>
<property>
<name>dfs.namenode.http-address.ns1</name>
<value>192.168.10.84:23001</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/hadoop/hdfs/data</value>
<final>true</final>
</property>
<property>
<name>dfs.namenode.secondary.http-address.ns1</name>
<value>192.168.10.84:23002</value>
</property>
</configuration>
------------------------------------------------------------------------------------------------------------------------------------------
6、Configuring yarn-site.xml
# vim /home/hadoop/hadoop/etc/hadoop/yarn-site.xml
<configuration>
<property>
<name>yarn.resourcemanager.address</name>
<value>192.168.10.84:18040</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>192.168.10.84:18030</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>192.168.10.84:18088</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>192.168.10.84:18025</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>192.168.10.84:18141</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
7、Configuring hadoop-env.sh, mapred-env.sh, and yarn-env.sh [add at the top of each file]
File paths:
/home/hadoop/hadoop/etc/hadoop/hadoop-env.sh
/home/hadoop/hadoop/etc/hadoop/mapred-env.sh
/home/hadoop/hadoop/etc/hadoop/yarn-env.sh
Lines to add:
export JAVA_HOME=/usr/local/java
export CLASSPATH=$JAVA_HOME/lib:$JAVA_HOME/jre/lib
export HADOOP_HOME=/home/hadoop/hadoop
export HADOOP_PID_DIR=/home/hadoop/data/hadoop/pids
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="$HADOOP_OPTS -Djava.library.path=$HADOOP_HOME/lib/native"
export HADOOP_PREFIX=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HDFS_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
export JAVA_LIBRARY_PATH=$HADOOP_HOME/lib/native
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
------------------------------------------------------------------------------------------------------------------------------------------
8、Data node (slaves) list
# vim /home/hadoop/hadoop/etc/hadoop/slaves
192.168.10.83
192.168.10.84
192.168.10.85
------------------------------------------------------------------------------------------------------------------------------------------
9、Basic Hadoop smoke test
# cd /home/hadoop/hadoop
## On the first cluster start-up, run the following [on the primary NameNode]
# hdfs namenode -format
# sbin/start-dfs.sh
## Check that the daemons started correctly
# jps
Primary NameNode:
NameNode
Secondary NameNode:
SecondaryNameNode
Data nodes:
DataNode
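The expected-process list above can be turned into a small checker. `check_role` is a hypothetical helper that inspects saved `jps` output (e.g. from `jps > /tmp/jps.out` on each host) for the daemons each role should be running after `start-dfs.sh`.

```shell
# check_role ROLE FILE: succeed only if every daemon expected for ROLE appears
# in FILE, a capture of `jps` output from that host.
check_role() {
  case "$1" in
    namenode)  need="NameNode" ;;
    secondary) need="SecondaryNameNode" ;;
    datanode)  need="DataNode" ;;
    *) echo "unknown role: $1" >&2; return 2 ;;
  esac
  for p in $need; do
    grep -qw "$p" "$2" || return 1
  done
}

# Example: jps > /tmp/jps.out && check_role datanode /tmp/jps.out
```

Note that `grep -w` keeps a NameNode check from matching the SecondaryNameNode process by substring.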
## HDFS and MapReduce test
# hdfs dfs -mkdir -p /user/test
# hdfs dfs -put bin/hdfs.cmd /user/test
# hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.2.jar wordcount /user/test /user/out
# hdfs dfs -ls /user/out
## Inspecting HDFS status
# hdfs dfsadmin -report
# hdfs fsck / -files -blocks
## Routine cluster start/stop
# sbin/start-all.sh
# sbin/stop-all.sh
## Monitoring web UIs
http://192.168.10.84:23001/   (HDFS, per dfs.namenode.http-address above)
http://192.168.10.84:18088/   (YARN, per yarn.resourcemanager.webapp.address above)
========================================================================================
IV. Spark Distributed Cluster Configuration [note: apply the same configuration on every node]
========================================================================================
1、Scala installation [Spark 1.6.x prebuilt packages require Scala 2.10]
http://www.scala-lang.org/files/archive/scala-2.10.6.tgz
# tar xvzf scala-2.10.6.tgz -C /usr/local
# cd /usr/local
# ln -s scala-2.10.6 scala
# vim /etc/profile
export SCALA_HOME=/usr/local/scala
export PATH=$PATH:$SCALA_HOME/bin
export SPARK_CLASSPATH=/home/hadoop/spark/lib
# source /etc/profile
------------------------------------------------------------------------------------------------------------------------------------------
2、Spark installation
http://d3kbcqa49mib13.cloudfront.net/spark-1.6.1-bin-hadoop2.6.tgz
# tar xvzf spark-1.6.1-bin-hadoop2.6.tgz -C /home/hadoop/source
# cd /home/hadoop
# ln -s /home/hadoop/source/spark-1.6.1-bin-hadoop2.6 ./spark
# vim /etc/profile
export SPARK_HOME=/home/hadoop/spark
export PATH=$PATH:$SPARK_HOME/bin
# source /etc/profile
# cd /home/hadoop/spark/conf
# mv spark-env.sh.template spark-env.sh
# vim spark-env.sh
export JAVA_HOME=/usr/local/java
export SCALA_HOME=/usr/local/scala
export HADOOP_HOME=/home/hadoop/hadoop
## Worker node list
# cp slaves.template slaves
# vim slaves
192.168.10.83
192.168.10.84
192.168.10.85
# mv log4j.properties.template log4j.properties
## Run on the master node
# ./sbin/start-all.sh
## Check the processes ["Master" should appear on the master node, "Worker" on each slave]
# jps
Master node:
Master
Slave nodes:
Worker
3、Tests
## Monitoring web UI
http://192.168.10.84:8080/
## First change to the /home/hadoop/spark directory
(1) Local mode
# ./bin/run-example SparkPi 10
(2) Standalone cluster mode
# MASTER=spark://namenode1:7077 ./bin/run-example SparkPi 10
# MASTER=spark://namenode1:7077 ./bin/run-example SparkLR
# MASTER=spark://namenode1:7077 ./bin/run-example SparkKMeans file:/home/hadoop/spark/kmeans_data.txt 2 1
(3) Cluster mode with HDFS
# hadoop fs -put README.md .
# MASTER=spark://namenode1:7077 ./bin/spark-shell
scala> val file = sc.textFile("hdfs://namenode1:9000/user/root/README.md")
scala> val count = file.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_+_)
scala> count.collect()
scala> :quit
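For small inputs, the RDD word count above can be cross-checked locally. The `wordcount` helper below is an illustrative Unix-tools equivalent (not part of Spark) of the flatMap/map/reduceByKey pipeline run in spark-shell.

```shell
# wordcount FILE: print "count word" pairs, most frequent first -- the same
# pairs the flatMap/map/reduceByKey pipeline computes in spark-shell.
wordcount() {
  tr -s ' ' '\n' < "$1" | grep -v '^$' | sort | uniq -c | sort -rn
}

# Example: wordcount README.md | head
```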
(4) YARN mode
# ./bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master yarn \
--deploy-mode cluster \
--num-executors 3 \
--driver-memory 4g \
--executor-memory 2g \
--executor-cores 1 \
lib/spark-examples-1.6.1-hadoop2.6.0.jar
Output location:
/home/hadoop/hadoop/logs/userlogs/application_*/container*_000001/stdout
(5) Other sample programs
examples/src/main/scala/org/apache/spark/examples/
(6) Troubleshooting [logs on the data nodes]
/home/hadoop/hadoop/logs
(7) Tuning
# vim /home/hadoop/spark/conf/spark-env.sh
export SPARK_WORKER_MEMORY=16g   [set according to the memory actually available]
......
(8) Final directory layout