Spark 2017 BigData Update(2)CentOS Cluster
Check ENV as well
>java -version
java version "1.8.0_60"
>mvn --version
Apache Maven 3.3.9 (bb52d8502b132ec0a5a3f4c09453c07478323dc5; 2015-11-10T16:41:47+00:00)
Set up the old version 2.5.0 of protoc
>git clone https://github.com/google/protobuf.git
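Presumably we cd into the cloned repository before checking out the tag:
>cd protobuf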
>git checkout tags/v2.5.0
Follow
http://sillycat.iteye.com/blog/2100276
http://sillycat.iteye.com/blog/2193762
Change the Code in autogen.sh
- curl http://googletest.googlecode.com/files/gtest-1.5.0.tar.bz2 | tar jx
- mv gtest-1.5.0 gtest
+ curl -L https://github.com/google/googletest/archive/release-1.5.0.tar.gz | tar zx
+ mv googletest-release-1.5.0 gtest
>./autogen.sh
>./configure
>make
>sudo make install
>protoc --version
libprotoc 2.5.0
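If protoc later complains about a missing libprotoc shared library, refreshing the linker cache usually helps; this step is an assumption, not part of the original notes:
>sudo ldconfig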
>cmake --version
cmake version 3.10.1
Follow the link here to install it: http://sillycat.iteye.com/blog/2405875
Build Hadoop
>wget http://mirrors.ocf.berkeley.edu/apache/hadoop/common/hadoop-2.7.5/hadoop-2.7.5-src.tar.gz
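Presumably the source tarball is extracted first and the Maven build runs from inside the extracted directory (the directory name is assumed from the tarball name):
>tar zxvf hadoop-2.7.5-src.tar.gz
>cd hadoop-2.7.5-src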
>mvn package -Pdist,native -DskipTests -Dtar
It builds successfully. The final file is hadoop-dist/target/hadoop-2.7.5.tar.gz
Place Hadoop in the working directory
>sudo ln -s /home/ec2-user/tool/hadoop-2.7.5 /opt/hadoop-2.7.5
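The later configuration refers to /opt/hadoop, so presumably a plain /opt/hadoop symlink pointing at the versioned directory is needed as well:
>sudo ln -s /opt/hadoop-2.7.5 /opt/hadoop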
Configure the 3 nodes so they can SSH to each other, with something similar to:
>ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
>cat ~/.ssh/id_dsa.pub
>vi ~/.ssh/authorized_keys
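As a quicker sketch, the public key can also be pushed to the other nodes with ssh-copy-id, assuming password login is still enabled and the placeholder hostname is replaced:
>ssh-copy-id -i ~/.ssh/id_dsa.pub ec2-user@<other-node>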
Add Hadoop to the PATH
>vi ~/.profile
PATH="/opt/hadoop/bin:$PATH"
Execute this on all the machines
>hdfs namenode -format
Follow the settings document here to set up slaves, hdfs-site.xml and the other settings in /opt/hadoop/etc/hadoop
http://sillycat.iteye.com/blog/2288141
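For reference, a minimal sketch of the key files under /opt/hadoop/etc/hadoop, assuming fr-stage-api is the master host; the slave hostnames below are placeholders:
core-site.xml:
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://fr-stage-api:9000</value>
    </property>
</configuration>
slaves:
fr-stage-node1
fr-stage-node2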
Start HDFS
>sbin/start-dfs.sh
Visit Page http://fr-stage-api:50070/dfshealth.html#tab-overview
Start YARN
>sbin/start-yarn.sh
Visit Page http://fr-stage-api:8088/cluster/nodes
Install Spark on the main machine
>wget http://apache.spinellicreations.com/spark/spark-2.2.1/spark-2.2.1-bin-hadoop2.7.tgz
Unpack it and place it in the right directory
>sudo ln -s /home/ec2-user/tool/spark-2.2.1 /opt/spark-2.2.1
>cp conf/spark-env.sh.template conf/spark-env.sh
>cat conf/spark-env.sh
HADOOP_CONF_DIR=/opt/hadoop/etc/hadoop
>echo $SPARK_HOME
/opt/spark
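SPARK_HOME resolving to /opt/spark suggests a symlink and an export similar to the Hadoop setup; a minimal sketch of what that would look like (assumed, the original notes do not show this step):
>sudo ln -s /opt/spark-2.2.1 /opt/spark
>vi ~/.profile
export SPARK_HOME="/opt/spark"
PATH="$SPARK_HOME/bin:$PATH"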
Install Zeppelin on the Remote Center Server
>wget http://apache.mirrors.tds.net/zeppelin/zeppelin-0.7.3/zeppelin-0.7.3-bin-all.tgz
Unpack it and place it in the right directory
>sudo ln -s /home/ec2-user/tool/zeppelin-0.7.3 /opt/zeppelin-0.7.3
>cp conf/zeppelin-env.sh.template conf/zeppelin-env.sh
The content of that file is as follows:
export SPARK_HOME="/opt/spark"
export HADOOP_CONF_DIR="/opt/hadoop/etc/hadoop/"
Start the Zeppelin notebook
>bin/zeppelin-daemon.sh start
Visit Page http://fr-stage-api:8080
Change the master of the Spark interpreter from 'local' to 'yarn'
Choose the first easy tutorial
http://fr-stage-api:8080/#/notebook/2A94M5J1Z
You can see the task here as well
http://fr-stage-api:4040/stages/
But I can see it error out, so I go and check the YARN logs:
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
Diagnostics: Container [pid=9207,containerID=container_1514501181478_0001_01_000001] is running beyond virtual memory limits. Current usage: 309.7 MB of 1 GB physical memory used; 2.4 GB of 2.1 GB virtual memory used. Killing container.
Solution:
This configuration in yarn-site.xml fixed the problem.
<property>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>false</value>
</property>
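Disabling the virtual memory check is the simplest fix. An alternative, not tested here, would be to raise the virtual-to-physical memory ratio instead (the value below is just an illustration):
<property>
    <name>yarn.nodemanager.vmem-pmem-ratio</name>
    <value>4</value>
</property>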
Restart the YARN system. It works great this time.
References:
http://sillycat.iteye.com/blog/2405875