Spark 2017 BigData Update(2)CentOS Cluster
Check ENV as well
>java -version
java version "1.8.0_60"
>mvn --version
Apache Maven 3.3.9 (bb52d8502b132ec0a5a3f4c09453c07478323dc5; 2015-11-10T16:41:47+00:00)
Set up the old version 2.5.0 of protoc
>git clone https://github.com/google/protobuf.git
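Presumably we cd into the cloned repository before checking out the tag:
>cd protobuf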
>git checkout tags/v2.5.0
Follow
http://sillycat.iteye.com/blog/2100276
http://sillycat.iteye.com/blog/2193762
Change the Code in autogen.sh
- curl http://googletest.googlecode.com/files/gtest-1.5.0.tar.bz2 | tar jx
- mv gtest-1.5.0 gtest
+ curl -L https://github.com/google/googletest/archive/release-1.5.0.tar.gz | tar zx
+ mv googletest-release-1.5.0 gtest
>./autogen.sh
>./configure
>make
>sudo make install
>protoc --version
libprotoc 2.5.0
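If protoc later complains about a missing libprotoc shared library, refreshing the linker cache usually helps; this step is an assumption, not part of the original notes:
>sudo ldconfig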
>cmake --version
cmake version 3.10.1
Follow the link here to install it: http://sillycat.iteye.com/blog/2405875
Build Hadoop
>wget http://mirrors.ocf.berkeley.edu/apache/hadoop/common/hadoop-2.7.5/hadoop-2.7.5-src.tar.gz
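Presumably the source tarball is extracted first and the Maven build runs from inside the extracted directory (the directory name is assumed from the tarball name):
>tar zxvf hadoop-2.7.5-src.tar.gz
>cd hadoop-2.7.5-src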
>mvn package -Pdist,native -DskipTests -Dtar
It builds successfully. The final file is hadoop-dist/target/hadoop-2.7.5.tar.gz
Place Hadoop in the working directory
>sudo ln -s /home/ec2-user/tool/hadoop-2.7.5 /opt/hadoop-2.7.5
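The later configuration refers to /opt/hadoop, so presumably a plain /opt/hadoop symlink pointing at the versioned directory is needed as well:
>sudo ln -s /opt/hadoop-2.7.5 /opt/hadoop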
Configure the 3 nodes so they can SSH to each other, with something similar to:
>ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
>cat ~/.ssh/id_dsa.pub
>vi ~/.ssh/authorized_keys
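As a quicker sketch, the public key can also be pushed to the other nodes with ssh-copy-id, assuming password login is still enabled and the placeholder hostname is replaced:
>ssh-copy-id -i ~/.ssh/id_dsa.pub ec2-user@<other-node>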
Add Hadoop to the PATH
>vi ~/.profile
PATH="/opt/hadoop/bin:$PATH"
Execute this on all the machines
>hdfs namenode -format
Follow the settings document here to set up slaves, hdfs-site.xml and the other settings in /opt/hadoop/etc/hadoop
http://sillycat.iteye.com/blog/2288141
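For reference, a minimal sketch of the key files under /opt/hadoop/etc/hadoop, assuming fr-stage-api is the master host; the slave hostnames below are placeholders:
core-site.xml:
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://fr-stage-api:9000</value>
    </property>
</configuration>
slaves:
fr-stage-node1
fr-stage-node2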
Start HDFS
>sbin/start-dfs.sh
Visit Page http://fr-stage-api:50070/dfshealth.html#tab-overview
Start YARN
>sbin/start-yarn.sh
Visit Page http://fr-stage-api:8088/cluster/nodes
Install Spark on the main machine
>wget http://apache.spinellicreations.com/spark/spark-2.2.1/spark-2.2.1-bin-hadoop2.7.tgz
Unpack it and place it in the right directory
>sudo ln -s /home/ec2-user/tool/spark-2.2.1 /opt/spark-2.2.1
>cp conf/spark-env.sh.template conf/spark-env.sh
>cat conf/spark-env.sh
HADOOP_CONF_DIR=/opt/hadoop/etc/hadoop
>echo $SPARK_HOME
/opt/spark
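SPARK_HOME resolving to /opt/spark suggests a symlink and an export similar to the Hadoop setup; a minimal sketch of what that would look like (assumed, the original notes do not show this step):
>sudo ln -s /opt/spark-2.2.1 /opt/spark
>vi ~/.profile
export SPARK_HOME="/opt/spark"
PATH="$SPARK_HOME/bin:$PATH"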
Install Zeppelin on the Remote Center Server
>wget http://apache.mirrors.tds.net/zeppelin/zeppelin-0.7.3/zeppelin-0.7.3-bin-all.tgz
Unpack it and place it in the right directory
>sudo ln -s /home/ec2-user/tool/zeppelin-0.7.3 /opt/zeppelin-0.7.3
>cp conf/zeppelin-env.sh.template conf/zeppelin-env.sh
The content of that file is as follows:
export SPARK_HOME="/opt/spark"
export HADOOP_CONF_DIR="/opt/hadoop/etc/hadoop/"
Start the Zeppelin notebook
>bin/zeppelin-daemon.sh start
Visit Page http://fr-stage-api:8080
Change the master of the Spark interpreter from 'local' to 'yarn'
Choose the first easy tutorial
http://fr-stage-api:8080/#/notebook/2A94M5J1Z
You can see the task here as well
http://fr-stage-api:4040/stages/
But I can see it error out, so I go and check the YARN logs:
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
Diagnostics: Container [pid=9207,containerID=container_1514501181478_0001_01_000001] is running beyond virtual memory limits. Current usage: 309.7 MB of 1 GB physical memory used; 2.4 GB of 2.1 GB virtual memory used. Killing container.
Solution:
This configuration in yarn-site.xml fixed the problem.
<property>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>false</value>
</property>
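Disabling the virtual memory check is the simplest fix. An alternative, not tested here, would be to raise the virtual-to-physical memory ratio instead (the value below is just an illustration):
<property>
    <name>yarn.nodemanager.vmem-pmem-ratio</name>
    <value>4</value>
</property>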
Restart the YARN system. It works great this time.
References:
http://sillycat.iteye.com/blog/2405875