Data Solution 2019(10)Spark Cluster Solution with Zeppelin
Spark Standalone Cluster
https://spark.apache.org/docs/latest/spark-standalone.html
Mesos Cluster
https://spark.apache.org/docs/latest/running-on-mesos.html
Hadoop2 YARN
https://spark.apache.org/docs/latest/running-on-yarn.html
K8S
https://spark.apache.org/docs/latest/running-on-kubernetes.html
Zeppelin with Cluster
https://zeppelin.apache.org/docs/latest/interpreter/spark.html
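Later, pointing Zeppelin at this standalone cluster mainly means giving the Spark interpreter the master URL. A minimal sketch, assuming a local Zeppelin install and the host/port configured below (MASTER and SPARK_HOME are documented Zeppelin settings):
> vi conf/zeppelin-env.sh
export MASTER=spark://rancher-home:7077
export SPARK_HOME=/opt/spark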
Decide to Set Up Spark Standalone Cluster and Zeppelin
Start the Spark Master Machine
Prepare Spark
> wget http://apache.mirrors.ionfish.org/spark/spark-2.4.4/spark-2.4.4-bin-hadoop2.7.tgz
> tar zxvf spark-2.4.4-bin-hadoop2.7.tgz
> mv spark-2.4.4-bin-hadoop2.7 ~/tool/spark-2.4.4
> sudo ln -s /home/carl/tool/spark-2.4.4 /opt/spark-2.4.4
> sudo ln -s /opt/spark-2.4.4 /opt/spark
> cd /opt/spark
> cp conf/spark-env.sh.template conf/spark-env.sh
The template documents a lot of sample configuration:
# Options for the daemons used in the standalone deploy mode
# - SPARK_MASTER_HOST, to bind the master to a different IP address or hostname
# - SPARK_MASTER_PORT / SPARK_MASTER_WEBUI_PORT, to use non-default ports for the master
# - SPARK_MASTER_OPTS, to set config properties only for the master (e.g. "-Dx=y")
# - SPARK_WORKER_CORES, to set the number of cores to use on this machine
# - SPARK_WORKER_MEMORY, to set how much total memory workers have to give executors (e.g. 1000m, 2g)
# - SPARK_WORKER_PORT / SPARK_WORKER_WEBUI_PORT, to use non-default ports for the worker
# - SPARK_WORKER_DIR, to set the working directory of worker processes
# - SPARK_WORKER_OPTS, to set config properties only for the worker (e.g. "-Dx=y")
# - SPARK_DAEMON_MEMORY, to allocate to the master, worker and history server themselves (default: 1g).
# - SPARK_HISTORY_OPTS, to set config properties only for the history server (e.g. "-Dx=y")
# - SPARK_SHUFFLE_OPTS, to set config properties only for the external shuffle service (e.g. "-Dx=y")
# - SPARK_DAEMON_JAVA_OPTS, to set config properties for all daemons (e.g. "-Dx=y")
# - SPARK_DAEMON_CLASSPATH, to set the classpath for all daemons
# - SPARK_PUBLIC_DNS, to set the public dns name of the master or workers
https://spark.apache.org/docs/latest/spark-standalone.html
Make some changes according to my environment:
> vi conf/spark-env.sh
SPARK_MASTER_HOST=rancher-home
SPARK_MASTER_PORT=7077
SPARK_MASTER_WEBUI_PORT=8088
SPARK_WORKER_PORT=7177
SPARK_WORKER_WEBUI_PORT=8188
Start the master service
> sbin/start-master.sh
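To confirm the master came up, check for the Master JVM process and hit the web UI on the port configured above (a quick sanity check, not from the original notes):
> jps
> curl -I http://rancher-home:8088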
Start the Slave on rancher-worker1
> wget http://apache.mirrors.ionfish.org/spark/spark-2.4.4/spark-2.4.4-bin-hadoop2.7.tgz
> tar zxvf spark-2.4.4-bin-hadoop2.7.tgz
> mv spark-2.4.4-bin-hadoop2.7 ~/tool/spark-2.4.4
> sudo ln -s /home/carl/tool/spark-2.4.4 /opt/spark-2.4.4
> sudo ln -s /opt/spark-2.4.4 /opt/spark
Prepare Configuration
> cp conf/spark-env.sh.template conf/spark-env.sh
SPARK_MASTER_HOST=rancher-home
SPARK_MASTER_PORT=7077
SPARK_MASTER_WEBUI_PORT=8088
SPARK_WORKER_PORT=7177
SPARK_WORKER_WEBUI_PORT=8188
Start the slave and connect it to the master
> sbin/start-slave.sh spark://rancher-home:7077
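Once the worker connects, it should be listed under Workers on the master web UI; a quick check (not in the original notes):
> curl -s http://rancher-home:8088 | grep rancher-worker1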
Stop the slave (stop-slave.sh does not need the master URL)
> sbin/stop-slave.sh
Make a Spark Cluster in Docker
To run inside a container, the Spark process has to stay in the foreground, and the template documents exactly that option:
# - SPARK_NO_DAEMONIZE Run the proposed command in the foreground. It will not output a PID file.
SPARK_NO_DAEMONIZE=true
It fails when I start the service:
2019-10-28T00:41:42.502359700Z 19/10/28 00:41:42 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
2019-10-28T00:41:43.110823900Z 19/10/28 00:41:43 WARN Utils: Service 'sparkMaster' could not bind on port 7077. Attempting port 7078.
The cause is hostname resolution inside the container: the master tries to bind to the address its hostname resolves to, so the container hostname and HOSTS file have to line up with the Spark settings.
https://cloud.tencent.com/developer/article/1175087
Finally, the master configuration ends up close to this:
SPARK_MASTER_PORT=7077
SPARK_MASTER_WEBUI_PORT=8088
SPARK_LOCAL_HOSTNAME=rancher-home
SPARK_IDENT_STRING=rancher-home
SPARK_PUBLIC_DNS=rancher-home
SPARK_NO_DAEMONIZE=true
SPARK_DAEMON_MEMORY=1g
The Dockerfile is as follows:
#Set up spark master in Docker
#Prepare the OS
FROM centos:7
MAINTAINER Yiyi Kang <yiyikangrachel@gmail.com>
RUN yum -y update
RUN yum install -y wget
#install jdk
RUN yum -y install java-1.8.0-openjdk.x86_64
RUN echo 'export JAVA_HOME=/usr/lib/jvm/jre-1.8.0-openjdk' | tee -a /etc/profile
RUN mkdir /tool/
WORKDIR /tool/
#add the software spark
RUN wget --no-verbose http://apache.mirrors.ionfish.org/spark/spark-2.4.4/spark-2.4.4-bin-hadoop2.7.tgz
RUN tar -xvzf spark-2.4.4-bin-hadoop2.7.tgz
RUN ln -s /tool/spark-2.4.4-bin-hadoop2.7 /tool/spark
ADD conf/spark-env.sh /tool/spark/conf/
#set up the app
EXPOSE 8088 7077
RUN mkdir -p /app/
ADD start.sh /app/
WORKDIR /app/
CMD [ "./start.sh" ]
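The start.sh script itself is not shown in the post. A minimal sketch of what it could contain for the master, assuming SPARK_NO_DAEMONIZE=true in spark-env.sh keeps the process in the foreground so the container stays alive:
#!/bin/bash
#Run the master in the foreground; with SPARK_NO_DAEMONIZE=true this call blocks
exec /tool/spark/sbin/start-master.sh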
The important parts of the Makefile are as follows:
run:
	docker run -d -p 7077:7077 -p 8088:8088 \
	--hostname rancher-home \
	--name $(NAME) $(IMAGE):$(TAG)
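The NAME, IMAGE and TAG variables are not shown in the post; a plausible top of the Makefile would look something like this (the values are assumptions):
IMAGE=sillycat/public
TAG=sillycat-spark-master
NAME=sillycat-spark-master

build:
	docker build -t $(IMAGE):$(TAG) .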
The slave machine configuration is as follows:
SPARK_WORKER_PORT=7177
SPARK_WORKER_WEBUI_PORT=8188
SPARK_PUBLIC_DNS=rancher-worker1
SPARK_LOCAL_HOSTNAME=rancher-worker1
SPARK_IDENT_STRING=rancher-worker1
SPARK_NO_DAEMONIZE=true
The Dockerfile is as follows:
#Set up spark slave in Docker
#Prepare the OS
FROM centos:7
MAINTAINER Yiyi Kang <yiyikangrachel@gmail.com>
RUN yum -y update
RUN yum install -y wget
#install jdk
RUN yum -y install java-1.8.0-openjdk.x86_64
RUN echo 'export JAVA_HOME=/usr/lib/jvm/jre-1.8.0-openjdk' | tee -a /etc/profile
RUN mkdir /tool/
WORKDIR /tool/
#add the software spark
RUN wget --no-verbose http://apache.mirrors.ionfish.org/spark/spark-2.4.4/spark-2.4.4-bin-hadoop2.7.tgz
RUN tar -xvzf spark-2.4.4-bin-hadoop2.7.tgz
RUN ln -s /tool/spark-2.4.4-bin-hadoop2.7 /tool/spark
ADD conf/spark-env.sh /tool/spark/conf/
#set up the app
EXPOSE 8188 7177
RUN mkdir -p /app/
ADD start.sh /app/
WORKDIR /app/
CMD [ "./start.sh" ]
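Again, start.sh is not shown; a minimal sketch for the worker, with the master URL hard-coded for now (SPARK_NO_DAEMONIZE=true keeps it in the foreground):
#!/bin/bash
#Run the worker in the foreground and register with the master
exec /tool/spark/sbin/start-slave.sh spark://rancher-home:7077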
Add a host entry pointing to our master machine:
run:
	docker run -d -p 7177:7177 -p 8188:8188 \
	--name $(NAME) \
	--hostname rancher-worker1 \
	--add-host=rancher-home:192.168.56.110 $(IMAGE):$(TAG)
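To verify the worker registered, the container log should show the registration message (the container name depends on what $(NAME) expands to):
> docker logs sillycat-spark-slave
Look for a line like: INFO Worker: Successfully registered with master spark://rancher-home:7077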
The next step is to move much of this configuration into parameters.
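For example, the master URL could be passed in at run time instead of being baked into the image; a sketch, where SPARK_MASTER_URL is a variable name introduced here for illustration (start.sh would then call sbin/start-slave.sh ${SPARK_MASTER_URL}):
run:
	docker run -d -p 7177:7177 -p 8188:8188 \
	--name $(NAME) \
	--hostname rancher-worker1 \
	-e SPARK_MASTER_URL=spark://rancher-home:7077 \
	--add-host=rancher-home:192.168.56.110 $(IMAGE):$(TAG)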
References:
https://spark.apache.org/docs/latest/cluster-overview.html
https://stackoverflow.com/questions/28664834/which-cluster-type-should-i-choose-for-spark
https://stackoverflow.com/questions/39671117/docker-container-with-apache-spark-in-standalone-cluster-mode
https://github.com/shuaicj/docker-spark-master
https://stackoverflow.com/questions/32719007/spark-spark-public-dns-and-spark-local-ip-on-stand-alone-cluster-with-docker-con