Spark(1)Introduction and Installation
1. Introduction
1.1 MapReduce Model
Map -- read, convert
Reduce -- calculate
4 classes:
Read and convert the input data to key-value pairs, Map, Reduce, then convert and write the key-value results to the output data.
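As a concrete illustration (my own minimal sketch, not from any framework), here is word count in plain Scala: the Map step converts each line into (word, 1) key-value pairs, and the Reduce step sums the values per key.

object WordCountSketch {
  def main(args: Array[String]): Unit = {
    val lines = Seq("spark on mesos", "spark reads hdfs")      // input data (made up)
    val pairs = lines.flatMap(_.split(" ")).map(w => (w, 1))   // Map: convert to key-value pairs
    val counts = pairs.groupBy(_._1)                           // group by key
      .map { case (word, ps) => (word, ps.map(_._2).sum) }     // Reduce: sum the counts per key
    counts.foreach(println)                                    // output, e.g. (spark,2) (on,1) ...
  }
}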
1.2 Apache Mesos
Mesos and YARN are both resource-sharing systems; they manage and schedule the cluster resources.
Frameworks on top: Hadoop scheduler, MPI scheduler, Spark
Mesos master, standby masters, … (coordinated by ZooKeeper)
Mesos slave, Mesos slave, Mesos slave … (execute Hadoop executor tasks, MPI executor tasks, …)
Mesos master: manages the frameworks and slaves, and offers resources from the slaves to the frameworks
Mesos slave: runs the Mesos tasks
Framework: Hadoop, Spark, …
Executor: the framework-specific process launched on a slave to run its tasks
1.3 Spark Introduction
Spark is implemented in Scala and can run on Mesos.
It works with Hadoop and EC2, and can read data directly from HDFS or S3.
The stack, from top to bottom:
Bagel | Shark
Spark (RDD, MapReduce, FP)
Mesos
HDFS | AWS s3n
So Spark combines the MapReduce model and functional programming, running on Mesos over HDFS and S3.
Spark Terms
RDD - Resilient Distributed Datasets
Local mode and Mesos mode
Transformations and Actions -
A transformation returns a new RDD;
an action returns a Scala collection, a single value, or nothing (see the sketch below).
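A minimal sketch of the difference, assuming the Spark 0.7-era API (package spark, with the master string passed to the SparkContext constructor); the job name and numbers here are illustrative:

import spark.SparkContext   // Spark 0.7-era package name (later versions use org.apache.spark)

object RddSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext("local[2]", "RddSketch")  // local mode with 2 threads
    val nums = sc.parallelize(1 to 100)                 // build an RDD from a local collection
    val squares = nums.map(n => n * n)                  // transformation: lazy, returns a new RDD
    val total = squares.reduce(_ + _)                   // action: runs the job, returns an Int
    println("sum of squares = " + total)
    sc.stop()
  }
}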
Spark on Mesos
RDD + Job(tasks) ----> Spark Scheduler ----> Mesos master ----> Mesos slave, Mesos slave … (Spark executors run the tasks)
1.4 HDFS Introduction
Hadoop Distributed File System ---- NameNode (only one) ------> DataNodes
Block: 64M is the default block size of a file
NameNode: holds the file names, the directory tree, the namespace image and the edit log; it knows how many blocks each file has and where they are on the DataNodes
DataNode: stores the actual blocks; clients and the NameNode read and write data on the DataNodes
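Since Spark can read directly from HDFS, here is a minimal sketch of doing that; the NameNode host, port and file path are made up, and the Spark 0.7-era API is assumed again:

import spark.SparkContext

object HdfsReadSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext("local", "HdfsReadSketch")
    // hdfs://<namenode-host>:<port>/<path> - the NameNode tells Spark which DataNodes hold the blocks
    val lines = sc.textFile("hdfs://namenode:9000/user/carl/input.txt")
    println("line count = " + lines.count())  // count() is an action and returns a value
    sc.stop()
  }
}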
1.5 ZooKeeper
Configuration Management
Cluster Management
1.6 NFS Introduction
NFS - Network File System
2. Installation of Spark
Since version 0.6, Spark can run without Mesos, so we can skip Mesos at first.
Get the source codes
>git clone https://github.com/mesos/spark.git
My Scala version is 2.10.0; just try the command
>sudo sbt/sbt package
It works.
I also tried to build with Maven, but that did not seem to work. Since I already have SCALA_HOME set, I will run the examples directly.
Syntax: ./run <class> <params>
>./run spark.examples.SparkLR local[2]
Or
>./run spark.examples.SparkPi local
I tried to run Spark to verify my environment, but it failed, apparently because of SCALA_HOME.
Error Message:
Exception in thread "main" java.lang.NoClassDefFoundError: scala/reflect/ClassManifest
at spark.examples.SparkPi.main(SparkPi.scala)
Caused by: java.lang.ClassNotFoundException: scala.reflect.ClassManifest
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
Solution:
>cd examples
>sudo mvn eclipse:eclipse
>cd ..
>sudo mvn eclipse:eclipse
Then I imported the examples and spark projects into Eclipse to read the source code.
Read the code in spark-examples/src/main/scala/spark/examples/SparkPi.scala
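From what I read there, SparkPi estimates Pi with a Monte Carlo simulation. A sketch of roughly what the example does (my own reconstruction, not the verbatim source):

import scala.math.random
import spark.SparkContext

object SparkPiSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(args(0), "SparkPiSketch")  // args(0) is the master, e.g. "local"
    val slices = if (args.length > 1) args(1).toInt else 2
    val n = 100000 * slices
    val count = sc.parallelize(1 to n, slices).map { _ =>
      val x = random * 2 - 1                    // random point in the square [-1, 1] x [-1, 1]
      val y = random * 2 - 1
      if (x * x + y * y < 1) 1 else 0           // does it fall inside the unit circle?
    }.reduce(_ + _)                             // action: sum the hits across partitions
    println("Pi is roughly " + 4.0 * count / n)
    sc.stop()
  }
}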
Run this again
>sudo ./run spark.examples.SparkPi local
Still not working; it told me SCALA_HOME is not set, but I am sure it is.
>wget http://www.spark-project.org/files/spark-0.7.0-sources.tgz
Unzip it and put it in the working directory
>sudo ln -s /Users/carl/tool/spark-0.7.0 /opt/spark-0.7.0
>sudo ln -s /opt/spark-0.7.0 /opt/spark
Compile the source codes
>sudo sbt/sbt compile
>sudo sbt/sbt package
>sudo sbt/sbt assembly
>sudo ./run spark.examples.SparkPi local
The error is still there: SCALA_HOME is not set.
Finally, I found the reason: I need to change conf/spark-env.sh
>cd conf
>cp spark-env.sh.template spark-env.sh
Be careful: do not use Scala version 2.10.0 there; use 2.9.2 instead.
export SCALA_HOME=/opt/scala2.9.2
This time, everything goes well.
>sudo ./run spark.examples.SparkPi local
>sudo ./run spark.examples.SparkLR local[2]
local[2] uses 2 local CPU cores.
References:
Spark
http://www.ibm.com/developerworks/cn/opensource/os-spark/
http://spark-project.org/documentation/
http://rdc.taobao.com/team/jm/archives/tag/spark
http://rdc.taobao.com/team/jm/archives/2043
http://spark-project.org/examples/
http://rdc.taobao.com/team/jm/archives/1871
http://ampcamp.berkeley.edu/amp-camp-one-berkeley-2012/
http://run-xiao.iteye.com/blog/1835707
http://www.yiihsia.com/2011/12/%E5%88%9D%E5%A7%8Bspark-%E5%9F%BA%E6%9C%AC%E6%A6%82%E5%BF%B5%E5%92%8C%E4%BE%8B%E5%AD%90/
http://www.cnblogs.com/jerrylead/archive/2012/08/13/2636115.html
http://blog.csdn.net/macyang/article/details/7100523
Git resource
https://github.com/mesos/spark
HDFS
http://www.cnblogs.com/forfuture1978/archive/2010/03/14/1685351.html
Hadoop
http://blog.csdn.net/robertleepeak/article/details/6001369
Mesos
http://dongxicheng.org/mapreduce-nextgen/mesos_vs_yarn/
ZooKeeper
http://rdc.taobao.com/team/jm/archives/665