In distributed mode, Spark uses a master/slave architecture with one central coordinator and many distributed workers. The central coordinator is called the driver. The driver communicates with a potentially large number of distributed workers called executors. The driver runs in its own Java process, and each executor is a separate Java process. A driver and its executors are together termed a Spark application.
The Driver
- Converting a user program into tasks
- Scheduling tasks on executors
Executors
- Running the tasks that make up the application and returning results to the driver
- Providing in-memory storage for RDDs that are cached by user programs
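The division of labor between driver and executors can be mimicked in a few lines of plain Python. This is a toy model, not Spark code: `ToyExecutor`, `run_job`, and the round-robin task placement are invented for illustration, and threads stand in for the separate executor Java processes.

```python
from concurrent.futures import ThreadPoolExecutor


class ToyExecutor:
    """Hypothetical stand-in for a Spark executor (not a Spark class)."""

    def __init__(self, executor_id):
        self.executor_id = executor_id
        # In-memory storage, loosely analogous to cached RDD partitions.
        self.cache = {}

    def run_task(self, partition_id, partition, func, cache_result=False):
        # Run one task: apply func to every record of a single partition.
        result = [func(record) for record in partition]
        if cache_result:
            self.cache[partition_id] = result
        return partition_id, result


def run_job(data, func, executors, num_partitions):
    """Toy driver: split the data into tasks, schedule them, collect results."""
    # Driver duty 1: convert the user program into tasks (one per partition).
    partitions = [data[i::num_partitions] for i in range(num_partitions)]
    # Driver duty 2: schedule tasks on executors (round-robin is an
    # assumption of this sketch, not how Spark places tasks).
    with ThreadPoolExecutor(max_workers=len(executors)) as pool:
        futures = [
            pool.submit(
                executors[pid % len(executors)].run_task, pid, part, func, True
            )
            for pid, part in enumerate(partitions)
        ]
        # Executors return their results to the driver.
        return dict(future.result() for future in futures)


executors = [ToyExecutor(0), ToyExecutor(1)]
results = run_job(list(range(10)), lambda x: x * x, executors, num_partitions=4)
```

After the job, `results` maps each partition id to its computed values, and each `ToyExecutor.cache` holds the partitions that executor processed, which is the toy counterpart of an executor keeping cached RDD data in memory.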
Cluster Manager
Spark depends on a cluster manager to launch executors and, in certain cases, to launch the driver. The cluster manager is a pluggable component in Spark. This allows Spark to run on top of different external managers, such as YARN and Mesos, as well as its built-in Standalone cluster manager.
Spark can run both drivers and executors on the YARN worker nodes.
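In practice, the choice of cluster manager shows up mainly in the `--master` argument passed to `spark-submit`, and `--deploy-mode` decides whether the driver runs on the submitting machine or inside the cluster. The helper below is a hypothetical convenience function (`build_submit_command` is not part of Spark) that only assembles the argument list and runs nothing; the host names, ports, and `my_app.py` are placeholders.

```python
def build_submit_command(app, master, deploy_mode="client", app_args=()):
    """Assemble a spark-submit argument list (hypothetical helper, runs nothing)."""
    if deploy_mode not in ("client", "cluster"):
        raise ValueError("deploy_mode must be 'client' or 'cluster'")
    return [
        "spark-submit",
        "--master", master,
        "--deploy-mode", deploy_mode,
        app,
        *app_args,
    ]


# Master URL forms for the cluster managers named above.
# Hosts and ports are placeholders; "yarn" is the form used by Spark 2.0+.
MASTERS = {
    "standalone": "spark://master-host:7077",
    "mesos": "mesos://mesos-host:5050",
    "yarn": "yarn",
}

# In cluster deploy mode on YARN, the driver is launched on a worker node too.
cmd = build_submit_command("my_app.py", MASTERS["yarn"], deploy_mode="cluster")
print(" ".join(cmd))
```

Returning a list rather than a single string keeps the result usable with `subprocess.run(cmd)` without shell quoting concerns, should one actually want to launch the job.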
The procedure of running a Spark application
- The user submits an application using spark-submit.
- spark-submit launches the driver program and invokes the main() method specified by the user.
- The driver program contacts the cluster manager to ask for resources to launch executors.
- The cluster manager launches executors on behalf of the driver program.
- The driver process runs through the user application. Based on the RDD actions and transformations in the program, the driver sends work to executors in the form of tasks.
- Tasks are run on executor processes to compute and save results.
- If the driver's main() method exits or calls SparkContext.stop(), the driver terminates the executors and releases resources from the cluster manager.
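The lifecycle described in these steps (request executors, run tasks, release resources) can be traced with a small simulation. Everything here is hypothetical: `ToyClusterManager` and `ToyDriver` are not Spark classes, tasks run sequentially in the same process, and `stop()` merely plays the role of `SparkContext.stop()`.

```python
class ToyClusterManager:
    """Hypothetical cluster manager that tracks a fixed pool of executor slots."""

    def __init__(self, total_slots):
        self.free_slots = total_slots

    def launch_executors(self, count):
        # Launch executors on behalf of the driver, if resources allow.
        if count > self.free_slots:
            raise RuntimeError("not enough resources in the cluster")
        self.free_slots -= count
        return [f"executor-{i}" for i in range(count)]

    def release(self, executors):
        # Take back the slots held by terminated executors.
        self.free_slots += len(executors)


class ToyDriver:
    """Hypothetical driver that follows the submission steps listed above."""

    def __init__(self, cluster_manager, num_executors):
        self.cluster_manager = cluster_manager
        # The driver contacts the cluster manager to ask for executors.
        self.executors = cluster_manager.launch_executors(num_executors)

    def run(self, tasks):
        if not self.executors:
            raise RuntimeError("driver has been stopped")
        # The driver sends work to executors in the form of tasks
        # (round-robin here) and gathers (executor, result) pairs.
        return [
            (self.executors[i % len(self.executors)], task())
            for i, task in enumerate(tasks)
        ]

    def stop(self):
        # Terminate the executors and release resources to the cluster manager.
        self.cluster_manager.release(self.executors)
        self.executors = []


manager = ToyClusterManager(total_slots=4)
driver = ToyDriver(manager, num_executors=2)
try:
    results = driver.run([lambda: 1 + 1, lambda: 2 * 3, lambda: 10 - 4])
finally:
    driver.stop()  # plays the role of SparkContext.stop()
```

The `try`/`finally` mirrors a common habit in real driver programs: stopping the context explicitly so that executors are released even when the job fails partway through.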
References
<<Learning Spark>>