[Spark 22] Debugging and Running Spark Applications in IntelliJ IDEA

 

Scala version

scala-2.10.4

 

Note:

Earlier attempts at setting up this environment kept failing, most likely because Scala 2.11.4 was being used. The Spark website states explicitly that the pre-built Spark 1.2.0 packages do not support Scala 2.11.4:

 

Note: Scala 2.11 users should download the Spark source package and build with Scala 2.11 support.

 

Spark version:

spark-1.2.0-bin-hadoop2.4.tgz

 

Configure environment variables

 

export SCALA_HOME=/home/hadoop/spark1.2.0/scala-2.10.4
export PATH=$SCALA_HOME/bin:$PATH
export SPARK_HOME=/home/hadoop/spark1.2.0/spark-1.2.0-bin-hadoop2.4
export PATH=$SPARK_HOME/bin:$PATH

 

Setting up IntelliJ IDEA for Spark development

1. Download and install the Scala plugin

2. Create a non-SBT Scala project

3. Import the Spark jars from

spark-1.2.0-bin-hadoop2.4

4. Write the WordCount example code

 

package spark.examples

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext

import org.apache.spark.SparkContext._

object SparkWordCount {
  def main(args: Array[String]) {
    // Note setMaster("local"): it tells Spark to run in local mode
    // (as opposed to connecting to a standalone cluster; see the sketch after this listing).
    val conf = new SparkConf().setAppName("SparkWordCount").setMaster("local")
    val sc = new SparkContext(conf)
    val rdd = sc.textFile("file:///home/hadoop/spark1.2.0/word.txt")
    rdd.flatMap(_.split(" "))      // split each line into words
      .map((_, 1))                 // pair every word with a count of 1
      .reduceByKey(_ + _)          // sum the counts per word
      .map(x => (x._2, x._1))      // swap to (count, word) so sortByKey sorts by count
      .sortByKey(false)            // descending
      .map(x => (x._2, x._1))      // swap back to (word, count)
      .saveAsTextFile("file:///home/hadoop/spark1.2.0/WordCountResult")
    sc.stop()
  }
}
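
For comparison with the local-mode comment above: running the same program against a standalone cluster only changes how the SparkConf is built. A minimal sketch, assuming a hypothetical master URL and that the application has been packaged as a jar (neither value comes from this post):

val conf = new SparkConf()
  .setAppName("SparkWordCount")
  .setMaster("spark://master-host:7077")          // hypothetical standalone master URL instead of "local"
  .setJars(Seq("/path/to/spark-wordcount.jar"))   // hypothetical jar path; ships the application jar to the executors
val sc = new SparkContext(conf)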

 

Console log:

 

15/01/14 22:06:34 WARN Utils: Your hostname, hadoop-Inspiron-3521 resolves to a loopback address: 127.0.1.1; using 192.168.0.111 instead (on interface eth1)
15/01/14 22:06:34 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
15/01/14 22:06:35 INFO SecurityManager: Changing view acls to: hadoop
15/01/14 22:06:35 INFO SecurityManager: Changing modify acls to: hadoop
15/01/14 22:06:35 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop); users with modify permissions: Set(hadoop)
15/01/14 22:06:36 INFO Slf4jLogger: Slf4jLogger started
15/01/14 22:06:36 INFO Remoting: Starting remoting
15/01/14 22:06:36 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@hadoop-Inspiron-3521.local:53624]
15/01/14 22:06:36 INFO Utils: Successfully started service 'sparkDriver' on port 53624.
15/01/14 22:06:36 INFO SparkEnv: Registering MapOutputTracker
15/01/14 22:06:36 INFO SparkEnv: Registering BlockManagerMaster
15/01/14 22:06:36 INFO DiskBlockManager: Created local directory at /tmp/spark-local-20150114220636-4826
15/01/14 22:06:36 INFO MemoryStore: MemoryStore started with capacity 461.7 MB
15/01/14 22:06:37 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/01/14 22:06:37 INFO HttpFileServer: HTTP File server directory is /tmp/spark-19683393-0315-498c-9b72-9c6a13684f44
15/01/14 22:06:37 INFO HttpServer: Starting HTTP Server
15/01/14 22:06:38 INFO Utils: Successfully started service 'HTTP file server' on port 53231.
15/01/14 22:06:43 INFO Utils: Successfully started service 'SparkUI' on port 4040.
15/01/14 22:06:43 INFO SparkUI: Started SparkUI at http://hadoop-Inspiron-3521.local:4040
15/01/14 22:06:43 INFO AkkaUtils: Connecting to HeartbeatReceiver: akka.tcp://sparkDriver@hadoop-Inspiron-3521.local:53624/user/HeartbeatReceiver
15/01/14 22:06:44 INFO NettyBlockTransferService: Server created on 46971
15/01/14 22:06:44 INFO BlockManagerMaster: Trying to register BlockManager
15/01/14 22:06:44 INFO BlockManagerMasterActor: Registering block manager localhost:46971 with 461.7 MB RAM, BlockManagerId(<driver>, localhost, 46971)
15/01/14 22:06:44 INFO BlockManagerMaster: Registered BlockManager
15/01/14 22:06:44 INFO MemoryStore: ensureFreeSpace(163705) called with curMem=0, maxMem=484127539
15/01/14 22:06:44 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 159.9 KB, free 461.5 MB)
15/01/14 22:06:45 INFO MemoryStore: ensureFreeSpace(22692) called with curMem=163705, maxMem=484127539
15/01/14 22:06:45 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 22.2 KB, free 461.5 MB)
15/01/14 22:06:45 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:46971 (size: 22.2 KB, free: 461.7 MB)
15/01/14 22:06:45 INFO BlockManagerMaster: Updated info of block broadcast_0_piece0
15/01/14 22:06:45 INFO SparkContext: Created broadcast 0 from textFile at SparkWordCount.scala:40
15/01/14 22:06:45 INFO FileInputFormat: Total input paths to process : 1
15/01/14 22:06:45 INFO deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id
15/01/14 22:06:45 INFO deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
15/01/14 22:06:45 INFO deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
15/01/14 22:06:45 INFO deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
15/01/14 22:06:45 INFO deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
15/01/14 22:06:46 INFO SparkContext: Starting job: saveAsTextFile at SparkWordCount.scala:43
15/01/14 22:06:46 INFO DAGScheduler: Registering RDD 3 (map at SparkWordCount.scala:43)
15/01/14 22:06:46 INFO DAGScheduler: Registering RDD 5 (map at SparkWordCount.scala:43)
15/01/14 22:06:46 INFO DAGScheduler: Got job 0 (saveAsTextFile at SparkWordCount.scala:43) with 1 output partitions (allowLocal=false)
15/01/14 22:06:46 INFO DAGScheduler: Final stage: Stage 2(saveAsTextFile at SparkWordCount.scala:43)
15/01/14 22:06:46 INFO DAGScheduler: Parents of final stage: List(Stage 1)
15/01/14 22:06:46 INFO DAGScheduler: Missing parents: List(Stage 1)
15/01/14 22:06:46 INFO DAGScheduler: Submitting Stage 0 (MappedRDD[3] at map at SparkWordCount.scala:43), which has no missing parents
15/01/14 22:06:46 INFO MemoryStore: ensureFreeSpace(3560) called with curMem=186397, maxMem=484127539
15/01/14 22:06:46 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 3.5 KB, free 461.5 MB)
15/01/14 22:06:46 INFO MemoryStore: ensureFreeSpace(2528) called with curMem=189957, maxMem=484127539
15/01/14 22:06:46 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 2.5 KB, free 461.5 MB)
15/01/14 22:06:46 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on localhost:46971 (size: 2.5 KB, free: 461.7 MB)
15/01/14 22:06:46 INFO BlockManagerMaster: Updated info of block broadcast_1_piece0
15/01/14 22:06:46 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:838
15/01/14 22:06:46 INFO DAGScheduler: Submitting 1 missing tasks from Stage 0 (MappedRDD[3] at map at SparkWordCount.scala:43)
15/01/14 22:06:46 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
15/01/14 22:06:46 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, PROCESS_LOCAL, 1292 bytes)
15/01/14 22:06:46 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
15/01/14 22:06:46 INFO HadoopRDD: Input split: file:/home/hadoop/spark1.2.0/word.txt:0+29
15/01/14 22:06:46 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 1895 bytes result sent to driver
15/01/14 22:06:46 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 323 ms on localhost (1/1)
15/01/14 22:06:46 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 
15/01/14 22:06:46 INFO DAGScheduler: Stage 0 (map at SparkWordCount.scala:43) finished in 0.350 s
15/01/14 22:06:46 INFO DAGScheduler: looking for newly runnable stages
15/01/14 22:06:46 INFO DAGScheduler: running: Set()
15/01/14 22:06:46 INFO DAGScheduler: waiting: Set(Stage 1, Stage 2)
15/01/14 22:06:46 INFO DAGScheduler: failed: Set()
15/01/14 22:06:46 INFO DAGScheduler: Missing parents for Stage 1: List()
15/01/14 22:06:46 INFO DAGScheduler: Missing parents for Stage 2: List(Stage 1)
15/01/14 22:06:46 INFO DAGScheduler: Submitting Stage 1 (MappedRDD[5] at map at SparkWordCount.scala:43), which is now runnable
15/01/14 22:06:46 INFO MemoryStore: ensureFreeSpace(2992) called with curMem=192485, maxMem=484127539
15/01/14 22:06:46 INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 2.9 KB, free 461.5 MB)
15/01/14 22:06:46 INFO MemoryStore: ensureFreeSpace(2158) called with curMem=195477, maxMem=484127539
15/01/14 22:06:46 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 2.1 KB, free 461.5 MB)
15/01/14 22:06:46 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on localhost:46971 (size: 2.1 KB, free: 461.7 MB)
15/01/14 22:06:46 INFO BlockManagerMaster: Updated info of block broadcast_2_piece0
15/01/14 22:06:46 INFO SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:838
15/01/14 22:06:46 INFO DAGScheduler: Submitting 1 missing tasks from Stage 1 (MappedRDD[5] at map at SparkWordCount.scala:43)
15/01/14 22:06:46 INFO TaskSchedulerImpl: Adding task set 1.0 with 1 tasks
15/01/14 22:06:46 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 1, localhost, PROCESS_LOCAL, 1045 bytes)
15/01/14 22:06:46 INFO Executor: Running task 0.0 in stage 1.0 (TID 1)
15/01/14 22:06:46 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks
15/01/14 22:06:46 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 12 ms
15/01/14 22:06:46 INFO Executor: Finished task 0.0 in stage 1.0 (TID 1). 1000 bytes result sent to driver
15/01/14 22:06:46 INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID 1) in 110 ms on localhost (1/1)
15/01/14 22:06:46 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool 
15/01/14 22:06:46 INFO DAGScheduler: Stage 1 (map at SparkWordCount.scala:43) finished in 0.106 s
15/01/14 22:06:46 INFO DAGScheduler: looking for newly runnable stages
15/01/14 22:06:46 INFO DAGScheduler: running: Set()
15/01/14 22:06:46 INFO DAGScheduler: waiting: Set(Stage 2)
15/01/14 22:06:46 INFO DAGScheduler: failed: Set()
15/01/14 22:06:46 INFO DAGScheduler: Missing parents for Stage 2: List()
15/01/14 22:06:46 INFO DAGScheduler: Submitting Stage 2 (MappedRDD[8] at saveAsTextFile at SparkWordCount.scala:43), which is now runnable
15/01/14 22:06:47 INFO MemoryStore: ensureFreeSpace(112880) called with curMem=197635, maxMem=484127539
15/01/14 22:06:47 INFO MemoryStore: Block broadcast_3 stored as values in memory (estimated size 110.2 KB, free 461.4 MB)
15/01/14 22:06:47 INFO MemoryStore: ensureFreeSpace(67500) called with curMem=310515, maxMem=484127539
15/01/14 22:06:47 INFO MemoryStore: Block broadcast_3_piece0 stored as bytes in memory (estimated size 65.9 KB, free 461.3 MB)
15/01/14 22:06:47 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory on localhost:46971 (size: 65.9 KB, free: 461.6 MB)
15/01/14 22:06:47 INFO BlockManagerMaster: Updated info of block broadcast_3_piece0
15/01/14 22:06:47 INFO SparkContext: Created broadcast 3 from broadcast at DAGScheduler.scala:838
15/01/14 22:06:47 INFO DAGScheduler: Submitting 1 missing tasks from Stage 2 (MappedRDD[8] at saveAsTextFile at SparkWordCount.scala:43)
15/01/14 22:06:47 INFO TaskSchedulerImpl: Adding task set 2.0 with 1 tasks
15/01/14 22:06:47 INFO TaskSetManager: Starting task 0.0 in stage 2.0 (TID 2, localhost, PROCESS_LOCAL, 1056 bytes)
15/01/14 22:06:47 INFO Executor: Running task 0.0 in stage 2.0 (TID 2)
15/01/14 22:06:47 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks
15/01/14 22:06:47 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
15/01/14 22:06:47 INFO FileOutputCommitter: Saved output of task 'attempt_201501142206_0002_m_000000_2' to file:/home/hadoop/spark1.2.0/WordCountResult/_temporary/0/task_201501142206_0002_m_000000
15/01/14 22:06:47 INFO SparkHadoopWriter: attempt_201501142206_0002_m_000000_2: Committed
15/01/14 22:06:47 INFO Executor: Finished task 0.0 in stage 2.0 (TID 2). 824 bytes result sent to driver
15/01/14 22:06:47 INFO TaskSetManager: Finished task 0.0 in stage 2.0 (TID 2) in 397 ms on localhost (1/1)
15/01/14 22:06:47 INFO DAGScheduler: Stage 2 (saveAsTextFile at SparkWordCount.scala:43) finished in 0.399 s
15/01/14 22:06:47 INFO TaskSchedulerImpl: Removed TaskSet 2.0, whose tasks have all completed, from pool 
15/01/14 22:06:47 INFO DAGScheduler: Job 0 finished: saveAsTextFile at SparkWordCount.scala:43, took 1.241181 s
15/01/14 22:06:47 INFO SparkUI: Stopped Spark web UI at http://hadoop-Inspiron-3521.local:4040
15/01/14 22:06:47 INFO DAGScheduler: Stopping DAGScheduler
15/01/14 22:06:48 INFO MapOutputTrackerMasterActor: MapOutputTrackerActor stopped!
15/01/14 22:06:48 INFO MemoryStore: MemoryStore cleared
15/01/14 22:06:48 INFO BlockManager: BlockManager stopped
15/01/14 22:06:48 INFO BlockManagerMaster: BlockManagerMaster stopped
15/01/14 22:06:48 INFO SparkContext: Successfully stopped SparkContext
15/01/14 22:06:48 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
15/01/14 22:06:48 INFO RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.

Process finished with exit code 0
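
If the job finishes successfully, the WordCountResult directory should contain the usual saveAsTextFile output: a part-00000 file holding the (word, count) pairs, plus a _SUCCESS marker written by the FileOutputCommitter that appears in the log above.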

Adjusting the log level

 

As the output above shows, Spark logs at the INFO level by default. To see all of the log output, configure Spark's logging by creating a log4j.properties file in the source root of the WordCount project (so that it ends up on the runtime classpath), with the following content:

 

log4j.rootCategory=DEBUG, file
log4j.appender.file=org.apache.log4j.ConsoleAppender
# To write the log to a file instead, use a FileAppender:
#log4j.appender.file=org.apache.log4j.FileAppender
#log4j.appender.file.file=spark.log

log4j.appender.file.append=false
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss.SSS} %t %p %c{1}: %m%n

# Ignore messages below warning level from Jetty, because it's a bit verbose
log4j.logger.org.eclipse.jetty=WARN
org.eclipse.jetty.LEVEL=WARN
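
Equivalently (a sketch of a common log4j 1.x idiom, not part of the original setup), the same adjustment can be made programmatically at the top of main, before the SparkContext is created:

import org.apache.log4j.{Level, Logger}

// Same effect as the properties file above: root logger at DEBUG, Jetty kept at WARN.
Logger.getRootLogger.setLevel(Level.DEBUG)
Logger.getLogger("org.eclipse.jetty").setLevel(Level.WARN)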

 

 

Known issue:

When setting up the Spark development environment on Windows, the following error is reported:
15/01/17 16:17:04 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/01/17 16:17:04 ERROR Shell: Failed to locate the winutils binary in the hadoop binary path
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
	at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:318)
	at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:333)
	at org.apache.hadoop.util.Shell.<clinit>(Shell.java:326)
	at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:76)
	at org.apache.hadoop.security.Groups.parseStaticMapping(Groups.java:93)
	at org.apache.hadoop.security.Groups.<init>(Groups.java:77)
	at org.apache.hadoop.security.Groups.getUserToGroupsMappingService(Groups.java:240)
	at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:255)
	at org.apache.hadoop.security.UserGroupInformation.setConfiguration(UserGroupInformation.java:283)
	at org.apache.spark.deploy.SparkHadoopUtil.<init>(SparkHadoopUtil.scala:43)
	at org.apache.spark.deploy.SparkHadoopUtil$.<init>(SparkHadoopUtil.scala:202)
	at org.apache.spark.deploy.SparkHadoopUtil$.<clinit>(SparkHadoopUtil.scala)
	at org.apache.spark.util.Utils$.getSparkOrYarnConfig(Utils.scala:1784)
	at org.apache.spark.storage.BlockManager.<init>(BlockManager.scala:105)
	at org.apache.spark.storage.BlockManager.<init>(BlockManager.scala:180)
	at org.apache.spark.SparkEnv$.create(SparkEnv.scala:292)
	at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:159)
	at org.apache.spark.SparkContext.<init>(SparkContext.scala:232)
	at spark.examples.SparkWordCount$.main(SparkWordCount.scala:39)
	at spark.examples.SparkWordCount.main(SparkWordCount.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at com.intellij.rt.execution.application.AppMain.main(AppMain.java:134)
 
 
Solution:
1. Unpack a Hadoop distribution and set HADOOP_HOME in the environment variables:
HADOOP_HOME=E:/devsoftware/hadoop-2.5.2/hadoop-2.5.2
If the environment variable does not take effect, add the following statement to the code:
System.setProperty("hadoop.home.dir", "E:/devsoftware/hadoop-2.5.2/hadoop-2.5.2");
2. Copy winutils.exe into the Hadoop bin directory:
E:/devsoftware/hadoop-2.5.2/hadoop-2.5.2/bin
Download link:
http://download.csdn.net/download/zxyacb2012/8193465

The hadoop.dll file is not required.
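
Where exactly that System.setProperty statement goes matters: the stack trace above shows Hadoop's Shell class being initialized while the SparkContext is constructed, so hadoop.home.dir has to be set before new SparkContext(conf). A minimal sketch, reusing the example Windows path from above (the object name is just for illustration):

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext

object SparkWordCountOnWindows {
  def main(args: Array[String]) {
    // Must run before the SparkContext is created (see the stack trace above);
    // adjust the path to wherever Hadoop was unpacked.
    System.setProperty("hadoop.home.dir", "E:/devsoftware/hadoop-2.5.2/hadoop-2.5.2")
    val conf = new SparkConf().setAppName("SparkWordCount").setMaster("local")
    val sc = new SparkContext(conf)
    // ... same WordCount logic as above ...
    sc.stop()
  }
}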
 

 

Next steps

After several days of fiddling, the deadlock is finally broken and Spark code can now be debugged from the IDE. Next: pick up the pace and level up my Spark skills!