Configuring Eclipse to develop and run Spark programs locally

This post briefly walks through developing a simple Spark application in Eclipse and running and testing it locally in local mode.
1. Download the latest Scala IDE for Eclipse (choose the Windows 64-bit build) from http://scala-ide.org/download/sdk.html


After downloading, extract it to the D: drive, launch it, and choose a workspace.


Then create a test project named ScalaDev. Right-click the project and choose Properties, select Scala Compiler in the dialog, check Use Project Settings on the right, pick a Scala Installation, and click OK to save the configuration.



2. Add the Spark 1.6.0 jar dependency, spark-assembly-1.6.0-hadoop2.6.0.jar, to the project.
This jar is located under the lib directory of the spark-1.6.0-bin-hadoop2.6.tgz package.



Right-click the ScalaDev project and choose Build Path -> Configure Build Path to add the jar.


Note: if you selected Latest 2.11 bundle (dynamic) as the Scala Installation, the project will report an error: a red cross appears on the ScalaDev project, and the Problems view shows that the project's Scala compiler version does not match the Scala version Spark was built with:
More than one scala library found in the build path (D:/eclipse/plugins/org.scala-lang.scala-library_2.11.7.v20150622-112736-1fbce4612c.jar, F:/IMF/Big_Data_Software/spark-1.6.0-bin-hadoop2.6/lib/spark-assembly-1.6.0-hadoop2.6.0.jar).At least one has an incompatible version. Please update the project build path so it contains only one compatible scala library.




Fix: right-click the Scala Library Container -> Properties, choose Latest 2.10 bundle (dynamic) in the dialog, and save.
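Before wiring up the rest of the project, it can help to confirm which Scala version the project actually compiles against. Below is a minimal check (the object name is my own; the point is only that the Spark 1.6.0 pre-built assembly targets Scala 2.10, so the printed version should be 2.10.x — if it reports 2.11.x, the build path still points at the wrong Scala Library Container):

package com.imf.spark

// Prints the Scala version the project compiles and runs against.
object ScalaVersionCheck {
  def main(args: Array[String]): Unit = {
    // e.g. "version 2.10.5" when the Scala Library Container is set correctly
    println(scala.util.Properties.versionString)
  }
}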



3. Create the Spark package under src and create the entry-point class.
Right-click the project and choose New -> Package to create the com.imf.spark package.



Select the com.imf.spark package and create a Scala Object.



Before testing the program, copy the README.md file from the spark-1.6.0-bin-hadoop2.6 directory to D://testspark//. The code is as follows:
package com.imf.spark

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
/**
 * A Spark word count program written in Scala for local testing.
 */
object WordCount {
   def main(args: Array[String]): Unit = {
     /**
      * 1. Create a SparkConf object to hold the runtime configuration of the Spark program.
      * For example, setMaster sets the URL of the Spark cluster master the program connects to;
      * setting it to "local" runs the program locally, which is well suited to machines
      * with very limited resources.
      */
     // Create the SparkConf object
     val conf = new SparkConf()
     // Set the application name, which is shown in the monitoring UI while the program runs
     conf.setAppName("My First Spark App!")
     // Set the master to "local" so the program runs locally without installing a Spark cluster
     conf.setMaster("local")
     /**
      * 2. Create the SparkContext object.
      * SparkContext is the single entry point to all Spark functionality; whether the program
      * is written in Scala, Java, Python, or R, it must have a SparkContext.
      * Its core role is to initialize the components a Spark application needs to run,
      * including the DAGScheduler, TaskScheduler, and SchedulerBackend,
      * and it is also responsible for registering the program with the Master.
      * SparkContext is the single most important object in the whole application.
      */
     // Create the SparkContext, passing in the SparkConf instance that carries the runtime configuration
     val sc = new SparkContext(conf)

     /**
      * 3. Create an RDD through the SparkContext from the data source (HDFS, HBase, local FS, DB, S3, etc.).
      * There are three basic ways to create an RDD: from an external data source (e.g. HDFS),
      * from a Scala collection, or by transforming another RDD.
      * The data is split into a series of partitions; the data assigned to each partition is processed by one task.
      */
     // Read a local file and use a single partition
     val lines = sc.textFile("D://testspark//README.md",1)

     /**
      * 4. Apply transformations such as the higher-order functions map and filter to the initial RDD
      *    to perform the actual computation.
      * 4.1. Split each line into individual words.
      */
     // Split each line and flatten the per-line results into one large collection
      val words = lines.flatMap { line => line.split(" ") }
     /**
      * 4.2. On top of the split words, count each occurrence as 1, i.e. word => (word, 1).
      */
     val pairs = words.map{word =>(word,1)}

     /**
      * 4.3. Based on the per-occurrence counts of 1, compute the total number of occurrences of each word in the file.
      */
     // Accumulate the values for identical keys (reduced both locally and at the reducer level)
     val wordCounts = pairs.reduceByKey(_+_)
     // Print the output
     wordCounts.foreach(pair => println(pair._1+":"+pair._2))
     sc.stop()
   }
}
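As a side note, foreach prints the pairs in whatever order the shuffle produces them, which is what you see in the output below. If you prefer the most frequent words first, a small variant using the same RDD API works; this is only a sketch, and the object name WordCountSorted and the top-20 cutoff are my own choices:

package com.imf.spark

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext

// Variant of WordCount that prints the most frequent words first.
object WordCountSorted {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("WordCount Sorted").setMaster("local")
    val sc = new SparkContext(conf)
    val wordCounts = sc.textFile("D://testspark//README.md", 1)
      .flatMap(_.split(" "))
      .map(word => (word, 1))
      .reduceByKey(_ + _)
    // Sort by count in descending order, then bring only the top 20 to the driver
    wordCounts.sortBy(_._2, ascending = false).take(20)
      .foreach { case (word, count) => println(word + ":" + count) }
    sc.stop()
  }
}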


Run output:
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
16/01/26 08:23:37 INFO SparkContext: Running Spark version 1.6.0
16/01/26 08:23:42 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/01/26 08:23:42 ERROR Shell: Failed to locate the winutils binary in the hadoop binary path
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
    at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:355)
    at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:370)
    at org.apache.hadoop.util.Shell.<clinit>(Shell.java:363)
    at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:79)
    at org.apache.hadoop.security.Groups.parseStaticMapping(Groups.java:104)
    at org.apache.hadoop.security.Groups.<init>(Groups.java:86)
    at org.apache.hadoop.security.Groups.<init>(Groups.java:66)
    at org.apache.hadoop.security.Groups.getUserToGroupsMappingService(Groups.java:280)
    at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:271)
    at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:248)
    at org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:763)
    at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:748)
    at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:621)
    at org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils.scala:2136)
    at org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils.scala:2136)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.util.Utils$.getCurrentUserName(Utils.scala:2136)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:322)
    at com.dt.spark.WordCount$.main(WordCount.scala:29)
    at com.dt.spark.WordCount.main(WordCount.scala)
16/01/26 08:23:42 INFO SecurityManager: Changing view acls to: vivi
16/01/26 08:23:42 INFO SecurityManager: Changing modify acls to: vivi
16/01/26 08:23:42 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(vivi); users with modify permissions: Set(vivi)
16/01/26 08:23:43 INFO Utils: Successfully started service 'sparkDriver' on port 54663.
16/01/26 08:23:43 INFO Slf4jLogger: Slf4jLogger started
16/01/26 08:23:43 INFO Remoting: Starting remoting
16/01/26 08:23:43 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@192.168.100.102:54676]
16/01/26 08:23:43 INFO Utils: Successfully started service 'sparkDriverActorSystem' on port 54676.
16/01/26 08:23:43 INFO SparkEnv: Registering MapOutputTracker
16/01/26 08:23:43 INFO SparkEnv: Registering BlockManagerMaster
16/01/26 08:23:43 INFO DiskBlockManager: Created local directory at C:\Users\vivi\AppData\Local\Temp\blockmgr-5f59f3c2-3b87-49c5-a1ae-e21847aac44b
16/01/26 08:23:43 INFO MemoryStore: MemoryStore started with capacity 1813.7 MB
16/01/26 08:23:43 INFO SparkEnv: Registering OutputCommitCoordinator
16/01/26 08:23:43 INFO Utils: Successfully started service 'SparkUI' on port 4040.
16/01/26 08:23:43 INFO SparkUI: Started SparkUI at http://192.168.100.102:4040
16/01/26 08:23:43 INFO Executor: Starting executor ID driver on host localhost
16/01/26 08:23:43 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 54683.
16/01/26 08:23:43 INFO NettyBlockTransferService: Server created on 54683
16/01/26 08:23:43 INFO BlockManagerMaster: Trying to register BlockManager
16/01/26 08:23:43 INFO BlockManagerMasterEndpoint: Registering block manager localhost:54683 with 1813.7 MB RAM, BlockManagerId(driver, localhost, 54683)
16/01/26 08:23:43 INFO BlockManagerMaster: Registered BlockManager
16/01/26 08:23:46 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 153.6 KB, free 153.6 KB)
16/01/26 08:23:46 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 13.9 KB, free 167.6 KB)
16/01/26 08:23:46 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:54683 (size: 13.9 KB, free: 1813.7 MB)
16/01/26 08:23:46 INFO SparkContext: Created broadcast 0 from textFile at WordCount.scala:37
16/01/26 08:23:47 WARN : Your hostname, vivi-PC resolves to a loopback/non-reachable address: fe80:0:0:0:5937:95c4:86da:2f43%30, but we couldn't find any external IP address!
16/01/26 08:23:48 INFO FileInputFormat: Total input paths to process : 1
16/01/26 08:23:48 INFO SparkContext: Starting job: foreach at WordCount.scala:56
16/01/26 08:23:48 INFO DAGScheduler: Registering RDD 3 (map at WordCount.scala:48)
16/01/26 08:23:48 INFO DAGScheduler: Got job 0 (foreach at WordCount.scala:56) with 1 output partitions
16/01/26 08:23:48 INFO DAGScheduler: Final stage: ResultStage 1 (foreach at WordCount.scala:56)
16/01/26 08:23:48 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 0)
16/01/26 08:23:48 INFO DAGScheduler: Missing parents: List(ShuffleMapStage 0)
16/01/26 08:23:48 INFO DAGScheduler: Submitting ShuffleMapStage 0 (MapPartitionsRDD[3] at map at WordCount.scala:48), which has no missing parents
16/01/26 08:23:48 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 4.0 KB, free 171.6 KB)
16/01/26 08:23:48 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 2.3 KB, free 173.9 KB)
16/01/26 08:23:48 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on localhost:54683 (size: 2.3 KB, free: 1813.7 MB)
16/01/26 08:23:48 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1006
16/01/26 08:23:48 INFO DAGScheduler: Submitting 1 missing tasks from ShuffleMapStage 0 (MapPartitionsRDD[3] at map at WordCount.scala:48)
16/01/26 08:23:48 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
16/01/26 08:23:48 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, partition 0,PROCESS_LOCAL, 2119 bytes)
16/01/26 08:23:48 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
16/01/26 08:23:48 INFO HadoopRDD: Input split: file:/D:/testspark/README.md:0+3359
16/01/26 08:23:48 INFO deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id
16/01/26 08:23:48 INFO deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
16/01/26 08:23:48 INFO deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
16/01/26 08:23:48 INFO deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
16/01/26 08:23:48 INFO deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
16/01/26 08:23:48 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 2253 bytes result sent to driver
16/01/26 08:23:48 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 177 ms on localhost (1/1)
16/01/26 08:23:48 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 
16/01/26 08:23:48 INFO DAGScheduler: ShuffleMapStage 0 (map at WordCount.scala:48) finished in 0.186 s
16/01/26 08:23:48 INFO DAGScheduler: looking for newly runnable stages
16/01/26 08:23:48 INFO DAGScheduler: running: Set()
16/01/26 08:23:48 INFO DAGScheduler: waiting: Set(ResultStage 1)
16/01/26 08:23:48 INFO DAGScheduler: failed: Set()
16/01/26 08:23:48 INFO DAGScheduler: Submitting ResultStage 1 (ShuffledRDD[4] at reduceByKey at WordCount.scala:54), which has no missing parents
16/01/26 08:23:48 INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 2.5 KB, free 176.4 KB)
16/01/26 08:23:48 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 1581.0 B, free 177.9 KB)
16/01/26 08:23:48 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on localhost:54683 (size: 1581.0 B, free: 1813.7 MB)
16/01/26 08:23:48 INFO SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:1006
16/01/26 08:23:48 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 1 (ShuffledRDD[4] at reduceByKey at WordCount.scala:54)
16/01/26 08:23:48 INFO TaskSchedulerImpl: Adding task set 1.0 with 1 tasks
16/01/26 08:23:48 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 1, localhost, partition 0,NODE_LOCAL, 1894 bytes)
16/01/26 08:23:48 INFO Executor: Running task 0.0 in stage 1.0 (TID 1)
16/01/26 08:23:48 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks
16/01/26 08:23:48 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 2 ms
package:1
For:2
Programs:1
processing.:1
Because:1
The:1
cluster.:1
its:1
[run:1
APIs:1
have:1
Try:1
computation:1
through:1
several:1
This:2
graph:1
Hive:2
storage:1
["Specifying:1
To:2
page](http://spark.apache.org/documentation.html):1
Once:1
"yarn":1
prefer:1
SparkPi:2
engine:1
version:1
file:1
documentation,:1
processing,:1
the:21
are:1
systems.:1
params:1
not:1
different:1
refer:2
Interactive:2
R,:1
given.:1
if:4
build:3
when:1
be:2
Tests:1
Apache:1
./bin/run-example:2
programs,:1
including:3
Spark.:1
package.:1
1000).count():1
Versions:1
HDFS:1
Data.:1
>>>:1
programming:1
Testing:1
module,:1
Streaming:1
environment:1
run::1
clean:1
1000::2
rich:1
GraphX:1
Please:3
is:6
run:7
URL,:1
threads.:1
same:1
MASTER=spark://host:7077:1
on:5
built:1
against:1
[Apache:1
tests:2
examples:2
at:2
optimized:1
usage:1
using:2
graphs:1
talk:1
Shell:2
class:2
abbreviated:1
directory.:1
README:1
computing:1
overview:1
`examples`:2
example::1
##:8
N:1
set:2
use:3
Hadoop-supported:1
tests](https://cwiki.apache.org/confluence/display/SPARK/Useful+Developer+Tools).:1
running:1
find:1
contains:1
project:1
Pi:1
need:1
or:3
Big:1
Java,:1
high-level:1
uses:1
<class>:1
Hadoop,:2
available:1
requires:1
(You:1
see:1
Documentation:1
of:5
tools:1
using::1
cluster:2
must:1
supports:2
built,:1
system:1
build/mvn:1
Hadoop:3
this:1
Version"](http://spark.apache.org/docs/latest/building-spark.html#specifying-the-hadoop-version):1
particular:2
Python:2
Spark:13
general:2
YARN,:1
pre-built:1
[Configuration:1
locally:2
library:1
A:1
locally.:1
sc.parallelize(1:1
only:1
Configuration:1
following:2
basic:1
#:1
changed:1
More:1
which:2
learning,:1
first:1
./bin/pyspark:1
also:4
should:2
for:11
[params]`.:1
documentation:3
[project:2
mesos://:1
Maven](http://maven.apache.org/).:1
setup:1
<http://spark.apache.org/>:1
latest:1
your:1
MASTER:1
example:3
scala>:1
DataFrames,:1
provides:1
configure:1
distributions.:1
can:6
About:1
instructions.:1
do:2
easiest:1
no:1
how:2
`./bin/run-example:1
Note:1
individual:1
spark://:1
It:2
Scala:2
Alternatively,:1
an:3
variable:1
submit:1
machine:1
thread,:1
them,:1
detailed:2
stream:1
And:1
distribution:1
return:2
Thriftserver:1
./bin/spark-shell:1
"local":1
start:1
You:3
Spark](#building-spark).:1
one:2
help:1
with:3
print:1
Spark"](http://spark.apache.org/docs/latest/building-spark.html).:1
data:1
wiki](https://cwiki.apache.org/confluence/display/SPARK).:1
in:5
-DskipTests:1
downloaded:1
versions:1
online:1
Guide](http://spark.apache.org/docs/latest/configuration.html):1
comes:1
[building:1
Python,:2
Many:1
building:2
Running:1
from:1
way:1
Online:1
site,:1
other:1
Example:1
analysis.:1
sc.parallelize(range(1000)).count():1
you:4
runs.:1
Building:1
higher-level:1
protocols:1
guidance:2
a:8
guide,:1
name:1
fast:1
SQL:2
will:1
instance::1
to:14
core:1
:67
web:1
"local[N]":1
programs:2
package.):1
that:2
MLlib:1
["Building:1
shell::2
Scala,:1
and:10
command,:2
./dev/run-tests:1
sample:1
16/01/26 08:23:48 INFO Executor: Finished task 0.0 in stage 1.0 (TID 1). 1165 bytes result sent to driver
16/01/26 08:23:48 INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID 1) in 61 ms on localhost (1/1)
16/01/26 08:23:48 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool 
16/01/26 08:23:48 INFO DAGScheduler: ResultStage 1 (foreach at WordCount.scala:56) finished in 0.061 s
16/01/26 08:23:48 INFO DAGScheduler: Job 0 finished: foreach at WordCount.scala:56, took 0.328012 s
16/01/26 08:23:48 INFO SparkUI: Stopped Spark web UI at http://192.168.100.102:4040
16/01/26 08:23:48 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
16/01/26 08:23:48 INFO MemoryStore: MemoryStore cleared
16/01/26 08:23:48 INFO BlockManager: BlockManager stopped
16/01/26 08:23:48 INFO BlockManagerMaster: BlockManagerMaster stopped
16/01/26 08:23:48 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
16/01/26 08:23:48 INFO SparkContext: Successfully stopped SparkContext
16/01/26 08:23:48 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
16/01/26 08:23:48 INFO RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
16/01/26 08:23:48 INFO ShutdownHookManager: Shutdown hook called
16/01/26 08:23:48 INFO ShutdownHookManager: Deleting directory C:\Users\vivi\AppData\Local\Temp\spark-56f9ed0a-5671-449a-955a-041c63569ff2

Note: the ERROR at the start of the output comes from Hadoop trying to load its Windows support configuration (the winutils binary), which cannot be found when running purely locally, but it does not affect the test.
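If you want to get rid of that ERROR line as well, a commonly used workaround is to point Hadoop at a local directory containing winutils.exe before the SparkContext is created. The sketch below assumes winutils.exe has been downloaded separately and placed under D://hadoop//bin; neither that path nor the download is part of the original setup.

package com.imf.spark

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext

object WordCountWithoutWinutilsError {
  def main(args: Array[String]): Unit = {
    // Assumption: winutils.exe was downloaded and placed in D://hadoop//bin.
    // Setting hadoop.home.dir before the SparkContext is created lets
    // Hadoop's Shell class locate the binary, so the ERROR no longer appears.
    System.setProperty("hadoop.home.dir", "D://hadoop")

    val conf = new SparkConf().setAppName("My First Spark App!").setMaster("local")
    val sc = new SparkContext(conf)
    // ... same WordCount logic as above ...
    sc.stop()
  }
}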
Comments
#1 bo_hai 2016-06-12
The target JVM version also has to be set correctly and must not be too high; Scala 2.10 corresponds to JVM 1.7.
