After building Spark with SBT, the bundled examples run fine, but launching spark-shell fails with the following error:
D:\Scala\spark\bin\spark-shell.cmd
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/D:/Scala/spark/assembly/target/scala-2.10/spark-assembly-0.9.0-incubating-hadoop1.0.4.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/D:/Scala/spark/tools/target/scala-2.10/spark-tools-assembly-0.9.0-incubating.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
14/04/03 20:40:43 INFO HttpServer: Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
14/04/03 20:40:43 INFO HttpServer: Starting HTTP Server
Failed to initialize compiler: object scala.runtime in compiler mirror not found.
** Note that as of 2.8 scala does not assume use of the java classpath.
** For the old behavior pass -usejavacp to scala, or if using a Settings
** object programatically, settings.usejavacp.value = true.
14/04/03 20:40:44 WARN SparkILoop$SparkILoopInterpreter: Warning: compiler accessed before init set up.  Assuming no postInit code.
Failed to initialize compiler: object scala.runtime in compiler mirror not found.
** Note that as of 2.8 scala does not assume use of the java classpath.
** For the old behavior pass -usejavacp to scala, or if using a Settings
** object programatically, settings.usejavacp.value = true.
Failed to initialize compiler: object scala.runtime in compiler mirror not found.
        at scala.Predef$.assert(Predef.scala:179)
        at org.apache.spark.repl.SparkIMain.initializeSynchronous(SparkIMain.scala:197)
        at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply$mcZ$sp(SparkILoop.scala:919)
        at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:876)
        at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:876)
        at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
        at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:876)
        at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:968)
        at org.apache.spark.repl.Main$.main(Main.scala:31)
        at org.apache.spark.repl.Main.main(Main.scala)
Googling did not immediately turn up a fix. The SBT site's FAQ has a related entry — http://www.scala-sbt.org/release/docs/faq#how-do-i-use-the-scala-interpreter-in-my-code — but it only explains how to change the setting from within your own code, which clearly does not fit my case of simply launching the shell.
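For context, here is a minimal sketch (my own illustration, not taken from the SBT FAQ) of what the FAQ and the error message mean by setting usejavacp "programmatically" — i.e. when you embed the Scala 2.10 interpreter in your own application, which is exactly the scenario that does not apply here:

    import scala.tools.nsc.Settings
    import scala.tools.nsc.interpreter.IMain

    // Assumes scala-compiler.jar is on the classpath of the embedding application.
    val settings = new Settings
    settings.usejavacp.value = true      // programmatic equivalent of passing -usejavacp
    val repl = new IMain(settings)       // embedded Scala interpreter
    repl.interpret("println(1 + 1)")     // prints 2 once the embedded compiler initializes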
Digging further: this error message only exists in Scala 2.8 and later, because a proposal about the compiler/interpreter classpath was accepted: "Default compiler/interpreter classpath in a managed environment".
Continuing to search on Google, one write-up caught my attention: "Object Scala Found". It finally offered a workaround:
"However, a working command can be recovered, like so:
$ jrunscript -Djava.class.path=scala-library.jar -Dscala.usejavacp=true -classpath scala-compiler.jar -l scala"
Following that hint, I added the same -Dscala.usejavacp=true property to \bin\spark-class2.cmd:
rem Set JAVA_OPTS to be able to load native libraries and to set heap size
set JAVA_OPTS=%OUR_JAVA_OPTS% -Djava.library.path=%SPARK_LIBRARY_PATH% -Dscala.usejavacp=true -Xms%SPARK_MEM% -Xmx%SPARK_MEM%
rem Attention: when changing the way the JAVA_OPTS are assembled, the change must be reflected in ExecutorRunner.scala!
The newly added part is the single -Dscala.usejavacp=true flag; the rest of the line is unchanged. Running \bin\spark-shell.cmd again:
D:>D:\Scala\spark\bin\spark-shell.cmd
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/D:/Scala/spark/assembly/target/scala-2.10/spark-assembly-0.9.0-incubating-hadoop1.0.4.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/D:/Scala/spark/tools/target/scala-2.10/spark-tools-assembly-0.9.0-incubating.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
14/04/03 22:18:41 INFO HttpServer: Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
14/04/03 22:18:41 INFO HttpServer: Starting HTTP Server
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 0.9.0
      /_/
Using Scala version 2.10.3 (Java HotSpot(TM) Client VM, Java 1.6.0_10)
Type in expressions to have them evaluated.
Type :help for more information.
14/04/03 22:19:12 INFO Slf4jLogger: Slf4jLogger started
14/04/03 22:19:13 INFO Remoting: Starting remoting
14/04/03 22:19:16 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://spark@Choco-PC:5960]
14/04/03 22:19:16 INFO Remoting: Remoting now listens on addresses: [akka.tcp://spark@Choco-PC:5960]
14/04/03 22:19:16 INFO SparkEnv: Registering BlockManagerMaster
14/04/03 22:19:17 INFO DiskBlockManager: Created local directory at C:\Users\Choco\AppData\Local\Temp\spark-local-20140403221917-7172
14/04/03 22:19:17 INFO MemoryStore: MemoryStore started with capacity 304.8 MB.
14/04/03 22:19:18 INFO ConnectionManager: Bound socket to port 5963 with id = ConnectionManagerId(Choco-PC,5963)
14/04/03 22:19:18 INFO BlockManagerMaster: Trying to register BlockManager
14/04/03 22:19:18 INFO BlockManagerMasterActor$BlockManagerInfo: Registering block manager Choco-PC:5963 with 304.8 MB RAM
14/04/03 22:19:18 INFO BlockManagerMaster: Registered BlockManager
14/04/03 22:19:18 INFO HttpServer: Starting HTTP Server
14/04/03 22:19:18 INFO HttpBroadcast: Broadcast server started at http://192.168.1.100:5964
14/04/03 22:19:18 INFO SparkEnv: Registering MapOutputTracker
14/04/03 22:19:18 INFO HttpFileServer: HTTP File server directory is C:\Users\Choco\AppData\Local\Temp\spark-e122cfe9-2d62-4a47-920c-96b54e4658f6
14/04/03 22:19:18 INFO HttpServer: Starting HTTP Server
14/04/03 22:19:22 INFO SparkUI: Started Spark Web UI at http://Choco-PC:4040
14/04/03 22:19:22 INFO Executor: Using REPL class URI: http://192.168.1.100:5947
Created spark context..
Spark context available as sc.
scala> :quit
Stopping spark context.
14/04/03 23:05:21 INFO MapOutputTrackerMasterActor: MapOutputTrackerActor stopped!
14/04/03 23:05:21 INFO ConnectionManager: Selector thread was interrupted!
14/04/03 23:05:21 INFO ConnectionManager: ConnectionManager stopped
14/04/03 23:05:21 INFO MemoryStore: MemoryStore cleared
14/04/03 23:05:21 INFO BlockManager: BlockManager stopped
14/04/03 23:05:21 INFO BlockManagerMasterActor: Stopping BlockManagerMaster
14/04/03 23:05:21 INFO BlockManagerMaster: BlockManagerMaster stopped
14/04/03 23:05:21 INFO SparkContext: Successfully stopped SparkContext
14/04/03 23:05:21 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
14/04/03 23:05:21 INFO RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
Good. Opening http://Choco-PC:4040 in a browser now shows Spark's status, environment, executors, and so on.
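As a quick sanity check (a hypothetical session of my own, not part of the log above), a couple of lines typed at the scala> prompt confirm both that the new JVM option reached the REPL and that the context is usable:

    sys.props.get("scala.usejavacp")                    // should be Some("true"), picked up from JAVA_OPTS
    val doubled = sc.parallelize(1 to 1000).map(_ * 2)  // trivial RDD job
    doubled.reduce(_ + _)                               // 1001000; the job also appears on the 4040 Web UI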
This fix may only apply to my particular setup; if you still have problems, keep looking through the related material.
Along the way I also hit a "file not found" error, which turned out to be an incorrectly set JAVA_HOME. If you run into trouble, turn command echoing back on in the batch scripts (typically by removing the @echo off line) and trace through them to find the cause.