Spark/Hadoop/Zeppelin Upgrade(2)
1 Install Hadoop 2.6.4
> wget http://mirrors.ibiblio.org/apache/hadoop/common/hadoop-2.6.4/hadoop-2.6.4-src.tar.gz
Building from source fails on the annotations module:
> mvn package -Pdist,native -DskipTests -Dtar
This is probably caused by the versions of Java, CMake, or other native build dependencies, so I choose to download the Hadoop 2.6.4 binary directly instead.
> wget http://mirror.nexcess.net/apache/hadoop/common/hadoop-2.6.4/hadoop-2.6.4.tar.gz
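The unpack step is not spelled out in these notes; a minimal sketch, assuming the /opt/hadoop location used by the configuration below (carl as the owning user is an assumption based on the log names later in this post):
> tar zxvf hadoop-2.6.4.tar.gz
> sudo mv hadoop-2.6.4 /opt/hadoop
> sudo chown -R carl:carl /opt/hadoop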
Configure and set it up the same way as 2.7.2.
> cat core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://ubuntu-master:9000</value>
</property>
<property>
<name>io.file.buffer.size</name>
<value>131072</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>file:/opt/hadoop/temp</value>
</property>
<property>
<name>hadoop.proxyuser.hadoop.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hadoop.groups</name>
<value>*</value>
</property>
</configuration>
Edit hadoop-env.sh
export JAVA_HOME="/opt/jdk"
> cat hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>ubuntu-master:9001</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/opt/hadoop/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/opt/hadoop/dfs/data</value>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
</configuration>
> cat slaves
ubuntu-dev1
ubuntu-dev2
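start-dfs.sh and start-yarn.sh reach the hosts listed in slaves over SSH, so passwordless SSH from ubuntu-master is assumed; if it is not set up yet, a minimal sketch (carl as the cluster user is an assumption):
> ssh-keygen -t rsa
> ssh-copy-id carl@ubuntu-dev1
> ssh-copy-id carl@ubuntu-dev2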
> cat yarn-site.xml
<?xml version="1.0"?>
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>ubuntu-master:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>ubuntu-master:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>ubuntu-master:8031</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>ubuntu-master:8033</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>ubuntu-master:8088</value>
</property>
</configuration>
> mkdir /opt/hadoop/temp
> mkdir -p /opt/hadoop/dfs/data
> mkdir -p /opt/hadoop/dfs/name
Do the same on ubuntu-dev1 and ubuntu-dev2.
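The copy itself is not shown; a minimal sketch, assuming passwordless SSH from ubuntu-master, rsync available, and the same /opt layout writable by the same user on the slaves:
> rsync -av /opt/hadoop/ ubuntu-dev1:/opt/hadoop/
> rsync -av /opt/hadoop/ ubuntu-dev2:/opt/hadoop/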
Hadoop setup is done.
1 HDFS
cd /opt/hadoop
sbin/start-dfs.sh
http://ubuntu-master:50070/dfshealth.html#tab-overview
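One step these notes skip: with a brand-new name directory, the NameNode normally has to be formatted once before the very first start-dfs.sh, and jps plus dfsadmin can confirm the daemons came up. A minimal sketch:
> bin/hdfs namenode -format
> jps
> bin/hdfs dfsadmin -report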
2 YARN
cd /opt/hadoop
sbin/start-yarn.sh
http://ubuntu-master:8088/cluster
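To confirm the NodeManagers on ubuntu-dev1 and ubuntu-dev2 registered with the ResourceManager, a quick check:
> bin/yarn node -list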
2 Installation of Spark
Build Spark with Maven:
> build/mvn -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.4 -Phive -DskipTests clean package
Build Spark with SBT:
> build/sbt -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.4 -Phive assembly
Here is the command to build the binary distribution:
> ./make-distribution.sh --name spark-1.6.1 --tgz -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.4 -Phive
The build succeeds and produces the binary package spark-1.6.1-bin-spark-1.6.1.tgz.
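Installing the distribution is not shown; a minimal sketch, assuming the /opt/spark path referenced later in zeppelin-env.sh (the archive should unpack to a directory of the same base name):
> tar zxvf spark-1.6.1-bin-spark-1.6.1.tgz
> sudo mv spark-1.6.1-bin-spark-1.6.1 /opt/spark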
Spark YARN Setting
On ubuntu-master
> cat conf/spark-env.sh
HADOOP_CONF_DIR=/opt/hadoop/etc/hadoop
This command starts the Spark shell on YARN:
> MASTER=yarn-client bin/spark-shell
We can also use spark-submit to submit jobs to the remote cluster; a sketch follows the link below.
http://sillycat.iteye.com/blog/2103457
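A hedged sketch of such a submission from /opt/spark, using the bundled SparkPi example (the exact examples jar name under lib/ depends on the build, so check it first):
> bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-client --num-executors 2 --executor-memory 1g lib/spark-examples-1.6.1-hadoop2.6.4.jar 100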
3 Zeppelin Installation
http://sillycat.iteye.com/blog/2286997
> git clone https://github.com/apache/incubator-zeppelin.git
> cd incubator-zeppelin
> git checkout tags/v0.5.6
> mvn clean package -DskipTests -Pspark-1.6 -Dspark.version=1.6.1 -Phadoop-2.6 -Dhadoop.version=2.6.4
> mvn clean package -Pbuild-distr -DskipTests -Pspark-1.6 -Dspark.version=1.6.1 -Phadoop-2.6 -Dhadoop.version=2.6.4
The build succeeds. The binary is generated here: /home/carl/install/incubator-zeppelin/zeppelin-distribution/target
Unzip the distribution and check the configuration.
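The unpack step is not shown; a minimal sketch from the build tree (the archive name varies with the build, hence the glob), after which zeppelin-env.sh lives under the unpacked directory's conf folder:
> cd zeppelin-distribution/target
> tar zxvf zeppelin-*.tar.gz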
> cat zeppelin-env.sh
# export HADOOP_CONF_DIR
# yarn-site.xml is located in configuration directory in HADOOP_CONF_DIR.
export HADOOP_CONF_DIR="/opt/hadoop/etc/hadoop/"
# export SPARK_HOME
# (required) When it is defined, load it instead of Zeppelin embedded Spark libraries
export SPARK_HOME="/opt/spark"
. ${SPARK_HOME}/conf/spark-env.sh
# export ZEPPELIN_CLASSPATH="${SPARK_CLASSPATH}"
Start the Server
> bin/zeppelin-daemon.sh start
Then visit the console:
http://ubuntu-master:8080/#/
Error Message:
ERROR [2016-04-01 13:58:49,540] ({qtp1232306490-35} NotebookServer.java[onMessage]:162) - Can't handle message
org.apache.zeppelin.interpreter.InterpreterException: org.apache.thrift.transport.TTransportException
at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.cancel(RemoteInterpreter.java:248)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.cancel(LazyOpenInterpreter.java:99)
at org.apache.zeppelin.notebook.Paragraph.jobAbort(Paragraph.java:229)
at org.apache.zeppelin.scheduler.Job.abort(Job.java:232)
at org.apache.zeppelin.socket.NotebookServer.cancelParagraph(NotebookServer.java:695)
More error messages in the log file zeppelin-carl-ubuntu-master.out:
16/04/01 14:10:40 WARN netty.NettyRpcEndpointRef: Error sending message [message = RemoveExecutor(1,Container killed by YARN for exceeding memory limits. 2.1 GB of 2.1 GB virtual memory used. Consider boosting spark.yarn.executor.memoryOverhead.)] in 1 attempts
org.apache.spark.rpc.RpcTimeoutException: Cannot receive any reply in 120 seconds. This timeout is controlled by spark.rpc.askTimeout
at org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:48)
org.apache.spark.rpc.RpcTimeoutException: Cannot receive any reply in 120 seconds. This timeout is controlled by spark.rpc.askTimeout
at org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:48)
On the Hadoop slaves, in the NodeManager log yarn-carl-nodemanager-ubuntu-dev2.log:
2016-04-01 15:28:54,525 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 2229 for container-id container_1459541332549_0002_02_000001: 124.6 MB of 1 GB physical memory used; 2.1 GB of 2.1 GB virtual memory used
Solution:
http://www.wdong.org/wordpress/blog/2015/01/08/spark-on-yarn-where-have-all-my-memory-gone/
http://stackoverflow.com/questions/21005643/container-is-running-beyond-memory-limits
Adding this configuration to yarn-site.xml fixed the problem:
<property>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>false</value>
</property>
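An alternative to switching off the vmem check, as the warning itself suggests, is to give each executor more off-heap headroom on the Spark side; a hedged sketch for conf/spark-defaults.conf (the value in MB is a starting point to tune, not a setting from the original notes):
> cat conf/spark-defaults.conf
spark.yarn.executor.memoryOverhead   512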
After we start a task in Zeppelin, we can visit the Spark context from this console:
http://ubuntu-master:4040/
References:
http://sillycat.iteye.com/blog/2286997