Running Spark on a YARN cluster is straightforward.
1. Set HADOOP_CONF_DIR
You can export it in the shell with export HADOOP_CONF_DIR=xx,
or add it to spark-env.sh.
2. Submit the application:
spark-submit --master yarn --class org.apache.spark.examples.JavaWordCount --verbose --deploy-mode client ~/spark/spark-1.4.1-bin-hadoop2.4/lib/spark-examples-1.4.1-hadoop2.4.0-my.jar RELEASE 2
You can go to the YARN ResourceManager UI to check the application info.
Also, if you want to specify the number of executors (each executor runs in its own YARN container), just add this option to the command above:
--num-executors 2
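The two steps above can be sketched as a small script. This is only an illustration: the /etc/hadoop/conf fallback path and the build_submit_cmd helper are assumptions made for the example, not part of Spark. The helper prints the command instead of running it, so you can review it before submitting.

```shell
#!/usr/bin/env bash
# Sketch: point Spark at the Hadoop client configs, then assemble the
# spark-submit command from this post with an explicit executor count.

# Assumption: /etc/hadoop/conf is a placeholder; use your own config directory.
export HADOOP_CONF_DIR="${HADOOP_CONF_DIR:-/etc/hadoop/conf}"

# Hypothetical helper: echoes the command line rather than executing it.
#   $1 = number of executors, $2 = path to the application jar
build_submit_cmd() {
  local num_executors="$1"
  local jar="$2"
  echo "spark-submit --master yarn --deploy-mode client" \
       "--num-executors ${num_executors}" \
       "--class org.apache.spark.examples.JavaWordCount" \
       "${jar} RELEASE 2"
}

build_submit_cmd 2 "$HOME/spark/spark-1.4.1-bin-hadoop2.4/lib/spark-examples-1.4.1-hadoop2.4.0-my.jar"
```

Once the printed command looks right, paste it into the shell (or pipe it to bash) to actually submit the job.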
--AppMaster processes (ps output; in client mode the AM main class is ExecutorLauncher)
hadoop 2758 13206 0 16:52 ? 00:00:00 /bin/bash -c /usr/local/jdk/jdk1.6.0_31/bin/java -server -Xmx512m -Djava.io.tmpdir=/usr/local/hadoop/data-2.5.1/tmp/nm-local-dir/usercache/hadoop/appcache/application_1441038159113_0029/container_1441038159113_0029_01_000001/tmp '-Dspark.eventLog.enabled=true' '-Dspark.externalBlockStore.folderName=spark-a5761a0d-2f87-4afc-b4eb-dbaf1fd86ef4' '-Dspark.executor.memory=2g' '-Dspark.jars=file:/home/hadoop/spark/spark-1.4.1-bin-hadoop2.4/lib/spark-examples-1.4.1-hadoop2.4.0.jar' '-Dspark.master.ui.port=7102' '-Dspark.ui.port=7108' '-Dspark.worker.ui.port=7105' '-Dspark.driver.appUIAddress=http://192.168.100.4:7108' '-Dspark.master=yarn-client' '-Dspark.driver.allowMultipleContexts=true' '-Dspark.driver.port=52394' '-Dspark.eventLog.dir=hdfs://hd02:8020/user/hadoop/spark-eventlog' '-Dspark.executor.id=driver' '-Dspark.executor.extraJavaOptions=-Xloggc:~/spark-executor.gc -XX:+UseCMSCompactAtFullCollection -XX:CMSFullGCsBeforeCompaction=2 -XX:CMSInitiatingOccupancyFraction=65 -XX:+UseCMSInitiatingOccupancyOnly -XX:PermSize=64m -XX:MaxPermSize=256m -XX:NewRatio=5 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:ParallelGCThreads=5' '-Dspark.executor.cores=2' '-Dspark.driver.host=192.168.100.4' '-Dspark.driver.memory=6g' '-Dspark.storage.memoryFraction=0.5' '-Dspark.app.name=JavaWordCount' '-Dspark.fileserver.uri=http://192.168.100.4:48227' '-Dspark.cores.max=50' -Dspark.yarn.app.container.log.dir=/usr/local/hadoop/hadoop-2.5.2/logs/userlogs/application_1441038159113_0029/container_1441038159113_0029_01_000001 org.apache.spark.deploy.yarn.ExecutorLauncher --arg '192.168.100.4:52394' --executor-memory 2048m --executor-cores 2 --num-executors 2 1> /usr/local/hadoop/hadoop-2.5.2/logs/userlogs/application_1441038159113_0029/container_1441038159113_0029_01_000001/stdout 2> /usr/local/hadoop/hadoop-2.5.2/logs/userlogs/application_1441038159113_0029/container_1441038159113_0029_01_000001/stderr
hadoop 2763 2758 23 16:52 ? 00:00:06 /usr/local/jdk/jdk1.6.0_31/bin/java -server -Xmx512m -Djava.io.tmpdir=/usr/local/hadoop/data-2.5.1/tmp/nm-local-dir/usercache/hadoop/appcache/application_1441038159113_0029/container_1441038159113_0029_01_000001/tmp -Dspark.eventLog.enabled=true -Dspark.externalBlockStore.folderName=spark-a5761a0d-2f87-4afc-b4eb-dbaf1fd86ef4 -Dspark.executor.memory=2g -Dspark.jars=file:/home/hadoop/spark/spark-1.4.1-bin-hadoop2.4/lib/spark-examples-1.4.1-hadoop2.4.0.jar -Dspark.master.ui.port=7102 -Dspark.ui.port=7108 -Dspark.worker.ui.port=7105 -Dspark.driver.appUIAddress=http://192.168.100.4:7108 -Dspark.master=yarn-client -Dspark.driver.allowMultipleContexts=true -Dspark.driver.port=52394 -Dspark.eventLog.dir=hdfs://hd02:8020/user/hadoop/spark-eventlog -Dspark.executor.id=driver -Dspark.executor.extraJavaOptions=-Xloggc:~/spark-executor.gc -XX:+UseCMSCompactAtFullCollection -XX:CMSFullGCsBeforeCompaction=2 -XX:CMSInitiatingOccupancyFraction=65 -XX:+UseCMSInitiatingOccupancyOnly -XX:PermSize=64m -XX:MaxPermSize=256m -XX:NewRatio=5 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:ParallelGCThreads=5 -Dspark.executor.cores=2 -Dspark.driver.host=192.168.100.4 -Dspark.driver.memory=6g -Dspark.storage.memoryFraction=0.5 -Dspark.app.name=JavaWordCount -Dspark.fileserver.uri=http://192.168.100.4:48227 -Dspark.cores.max=50 -Dspark.yarn.app.container.log.dir=/usr/local/hadoop/hadoop-2.5.2/logs/userlogs/application_1441038159113_0029/container_1441038159113_0029_01_000001 org.apache.spark.deploy.yarn.ExecutorLauncher --arg 192.168.100.4:52394 --executor-memory 2048m --executor-cores 2 --num-executors 2
--Executor (task container) processes (ps output)
hadoop 10382 1055 0 17:20 ? 00:00:00 /bin/bash -c /usr/local/jdk/jdk1.6.0_31/bin/java -server -XX:OnOutOfMemoryError='kill %p' -Xms2048m -Xmx2048m '-Xloggc:~/spark-executor.gc' '-XX:+UseCMSCompactAtFullCollection' '-XX:CMSFullGCsBeforeCompaction=2' '-XX:CMSInitiatingOccupancyFraction=65' '-XX:+UseCMSInitiatingOccupancyOnly' '-XX:PermSize=64m' '-XX:MaxPermSize=256m' '-XX:NewRatio=5' '-XX:+UseParNewGC' '-XX:+UseConcMarkSweepGC' '-XX:+PrintGCDateStamps' '-XX:+PrintGCDetails' '-XX:ParallelGCThreads=5' -Djava.io.tmpdir=/usr/local/hadoop/data-2.5.1/tmp/nm-local-dir/usercache/hadoop/appcache/application_1441038159113_0031/container_1441038159113_0031_01_000003/tmp '-Dspark.master.ui.port=7102' '-Dspark.ui.port=7108' '-Dspark.worker.ui.port=7105' '-Dspark.driver.port=44382' -Dspark.yarn.app.container.log.dir=/usr/local/hadoop/hadoop-2.5.2/logs/userlogs/application_1441038159113_0031/container_1441038159113_0031_01_000003 org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url akka.tcp://sparkDriver@192.168.100.4:44382/user/CoarseGrainedScheduler --executor-id 2 --hostname gzsw-13 --cores 2 --app-id application_1441038159113_0031 --user-class-path file:/usr/local/hadoop/data-2.5.1/tmp/nm-local-dir/usercache/hadoop/appcache/application_1441038159113_0031/container_1441038159113_0031_01_000003/__app__.jar 1> /usr/local/hadoop/hadoop-2.5.2/logs/userlogs/application_1441038159113_0031/container_1441038159113_0031_01_000003/stdout 2> /usr/local/hadoop/hadoop-2.5.2/logs/userlogs/application_1441038159113_0031/container_1441038159113_0031_01_000003/stderr
hadoop 10386 10382 99 17:20 ? 00:00:25 /usr/local/jdk/jdk1.6.0_31/bin/java -server -XX:OnOutOfMemoryError=kill %p -Xms2048m -Xmx2048m -Xloggc:~/spark-executor.gc -XX:+UseCMSCompactAtFullCollection -XX:CMSFullGCsBeforeCompaction=2 -XX:CMSInitiatingOccupancyFraction=65 -XX:+UseCMSInitiatingOccupancyOnly -XX:PermSize=64m -XX:MaxPermSize=256m -XX:NewRatio=5 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:ParallelGCThreads=5 -Djava.io.tmpdir=/usr/local/hadoop/data-2.5.1/tmp/nm-local-dir/usercache/hadoop/appcache/application_1441038159113_0031/container_1441038159113_0031_01_000003/tmp -Dspark.master.ui.port=7102 -Dspark.ui.port=7108 -Dspark.worker.ui.port=7105 -Dspark.driver.port=44382 -Dspark.yarn.app.container.log.dir=/usr/local/hadoop/hadoop-2.5.2/logs/userlogs/application_1441038159113_0031/container_1441038159113_0031_01_000003 org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url akka.tcp://sparkDriver@192.168.100.4:44382/user/CoarseGrainedScheduler --executor-id 2 --hostname gzsw-13 --cores 2 --app-id application_1441038159113_0031 --user-class-path file:/usr/local/hadoop/data-2.5.1/tmp/nm-local-dir/usercache/hadoop/appcache/application_1441038159113_0031/container_1441038159113_0031_01_000003/__app__.jar
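The heap sizes in the process listings (-Xmx512m for the AppMaster, -Xmx2048m for the executor) are smaller than the containers YARN actually grants, because Spark adds an off-heap overhead on top of the heap. A rough sketch of that arithmetic follows. It assumes the overhead is 10% of the heap with a 384 MB floor, which is my reading of the Spark 1.4 defaults and agrees with the driver log line "Will allocate AM container, with 896 MB memory including 384 MB overhead". The container_mb helper is hypothetical, and YARN may round the result up further depending on the scheduler's minimum allocation.

```shell
#!/usr/bin/env bash
# Sketch: estimate the container size Spark requests from YARN for a given heap.
# Assumptions (not taken from this post): overhead = max(384 MB, 10% of heap).

# Hypothetical helper: $1 = JVM heap in MB, prints requested container size in MB.
container_mb() {
  local heap_mb="$1"
  local overhead_mb=$(( heap_mb / 10 ))   # assumed 10% overhead factor
  if [ "$overhead_mb" -lt 384 ]; then     # assumed 384 MB minimum overhead
    overhead_mb=384
  fi
  echo $(( heap_mb + overhead_mb ))
}

container_mb 512    # AppMaster heap from the ps output  -> 896
container_mb 2048   # executor heap from the ps output   -> 2432
```

Both results stay below the 8192 MB per-container maximum that the driver log reports for this cluster.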
corresponding figures:
The logs from the driver (you will see that both tasks of the first stage run on gzsw-05, while the second stage runs one task on each of gzsw-05 and gzsw-06):
hadoop@GZsw04:~/spark/spark-1.4.1-bin-hadoop2.4$ spark-submit --master yarn --class org.apache.spark.examples.JavaWordCount --verbose --deploy-mode client ~/spark/spark-1.4.1-bin-hadoop2.4/lib/spark-examples-1.4.1-hadoop2.4.0-my.jar RELEASE 2
Using properties file: /home/hadoop/spark/spark-1.4.1-bin-hadoop2.4/conf/spark-defaults.conf
Adding default property: spark.executor.extraJavaOptions=-Xloggc:/home/hadoop/spark-executor.gc -XX:+PrintGCDateStamps -XX:+PrintGCDetails
Adding default property: spark.eventLog.enabled=true
Adding default property: spark.ui.port=7106
Adding default property: spark.deploy.spreadOut=false
Adding default property: spark.worker.ui.port=7105
Adding default property: spark.master.ui.port=7102
Adding default property: spark.eventLog.dir=/home/hadoop/spark/spark-eventlog
Adding default property: spark.driver.allowMultipleContexts=true
Parsed arguments:
  master yarn
  deployMode client
  executorMemory null
  executorCores null
  totalExecutorCores null
  propertiesFile /home/hadoop/spark/spark-1.4.1-bin-hadoop2.4/conf/spark-defaults.conf
  driverMemory 1g
  driverCores null
  driverExtraClassPath null
  driverExtraLibraryPath null
  driverExtraJavaOptions null
  supervise false
  queue null
  numExecutors null
  files null
  pyFiles null
  archives null
  mainClass org.apache.spark.examples.JavaWordCount
  primaryResource file:/home/hadoop/spark/spark-1.4.1-bin-hadoop2.4/lib/spark-examples-1.4.1-hadoop2.4.0-my.jar
  name org.apache.spark.examples.JavaWordCount
  childArgs [RELEASE 2]
  jars null
  packages null
  repositories null
  verbose true
Spark properties used, including those specified through --conf and those from the properties file /home/hadoop/spark/spark-1.4.1-bin-hadoop2.4/conf/spark-defaults.conf:
  spark.eventLog.enabled -> true
  spark.driver.allowMultipleContexts -> true
  spark.ui.port -> 7106
  spark.executor.extraJavaOptions -> -Xloggc:/home/hadoop/spark-executor.gc -XX:+PrintGCDateStamps -XX:+PrintGCDetails
  spark.deploy.spreadOut -> false
  spark.eventLog.dir -> /home/hadoop/spark/spark-eventlog
  spark.worker.ui.port -> 7105
  spark.master.ui.port -> 7102
Main class: org.apache.spark.examples.JavaWordCount
Arguments: RELEASE 2
System properties:
  spark.driver.memory -> 1g
  spark.eventLog.enabled -> true
  spark.driver.allowMultipleContexts -> true
  SPARK_SUBMIT -> true
  spark.ui.port -> 7106
  spark.executor.extraJavaOptions -> -Xloggc:/home/hadoop/spark-executor.gc -XX:+PrintGCDateStamps -XX:+PrintGCDetails
  spark.deploy.spreadOut -> false
  spark.app.name -> org.apache.spark.examples.JavaWordCount
  spark.jars -> file:/home/hadoop/spark/spark-1.4.1-bin-hadoop2.4/lib/spark-examples-1.4.1-hadoop2.4.0-my.jar
  spark.eventLog.dir -> /home/hadoop/spark/spark-eventlog
  spark.master -> yarn-client
  spark.worker.ui.port -> 7105
  spark.master.ui.port -> 7102
Classpath elements:
  file:/home/hadoop/spark/spark-1.4.1-bin-hadoop2.4/lib/spark-examples-1.4.1-hadoop2.4.0-my.jar
15/11/25 16:46:55 INFO spark.SparkContext: Running Spark version 1.4.1
15/11/25 16:46:55 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/11/25 16:46:55 INFO spark.SecurityManager: Changing view acls to: hadoop
15/11/25 16:46:55 INFO spark.SecurityManager: Changing modify acls to: hadoop
15/11/25 16:46:55 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop); users with modify permissions: Set(hadoop)
15/11/25 16:46:56 INFO slf4j.Slf4jLogger: Slf4jLogger started
15/11/25 16:46:56 INFO Remoting: Starting remoting
15/11/25 16:46:56 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@192.168.100.4:45880]
15/11/25 16:46:56 INFO util.Utils: Successfully started service 'sparkDriver' on port 45880.
15/11/25 16:46:56 INFO spark.SparkEnv: Registering MapOutputTracker
15/11/25 16:46:56 INFO spark.SparkEnv: Registering BlockManagerMaster
15/11/25 16:46:56 INFO storage.DiskBlockManager: Created local directory at /tmp/spark-52cdfa49-3560-40bd-9540-107f059b5d95/blockmgr-a25e4dc5-b8e0-4877-ad63-b0e32880e187
15/11/25 16:46:56 INFO storage.MemoryStore: MemoryStore started with capacity 529.9 MB
15/11/25 16:46:57 INFO spark.HttpFileServer: HTTP File server directory is /tmp/spark-52cdfa49-3560-40bd-9540-107f059b5d95/httpd-8b586e36-69a3-46c1-880d-5f294a643833
15/11/25 16:46:57 INFO spark.HttpServer: Starting HTTP Server
15/11/25 16:46:57 INFO server.Server: jetty-8.y.z-SNAPSHOT
15/11/25 16:46:57 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:51033
15/11/25 16:46:57 INFO util.Utils: Successfully started service 'HTTP file server' on port 51033.
15/11/25 16:46:57 INFO spark.SparkEnv: Registering OutputCommitCoordinator
15/11/25 16:46:57 INFO server.Server: jetty-8.y.z-SNAPSHOT
15/11/25 16:46:57 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:7106
15/11/25 16:46:57 INFO util.Utils: Successfully started service 'SparkUI' on port 7106.
15/11/25 16:46:57 INFO ui.SparkUI: Started SparkUI at http://192.168.100.4:7106
15/11/25 16:46:57 INFO spark.SparkContext: Added JAR file:/home/hadoop/spark/spark-1.4.1-bin-hadoop2.4/lib/spark-examples-1.4.1-hadoop2.4.0-my.jar at http://192.168.100.4:51033/jars/spark-examples-1.4.1-hadoop2.4.0-my.jar with timestamp 1448441217526
15/11/25 16:46:57 WARN cluster.YarnClientSchedulerBackend: NOTE: SPARK_WORKER_MEMORY is deprecated. Use SPARK_EXECUTOR_MEMORY or --executor-memory through spark-submit instead.
15/11/25 16:46:57 WARN cluster.YarnClientSchedulerBackend: NOTE: SPARK_WORKER_CORES is deprecated. Use SPARK_EXECUTOR_CORES or --executor-cores through spark-submit instead.
15/11/25 16:46:57 INFO client.RMProxy: Connecting to ResourceManager at hd02/192.168.100.4:8032
15/11/25 16:46:57 INFO yarn.Client: Requesting a new application from cluster with 10 NodeManagers
15/11/25 16:46:57 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)
15/11/25 16:46:57 INFO yarn.Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
15/11/25 16:46:57 INFO yarn.Client: Setting up container launch context for our AM
15/11/25 16:46:57 INFO yarn.Client: Preparing resources for our AM container
15/11/25 16:46:58 INFO yarn.Client: Uploading resource file:/home/hadoop/spark/spark-1.4.1-bin-hadoop2.4/lib/spark-assembly-1.4.1-hadoop2.4.0.jar -> hdfs://mycluster/user/hadoop/.sparkStaging/application_1441038159113_0003/spark-assembly-1.4.1-hadoop2.4.0.jar
15/11/25 16:47:00 INFO yarn.Client: Uploading resource file:/tmp/spark-52cdfa49-3560-40bd-9540-107f059b5d95/__hadoop_conf__6446760494119929942.zip -> hdfs://mycluster/user/hadoop/.sparkStaging/application_1441038159113_0003/__hadoop_conf__6446760494119929942.zip
15/11/25 16:47:00 INFO yarn.Client: Setting up the launch environment for our AM container
15/11/25 16:47:00 INFO spark.SecurityManager: Changing view acls to: hadoop
15/11/25 16:47:00 INFO spark.SecurityManager: Changing modify acls to: hadoop
15/11/25 16:47:00 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop); users with modify permissions: Set(hadoop)
15/11/25 16:47:00 INFO yarn.Client: Submitting application 3 to ResourceManager
15/11/25 16:47:00 INFO impl.YarnClientImpl: Submitted application application_1441038159113_0003
15/11/25 16:47:01 INFO yarn.Client: Application report for application_1441038159113_0003 (state: ACCEPTED)
15/11/25 16:47:01 INFO yarn.Client:
  client token: N/A
  diagnostics: N/A
  ApplicationMaster host: N/A
  ApplicationMaster RPC port: -1
  queue: default
  start time: 1448441220409
  final status: UNDEFINED
  tracking URL: http://hd02:7104/proxy/application_1441038159113_0003/
  user: hadoop
15/11/25 16:47:02 INFO yarn.Client: Application report for application_1441038159113_0003 (state: ACCEPTED)
15/11/25 16:47:03 INFO yarn.Client: Application report for application_1441038159113_0003 (state: ACCEPTED)
15/11/25 16:47:04 INFO yarn.Client: Application report for application_1441038159113_0003 (state: ACCEPTED)
15/11/25 16:47:05 INFO yarn.Client: Application report for application_1441038159113_0003 (state: ACCEPTED)
15/11/25 16:47:06 INFO yarn.Client: Application report for application_1441038159113_0003 (state: ACCEPTED)
15/11/25 16:47:06 INFO cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: ApplicationMaster registered as AkkaRpcEndpointRef(Actor[akka.tcp://sparkYarnAM@192.168.100.14:46652/user/YarnAM#-1250321572])
15/11/25 16:47:06 INFO cluster.YarnClientSchedulerBackend: Add WebUI Filter. org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter, Map(PROXY_HOSTS -> hd02, PROXY_URI_BASES -> http://hd02:7104/proxy/application_1441038159113_0003), /proxy/application_1441038159113_0003
15/11/25 16:47:06 INFO ui.JettyUtils: Adding filter: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
15/11/25 16:47:07 INFO yarn.Client: Application report for application_1441038159113_0003 (state: RUNNING)
15/11/25 16:47:07 INFO yarn.Client:
  client token: N/A
  diagnostics: N/A
  ApplicationMaster host: 192.168.100.14
  ApplicationMaster RPC port: 0
  queue: default
  start time: 1448441220409
  final status: UNDEFINED
  tracking URL: http://hd02:7104/proxy/application_1441038159113_0003/
  user: hadoop
15/11/25 16:47:07 INFO cluster.YarnClientSchedulerBackend: Application application_1441038159113_0003 has started running.
15/11/25 16:47:07 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 52047.
15/11/25 16:47:07 INFO netty.NettyBlockTransferService: Server created on 52047
15/11/25 16:47:07 INFO storage.BlockManagerMaster: Trying to register BlockManager
15/11/25 16:47:07 INFO storage.BlockManagerMasterEndpoint: Registering block manager 192.168.100.4:52047 with 529.9 MB RAM, BlockManagerId(driver, 192.168.100.4, 52047)
15/11/25 16:47:07 INFO storage.BlockManagerMaster: Registered BlockManager
15/11/25 16:47:07 INFO scheduler.EventLoggingListener: Logging events to file:/home/hadoop/spark/spark-eventlog/application_1441038159113_0003
15/11/25 16:47:17 INFO cluster.YarnClientSchedulerBackend: Registered executor: AkkaRpcEndpointRef(Actor[akka.tcp://sparkExecutor@gzsw-05:55796/user/Executor#-2059071929]) with ID 1
15/11/25 16:47:17 INFO storage.BlockManagerMasterEndpoint: Registering block manager gzsw-05:52897 with 2.1 GB RAM, BlockManagerId(1, gzsw-05, 52897)
15/11/25 16:47:17 INFO cluster.YarnClientSchedulerBackend: Registered executor: AkkaRpcEndpointRef(Actor[akka.tcp://sparkExecutor@gzsw-06:56733/user/Executor#261866940]) with ID 2
15/11/25 16:47:17 INFO cluster.YarnClientSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.8
15/11/25 16:47:17 INFO storage.BlockManagerMasterEndpoint: Registering block manager gzsw-06:38994 with 2.1 GB RAM, BlockManagerId(2, gzsw-06, 38994)
15/11/25 16:47:17 INFO storage.MemoryStore: ensureFreeSpace(228640) called with curMem=0, maxMem=555684986
15/11/25 16:47:17 INFO storage.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 223.3 KB, free 529.7 MB)
15/11/25 16:47:17 INFO storage.MemoryStore: ensureFreeSpace(18166) called with curMem=228640, maxMem=555684986
15/11/25 16:47:17 INFO storage.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 17.7 KB, free 529.7 MB)
15/11/25 16:47:17 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.100.4:52047 (size: 17.7 KB, free: 529.9 MB)
15/11/25 16:47:17 INFO spark.SparkContext: Created broadcast 0 from textFile at JavaWordCount.java:49
15/11/25 16:47:17 INFO mapred.FileInputFormat: Total input paths to process : 1
15/11/25 16:47:17 INFO spark.SparkContext: Starting job: collect at JavaWordCount.java:72
15/11/25 16:47:17 INFO scheduler.DAGScheduler: Registering RDD 3 (mapToPair at JavaWordCount.java:58)
15/11/25 16:47:17 INFO scheduler.DAGScheduler: Got job 0 (collect at JavaWordCount.java:72) with 2 output partitions (allowLocal=false)
15/11/25 16:47:17 INFO scheduler.DAGScheduler: Final stage: ResultStage 1(collect at JavaWordCount.java:72)
15/11/25 16:47:17 INFO scheduler.DAGScheduler: Parents of final stage: List(ShuffleMapStage 0)
15/11/25 16:47:17 INFO scheduler.DAGScheduler: Missing parents: List(ShuffleMapStage 0)
15/11/25 16:47:17 INFO scheduler.DAGScheduler: Submitting ShuffleMapStage 0 (MapPartitionsRDD[3] at mapToPair at JavaWordCount.java:58), which has no missing parents
15/11/25 16:47:18 INFO storage.MemoryStore: ensureFreeSpace(4736) called with curMem=246806, maxMem=555684986
15/11/25 16:47:18 INFO storage.MemoryStore: Block broadcast_1 stored as values in memory (estimated size 4.6 KB, free 529.7 MB)
15/11/25 16:47:18 INFO storage.MemoryStore: ensureFreeSpace(2644) called with curMem=251542, maxMem=555684986
15/11/25 16:47:18 INFO storage.MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 2.6 KB, free 529.7 MB)
15/11/25 16:47:18 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on 192.168.100.4:52047 (size: 2.6 KB, free: 529.9 MB)
15/11/25 16:47:18 INFO spark.SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:874
15/11/25 16:47:18 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ShuffleMapStage 0 (MapPartitionsRDD[3] at mapToPair at JavaWordCount.java:58)
15/11/25 16:47:18 INFO cluster.YarnScheduler: Adding task set 0.0 with 2 tasks
15/11/25 16:47:18 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, gzsw-05, NODE_LOCAL, 1479 bytes)
15/11/25 16:47:18 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, gzsw-05, NODE_LOCAL, 1479 bytes)
15/11/25 16:47:19 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on gzsw-05:52897 (size: 2.6 KB, free: 2.1 GB)
15/11/25 16:47:19 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on gzsw-05:52897 (size: 17.7 KB, free: 2.1 GB)
15/11/25 16:47:20 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 2705 ms on gzsw-05 (1/2)
15/11/25 16:47:20 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 2725 ms on gzsw-05 (2/2)
15/11/25 16:47:20 INFO cluster.YarnScheduler: Removed TaskSet 0.0, whose tasks have all completed, from pool
15/11/25 16:47:20 INFO scheduler.DAGScheduler: ShuffleMapStage 0 (mapToPair at JavaWordCount.java:58) finished in 2.733 s
15/11/25 16:47:20 INFO scheduler.DAGScheduler: looking for newly runnable stages
15/11/25 16:47:20 INFO scheduler.DAGScheduler: running: Set()
15/11/25 16:47:20 INFO scheduler.DAGScheduler: waiting: Set(ResultStage 1)
15/11/25 16:47:20 INFO scheduler.DAGScheduler: failed: Set()
15/11/25 16:47:20 INFO scheduler.DAGScheduler: Missing parents for ResultStage 1: List()
15/11/25 16:47:20 INFO scheduler.DAGScheduler: Submitting ResultStage 1 (ShuffledRDD[4] at reduceByKey at JavaWordCount.java:65), which is now runnable
15/11/25 16:47:20 INFO storage.MemoryStore: ensureFreeSpace(2408) called with curMem=254186, maxMem=555684986
15/11/25 16:47:20 INFO storage.MemoryStore: Block broadcast_2 stored as values in memory (estimated size 2.4 KB, free 529.7 MB)
15/11/25 16:47:20 INFO storage.MemoryStore: ensureFreeSpace(1459) called with curMem=256594, maxMem=555684986
15/11/25 16:47:20 INFO storage.MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 1459.0 B, free 529.7 MB)
15/11/25 16:47:20 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on 192.168.100.4:52047 (size: 1459.0 B, free: 529.9 MB)
15/11/25 16:47:20 INFO spark.SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:874
15/11/25 16:47:20 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ResultStage 1 (ShuffledRDD[4] at reduceByKey at JavaWordCount.java:65)
15/11/25 16:47:20 INFO cluster.YarnScheduler: Adding task set 1.0 with 2 tasks
15/11/25 16:47:20 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 1.0 (TID 2, gzsw-06, PROCESS_LOCAL, 1246 bytes)
15/11/25 16:47:20 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 1.0 (TID 3, gzsw-05, PROCESS_LOCAL, 1246 bytes)
15/11/25 16:47:20 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on gzsw-05:52897 (size: 1459.0 B, free: 2.1 GB)
15/11/25 16:47:20 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 0 to gzsw-05:55796
15/11/25 16:47:20 INFO spark.MapOutputTrackerMaster: Size of output statuses for shuffle 0 is 147 bytes
15/11/25 16:47:20 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 1.0 (TID 3) in 98 ms on gzsw-05 (1/2)
15/11/25 16:47:22 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on gzsw-06:38994 (size: 1459.0 B, free: 2.1 GB)
15/11/25 16:47:22 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 0 to gzsw-06:56733
15/11/25 16:47:22 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 1.0 (TID 2) in 1748 ms on gzsw-06 (2/2)
15/11/25 16:47:22 INFO scheduler.DAGScheduler: ResultStage 1 (collect at JavaWordCount.java:72) finished in 1.749 s
15/11/25 16:47:22 INFO cluster.YarnScheduler: Removed TaskSet 1.0, whose tasks have all completed, from pool
15/11/25 16:47:22 INFO scheduler.DAGScheduler: Job 0 finished: collect at JavaWordCount.java:72, took 4.603967 s
total items 14
15/11/25 16:47:22 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/metrics/json,null}
15/11/25 16:47:22 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage/kill,null}
15/11/25 16:47:22 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/api,null}
15/11/25 16:47:22 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/,null}
15/11/25 16:47:22 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/static,null}
15/11/25 16:47:22 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/threadDump/json,null}
15/11/25 16:47:22 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/threadDump,null}
15/11/25 16:47:22 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/json,null}
15/11/25 16:47:22 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors,null}
15/11/25 16:47:22 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/environment/json,null}
15/11/25 16:47:22 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/environment,null}
15/11/25 16:47:22 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/rdd/json,null}
15/11/25 16:47:22 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/rdd,null}
15/11/25 16:47:22 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/json,null}
15/11/25 16:47:22 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage,null}
15/11/25 16:47:22 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/pool/json,null}
15/11/25 16:47:22 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/pool,null}
15/11/25 16:47:22 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage/json,null}
15/11/25 16:47:22 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage,null}
15/11/25 16:47:22 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/json,null}
15/11/25 16:47:22 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages,null}
15/11/25 16:47:22 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/job/json,null}
15/11/25 16:47:22 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/job,null}
15/11/25 16:47:22 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/json,null}
15/11/25 16:47:22 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs,null}
15/11/25 16:47:22 INFO ui.SparkUI: Stopped Spark web UI at http://192.168.100.4:7106
15/11/25 16:47:22 INFO scheduler.DAGScheduler: Stopping DAGScheduler
15/11/25 16:47:22 INFO cluster.YarnClientSchedulerBackend: Shutting down all executors
15/11/25 16:47:22 INFO cluster.YarnClientSchedulerBackend: Interrupting monitor thread
15/11/25 16:47:22 INFO cluster.YarnClientSchedulerBackend: Asking each executor to shut down
15/11/25 16:47:22 INFO cluster.YarnClientSchedulerBackend: Stopped
15/11/25 16:47:22 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
15/11/25 16:47:22 INFO util.Utils: path = /tmp/spark-52cdfa49-3560-40bd-9540-107f059b5d95/blockmgr-a25e4dc5-b8e0-4877-ad63-b0e32880e187, already present as root for deletion.
15/11/25 16:47:22 INFO storage.MemoryStore: MemoryStore cleared
15/11/25 16:47:22 INFO storage.BlockManager: BlockManager stopped
15/11/25 16:47:22 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
15/11/25 16:47:22 INFO spark.SparkContext: Successfully stopped SparkContext
15/11/25 16:47:22 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
15/11/25 16:47:22 INFO util.Utils: Shutdown hook called
15/11/25 16:47:22 INFO remote.RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
15/11/25 16:47:22 INFO util.Utils: Deleting directory /tmp/spark-52cdfa49-3560-40bd-9540-107f059b5d95
15/11/25 16:47:22 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
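To check task placement without reading the whole log, you can count the TaskSetManager "Starting task" entries per stage and host. The sketch below is one way to do that with grep and awk. The tasks_per_host function name is made up for this example, and the sample input is just the four "Starting task" entries copied from the driver log above; in practice you would feed it a saved copy of the driver output.

```shell
#!/usr/bin/env bash
# Sketch: summarise where Spark scheduled its tasks, per stage and host.

# Hypothetical helper: reads driver log text on stdin and prints
# "<stage> <host> <task count>" lines, sorted.
tasks_per_host() {
  # Keep only the fragment up to the host name of each "Starting task" entry.
  grep -o 'Starting task [0-9.]* in stage [0-9.]* (TID [0-9]*, [^,]*' \
    | awk '{ stage = $6; host = $NF; count[stage " " host]++ }
           END { for (k in count) print k, count[k] }' \
    | sort
}

# Sample input: the four scheduling entries from the driver log in this post.
tasks_per_host <<'EOF'
15/11/25 16:47:18 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, gzsw-05, NODE_LOCAL, 1479 bytes)
15/11/25 16:47:18 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, gzsw-05, NODE_LOCAL, 1479 bytes)
15/11/25 16:47:20 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 1.0 (TID 2, gzsw-06, PROCESS_LOCAL, 1246 bytes)
15/11/25 16:47:20 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 1.0 (TID 3, gzsw-05, PROCESS_LOCAL, 1246 bytes)
EOF
```

For this run it reports two tasks on gzsw-05 for stage 0.0, and one task each on gzsw-05 and gzsw-06 for stage 1.0. For a real run, replace the here-document with something like tasks_per_host < driver.log, where driver.log is wherever you saved the console output.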