
Spark: running apps in YARN mode

 

  Running a Spark app on a YARN cluster is straightforward:

  1. Set HADOOP_CONF_DIR

   You can export it in your shell with: export HADOOP_CONF_DIR=xx

   or add it to spark-env.sh.

   2. Submit the app with spark-submit:

spark-submit  --master yarn --class org.apache.spark.examples.JavaWordCount --verbose --deploy-mode client ~/spark/spark-1.4.1-bin-hadoop2.4/lib/spark-examples-1.4.1-hadoop2.4.0-my.jar RELEASE 2
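Before submitting, it can help to confirm that step 1 really took effect. Here is a minimal sketch, assuming the Hadoop client config lives under /usr/local/hadoop/hadoop-2.5.2/etc/hadoop (a guess based on the install paths in this post, so substitute your own directory). The helper name check_hadoop_conf is hypothetical, and the two file names it looks for are just the usual client-side configs, not a list that Spark itself enforces.

```shell
#!/bin/sh
# Assumed default path; replace it with your cluster's client-side Hadoop config directory.
export HADOOP_CONF_DIR="${HADOOP_CONF_DIR:-/usr/local/hadoop/hadoop-2.5.2/etc/hadoop}"

# Hypothetical helper: succeeds only if the directory holds the usual client config files.
check_hadoop_conf() {
  conf_dir="$1"
  for f in core-site.xml yarn-site.xml; do
    if [ ! -f "$conf_dir/$f" ]; then
      echo "missing $conf_dir/$f" >&2
      return 1
    fi
  done
  echo "ok: $conf_dir"
}

if ! check_hadoop_conf "$HADOOP_CONF_DIR"; then
  echo "fix HADOOP_CONF_DIR before running spark-submit" >&2
fi
```

To make the setting permanent, the same export line can go into conf/spark-env.sh.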

  You can go to the YARN ResourceManager web UI to check the app info.

  Also, if you want to specify the number of executors (each one runs in its own YARN container), just add this option to the command above:

 --num-executors 2
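Putting it together, the sketch below prints the full command with the executor sizing options spelled out. --executor-memory and --executor-cores are the options named in the deprecation warnings in the driver log further down. The helper build_submit_cmd is hypothetical, the SPARK_HOME default is simply the install path used in this post, and the values 2 / 2g / 2 mirror the process listings below rather than being a recommendation.

```shell
#!/bin/sh
# Assumed install location; override SPARK_HOME if yours differs.
SPARK_HOME="${SPARK_HOME:-$HOME/spark/spark-1.4.1-bin-hadoop2.4}"

# Hypothetical helper: prints a spark-submit command line for the given executor sizing.
# Arguments: number of executors, memory per executor, cores per executor.
build_submit_cmd() {
  printf '%s ' \
    spark-submit --master yarn --deploy-mode client \
    --class org.apache.spark.examples.JavaWordCount \
    --num-executors "$1" \
    --executor-memory "$2" \
    --executor-cores "$3" \
    "$SPARK_HOME/lib/spark-examples-1.4.1-hadoop2.4.0-my.jar" \
    RELEASE 2
  printf '\n'
}

build_submit_cmd 2 2g 2
```

The script only prints the command, so nothing is submitted until you paste the printed line into a shell where spark-submit is on the PATH.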

 

-- ApplicationMaster process (ps output)

hadoop    2758 13206  0 16:52 ?        00:00:00 /bin/bash -c /usr/local/jdk/jdk1.6.0_31/bin/java -server -Xmx512m -Djava.io.tmpdir=/usr/local/hadoop/data-2.5.1/tmp/nm-local-dir/usercache/hadoop/appcache/application_1441038159113_0029/container_1441038159113_0029_01_000001/tmp '-Dspark.eventLog.enabled=true' '-Dspark.externalBlockStore.folderName=spark-a5761a0d-2f87-4afc-b4eb-dbaf1fd86ef4' '-Dspark.executor.memory=2g' '-Dspark.jars=file:/home/hadoop/spark/spark-1.4.1-bin-hadoop2.4/lib/spark-examples-1.4.1-hadoop2.4.0.jar' '-Dspark.master.ui.port=7102' '-Dspark.ui.port=7108' '-Dspark.worker.ui.port=7105' '-Dspark.driver.appUIAddress=http://192.168.100.4:7108' '-Dspark.master=yarn-client' '-Dspark.driver.allowMultipleContexts=true' '-Dspark.driver.port=52394' '-Dspark.eventLog.dir=hdfs://hd02:8020/user/hadoop/spark-eventlog' '-Dspark.executor.id=driver' '-Dspark.executor.extraJavaOptions=-Xloggc:~/spark-executor.gc -XX:+UseCMSCompactAtFullCollection -XX:CMSFullGCsBeforeCompaction=2 -XX:CMSInitiatingOccupancyFraction=65 -XX:+UseCMSInitiatingOccupancyOnly -XX:PermSize=64m -XX:MaxPermSize=256m -XX:NewRatio=5 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:ParallelGCThreads=5' '-Dspark.executor.cores=2' '-Dspark.driver.host=192.168.100.4' '-Dspark.driver.memory=6g' '-Dspark.storage.memoryFraction=0.5' '-Dspark.app.name=JavaWordCount' '-Dspark.fileserver.uri=http://192.168.100.4:48227' '-Dspark.cores.max=50' -Dspark.yarn.app.container.log.dir=/usr/local/hadoop/hadoop-2.5.2/logs/userlogs/application_1441038159113_0029/container_1441038159113_0029_01_000001 org.apache.spark.deploy.yarn.ExecutorLauncher --arg '192.168.100.4:52394' --executor-memory 2048m --executor-cores 2 --num-executors  2 1> /usr/local/hadoop/hadoop-2.5.2/logs/userlogs/application_1441038159113_0029/container_1441038159113_0029_01_000001/stdout 2> /usr/local/hadoop/hadoop-2.5.2/logs/userlogs/application_1441038159113_0029/container_1441038159113_0029_01_000001/stderr

hadoop    2763  2758 23 16:52 ?        00:00:06 /usr/local/jdk/jdk1.6.0_31/bin/java -server -Xmx512m -Djava.io.tmpdir=/usr/local/hadoop/data-2.5.1/tmp/nm-local-dir/usercache/hadoop/appcache/application_1441038159113_0029/container_1441038159113_0029_01_000001/tmp -Dspark.eventLog.enabled=true -Dspark.externalBlockStore.folderName=spark-a5761a0d-2f87-4afc-b4eb-dbaf1fd86ef4 -Dspark.executor.memory=2g -Dspark.jars=file:/home/hadoop/spark/spark-1.4.1-bin-hadoop2.4/lib/spark-examples-1.4.1-hadoop2.4.0.jar -Dspark.master.ui.port=7102 -Dspark.ui.port=7108 -Dspark.worker.ui.port=7105 -Dspark.driver.appUIAddress=http://192.168.100.4:7108 -Dspark.master=yarn-client -Dspark.driver.allowMultipleContexts=true -Dspark.driver.port=52394 -Dspark.eventLog.dir=hdfs://hd02:8020/user/hadoop/spark-eventlog -Dspark.executor.id=driver -Dspark.executor.extraJavaOptions=-Xloggc:~/spark-executor.gc -XX:+UseCMSCompactAtFullCollection -XX:CMSFullGCsBeforeCompaction=2 -XX:CMSInitiatingOccupancyFraction=65 -XX:+UseCMSInitiatingOccupancyOnly -XX:PermSize=64m -XX:MaxPermSize=256m -XX:NewRatio=5 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:ParallelGCThreads=5 -Dspark.executor.cores=2 -Dspark.driver.host=192.168.100.4 -Dspark.driver.memory=6g -Dspark.storage.memoryFraction=0.5 -Dspark.app.name=JavaWordCount -Dspark.fileserver.uri=http://192.168.100.4:48227 -Dspark.cores.max=50 -Dspark.yarn.app.container.log.dir=/usr/local/hadoop/hadoop-2.5.2/logs/userlogs/application_1441038159113_0029/container_1441038159113_0029_01_000001 org.apache.spark.deploy.yarn.ExecutorLauncher --arg 192.168.100.4:52394 --executor-memory 2048m --executor-cores 2 --num-executors 2
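The -Xmx512m above is only the JVM heap; the container that YARN hands out is bigger. The driver log further down reports "Will allocate AM container, with 896 MB memory including 384 MB overhead", which is 512 + 384. A small arithmetic sketch, assuming the overhead rule is max(384 MB, 10% of the heap); that rule is an assumption about the Spark 1.4 defaults, and YARN may additionally round the request up to its own allocation increment. The helper name container_mb is hypothetical.

```shell
#!/bin/sh
# Assumption: overhead = max(384 MB, 10% of the JVM heap), taken to be the Spark 1.4 default on YARN.
container_mb() {
  heap_mb="$1"
  overhead_mb=$(( heap_mb / 10 ))
  if [ "$overhead_mb" -lt 384 ]; then
    overhead_mb=384
  fi
  echo $(( heap_mb + overhead_mb ))
}

container_mb 512     # ApplicationMaster heap (-Xmx512m) -> 896
container_mb 2048    # executor heap (-Xmx2048m)         -> 2432
```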

  -- Executor (task container) process (ps output)

hadoop   10382  1055  0 17:20 ?        00:00:00 /bin/bash -c /usr/local/jdk/jdk1.6.0_31/bin/java -server -XX:OnOutOfMemoryError='kill %p' -Xms2048m -Xmx2048m '-Xloggc:~/spark-executor.gc' '-XX:+UseCMSCompactAtFullCollection' '-XX:CMSFullGCsBeforeCompaction=2' '-XX:CMSInitiatingOccupancyFraction=65' '-XX:+UseCMSInitiatingOccupancyOnly' '-XX:PermSize=64m' '-XX:MaxPermSize=256m' '-XX:NewRatio=5' '-XX:+UseParNewGC' '-XX:+UseConcMarkSweepGC' '-XX:+PrintGCDateStamps' '-XX:+PrintGCDetails' '-XX:ParallelGCThreads=5' -Djava.io.tmpdir=/usr/local/hadoop/data-2.5.1/tmp/nm-local-dir/usercache/hadoop/appcache/application_1441038159113_0031/container_1441038159113_0031_01_000003/tmp '-Dspark.master.ui.port=7102' '-Dspark.ui.port=7108' '-Dspark.worker.ui.port=7105' '-Dspark.driver.port=44382' -Dspark.yarn.app.container.log.dir=/usr/local/hadoop/hadoop-2.5.2/logs/userlogs/application_1441038159113_0031/container_1441038159113_0031_01_000003 org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url akka.tcp://sparkDriver@192.168.100.4:44382/user/CoarseGrainedScheduler --executor-id 2 --hostname gzsw-13 --cores 2 --app-id application_1441038159113_0031 --user-class-path file:/usr/local/hadoop/data-2.5.1/tmp/nm-local-dir/usercache/hadoop/appcache/application_1441038159113_0031/container_1441038159113_0031_01_000003/__app__.jar 1> /usr/local/hadoop/hadoop-2.5.2/logs/userlogs/application_1441038159113_0031/container_1441038159113_0031_01_000003/stdout 2> /usr/local/hadoop/hadoop-2.5.2/logs/userlogs/application_1441038159113_0031/container_1441038159113_0031_01_000003/stderr
hadoop   10386 10382 99 17:20 ?        00:00:25 /usr/local/jdk/jdk1.6.0_31/bin/java -server -XX:OnOutOfMemoryError=kill %p -Xms2048m -Xmx2048m -Xloggc:~/spark-executor.gc -XX:+UseCMSCompactAtFullCollection -XX:CMSFullGCsBeforeCompaction=2 -XX:CMSInitiatingOccupancyFraction=65 -XX:+UseCMSInitiatingOccupancyOnly -XX:PermSize=64m -XX:MaxPermSize=256m -XX:NewRatio=5 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:ParallelGCThreads=5 -Djava.io.tmpdir=/usr/local/hadoop/data-2.5.1/tmp/nm-local-dir/usercache/hadoop/appcache/application_1441038159113_0031/container_1441038159113_0031_01_000003/tmp -Dspark.master.ui.port=7102 -Dspark.ui.port=7108 -Dspark.worker.ui.port=7105 -Dspark.driver.port=44382 -Dspark.yarn.app.container.log.dir=/usr/local/hadoop/hadoop-2.5.2/logs/userlogs/application_1441038159113_0031/container_1441038159113_0031_01_000003 org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url akka.tcp://sparkDriver@192.168.100.4:44382/user/CoarseGrainedScheduler --executor-id 2 --hostname gzsw-13 --cores 2 --app-id application_1441038159113_0031 --user-class-path file:/usr/local/hadoop/data-2.5.1/tmp/nm-local-dir/usercache/hadoop/appcache/application_1441038159113_0031/container_1441038159113_0031_01_000003/__app__.jar
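These command lines are long, so here is a small sketch for pulling just the -Dspark.* settings out of them. The helper name spark_props is hypothetical, and the sample string is a shortened copy of the executor listing above.

```shell
#!/bin/sh
# Hypothetical helper: reads process command lines on stdin and lists their -Dspark.* settings.
# Values containing spaces (such as spark.executor.extraJavaOptions) are cut at the first space.
spark_props() {
  tr ' ' '\n' | tr -d "'" | grep '^-Dspark\.' | sed 's/^-D//' | LC_ALL=C sort -u
}

# Shortened sample taken from the executor listing above.
sample="java -server '-Dspark.ui.port=7108' '-Dspark.driver.port=44382' org.apache.spark.executor.CoarseGrainedExecutorBackend --executor-id 2 --hostname gzsw-13 --cores 2"
printf '%s\n' "$sample" | spark_props
```

On a NodeManager host you could feed it live data instead, for example: ps -ef | grep '[C]oarseGrainedExecutorBackend' | spark_props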

 

  corresponding figures:



 

 

   Driver logs (you will see that both tasks of the first stage run on gzsw-05, while the second stage runs one task on each of gzsw-05 and gzsw-06):

hadoop@GZsw04:~/spark/spark-1.4.1-bin-hadoop2.4$ spark-submit  --master yarn --class org.apache.spark.examples.JavaWordCount --verbose --deploy-mode client ~/spark/spark-1.4.1-bin-hadoop2.4/lib/spark-examples-1.4.1-hadoop2.4.0-my.jar RELEASE 2
Using properties file: /home/hadoop/spark/spark-1.4.1-bin-hadoop2.4/conf/spark-defaults.conf
Adding default property: spark.executor.extraJavaOptions=-Xloggc:/home/hadoop/spark-executor.gc -XX:+PrintGCDateStamps -XX:+PrintGCDetails
Adding default property: spark.eventLog.enabled=true
Adding default property: spark.ui.port=7106
Adding default property: spark.deploy.spreadOut=false
Adding default property: spark.worker.ui.port=7105
Adding default property: spark.master.ui.port=7102
Adding default property: spark.eventLog.dir=/home/hadoop/spark/spark-eventlog
Adding default property: spark.driver.allowMultipleContexts=true
Parsed arguments:
  master                  yarn
  deployMode              client
  executorMemory          null
  executorCores           null
  totalExecutorCores      null
  propertiesFile          /home/hadoop/spark/spark-1.4.1-bin-hadoop2.4/conf/spark-defaults.conf
  driverMemory            1g
  driverCores             null
  driverExtraClassPath    null
  driverExtraLibraryPath  null
  driverExtraJavaOptions  null
  supervise               false
  queue                   null
  numExecutors            null
  files                   null
  pyFiles                 null
  archives                null
  mainClass               org.apache.spark.examples.JavaWordCount
  primaryResource         file:/home/hadoop/spark/spark-1.4.1-bin-hadoop2.4/lib/spark-examples-1.4.1-hadoop2.4.0-my.jar
  name                    org.apache.spark.examples.JavaWordCount
  childArgs               [RELEASE 2]
  jars                    null
  packages                null
  repositories            null
  verbose                 true

Spark properties used, including those specified through
 --conf and those from the properties file /home/hadoop/spark/spark-1.4.1-bin-hadoop2.4/conf/spark-defaults.conf:
  spark.eventLog.enabled -> true
  spark.driver.allowMultipleContexts -> true
  spark.ui.port -> 7106
  spark.executor.extraJavaOptions -> -Xloggc:/home/hadoop/spark-executor.gc -XX:+PrintGCDateStamps -XX:+PrintGCDetails
  spark.deploy.spreadOut -> false
  spark.eventLog.dir -> /home/hadoop/spark/spark-eventlog
  spark.worker.ui.port -> 7105
  spark.master.ui.port -> 7102

    
Main class:
org.apache.spark.examples.JavaWordCount
Arguments:
RELEASE
2
System properties:
spark.driver.memory -> 1g
spark.eventLog.enabled -> true
spark.driver.allowMultipleContexts -> true
SPARK_SUBMIT -> true
spark.ui.port -> 7106
spark.executor.extraJavaOptions -> -Xloggc:/home/hadoop/spark-executor.gc -XX:+PrintGCDateStamps -XX:+PrintGCDetails
spark.deploy.spreadOut -> false
spark.app.name -> org.apache.spark.examples.JavaWordCount
spark.jars -> file:/home/hadoop/spark/spark-1.4.1-bin-hadoop2.4/lib/spark-examples-1.4.1-hadoop2.4.0-my.jar
spark.eventLog.dir -> /home/hadoop/spark/spark-eventlog
spark.master -> yarn-client
spark.worker.ui.port -> 7105
spark.master.ui.port -> 7102
Classpath elements:
file:/home/hadoop/spark/spark-1.4.1-bin-hadoop2.4/lib/spark-examples-1.4.1-hadoop2.4.0-my.jar


15/11/25 16:46:55 INFO spark.SparkContext: Running Spark version 1.4.1
15/11/25 16:46:55 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/11/25 16:46:55 INFO spark.SecurityManager: Changing view acls to: hadoop
15/11/25 16:46:55 INFO spark.SecurityManager: Changing modify acls to: hadoop
15/11/25 16:46:55 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop); users with modify permissions: Set(hadoop)
15/11/25 16:46:56 INFO slf4j.Slf4jLogger: Slf4jLogger started
15/11/25 16:46:56 INFO Remoting: Starting remoting
15/11/25 16:46:56 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@192.168.100.4:45880]
15/11/25 16:46:56 INFO util.Utils: Successfully started service 'sparkDriver' on port 45880.
15/11/25 16:46:56 INFO spark.SparkEnv: Registering MapOutputTracker
15/11/25 16:46:56 INFO spark.SparkEnv: Registering BlockManagerMaster
15/11/25 16:46:56 INFO storage.DiskBlockManager: Created local directory at /tmp/spark-52cdfa49-3560-40bd-9540-107f059b5d95/blockmgr-a25e4dc5-b8e0-4877-ad63-b0e32880e187
15/11/25 16:46:56 INFO storage.MemoryStore: MemoryStore started with capacity 529.9 MB
15/11/25 16:46:57 INFO spark.HttpFileServer: HTTP File server directory is /tmp/spark-52cdfa49-3560-40bd-9540-107f059b5d95/httpd-8b586e36-69a3-46c1-880d-5f294a643833
15/11/25 16:46:57 INFO spark.HttpServer: Starting HTTP Server
15/11/25 16:46:57 INFO server.Server: jetty-8.y.z-SNAPSHOT
15/11/25 16:46:57 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:51033
15/11/25 16:46:57 INFO util.Utils: Successfully started service 'HTTP file server' on port 51033.
15/11/25 16:46:57 INFO spark.SparkEnv: Registering OutputCommitCoordinator
15/11/25 16:46:57 INFO server.Server: jetty-8.y.z-SNAPSHOT
15/11/25 16:46:57 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:7106
15/11/25 16:46:57 INFO util.Utils: Successfully started service 'SparkUI' on port 7106.
15/11/25 16:46:57 INFO ui.SparkUI: Started SparkUI at http://192.168.100.4:7106
15/11/25 16:46:57 INFO spark.SparkContext: Added JAR file:/home/hadoop/spark/spark-1.4.1-bin-hadoop2.4/lib/spark-examples-1.4.1-hadoop2.4.0-my.jar at http://192.168.100.4:51033/jars/spark-examples-1.4.1-hadoop2.4.0-my.jar with timestamp 1448441217526
15/11/25 16:46:57 WARN cluster.YarnClientSchedulerBackend: NOTE: SPARK_WORKER_MEMORY is deprecated. Use SPARK_EXECUTOR_MEMORY or --executor-memory through spark-submit instead.
15/11/25 16:46:57 WARN cluster.YarnClientSchedulerBackend: NOTE: SPARK_WORKER_CORES is deprecated. Use SPARK_EXECUTOR_CORES or --executor-cores through spark-submit instead.
15/11/25 16:46:57 INFO client.RMProxy: Connecting to ResourceManager at hd02/192.168.100.4:8032
15/11/25 16:46:57 INFO yarn.Client: Requesting a new application from cluster with 10 NodeManagers
15/11/25 16:46:57 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)
15/11/25 16:46:57 INFO yarn.Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
15/11/25 16:46:57 INFO yarn.Client: Setting up container launch context for our AM
15/11/25 16:46:57 INFO yarn.Client: Preparing resources for our AM container
15/11/25 16:46:58 INFO yarn.Client: Uploading resource file:/home/hadoop/spark/spark-1.4.1-bin-hadoop2.4/lib/spark-assembly-1.4.1-hadoop2.4.0.jar -> hdfs://mycluster/user/hadoop/.sparkStaging/application_1441038159113_0003/spark-assembly-1.4.1-hadoop2.4.0.jar
15/11/25 16:47:00 INFO yarn.Client: Uploading resource file:/tmp/spark-52cdfa49-3560-40bd-9540-107f059b5d95/__hadoop_conf__6446760494119929942.zip -> hdfs://mycluster/user/hadoop/.sparkStaging/application_1441038159113_0003/__hadoop_conf__6446760494119929942.zip
15/11/25 16:47:00 INFO yarn.Client: Setting up the launch environment for our AM container
15/11/25 16:47:00 INFO spark.SecurityManager: Changing view acls to: hadoop
15/11/25 16:47:00 INFO spark.SecurityManager: Changing modify acls to: hadoop
15/11/25 16:47:00 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop); users with modify permissions: Set(hadoop)
15/11/25 16:47:00 INFO yarn.Client: Submitting application 3 to ResourceManager
15/11/25 16:47:00 INFO impl.YarnClientImpl: Submitted application application_1441038159113_0003
15/11/25 16:47:01 INFO yarn.Client: Application report for application_1441038159113_0003 (state: ACCEPTED)
15/11/25 16:47:01 INFO yarn.Client: 
	 client token: N/A
	 diagnostics: N/A
	 ApplicationMaster host: N/A
	 ApplicationMaster RPC port: -1
	 queue: default
	 start time: 1448441220409
	 final status: UNDEFINED
	 tracking URL: http://hd02:7104/proxy/application_1441038159113_0003/
	 user: hadoop
15/11/25 16:47:02 INFO yarn.Client: Application report for application_1441038159113_0003 (state: ACCEPTED)
15/11/25 16:47:03 INFO yarn.Client: Application report for application_1441038159113_0003 (state: ACCEPTED)
15/11/25 16:47:04 INFO yarn.Client: Application report for application_1441038159113_0003 (state: ACCEPTED)
15/11/25 16:47:05 INFO yarn.Client: Application report for application_1441038159113_0003 (state: ACCEPTED)
15/11/25 16:47:06 INFO yarn.Client: Application report for application_1441038159113_0003 (state: ACCEPTED)
15/11/25 16:47:06 INFO cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: ApplicationMaster registered as AkkaRpcEndpointRef(Actor[akka.tcp://sparkYarnAM@192.168.100.14:46652/user/YarnAM#-1250321572])
15/11/25 16:47:06 INFO cluster.YarnClientSchedulerBackend: Add WebUI Filter. org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter, Map(PROXY_HOSTS -> hd02, PROXY_URI_BASES -> http://hd02:7104/proxy/application_1441038159113_0003), /proxy/application_1441038159113_0003
15/11/25 16:47:06 INFO ui.JettyUtils: Adding filter: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
15/11/25 16:47:07 INFO yarn.Client: Application report for application_1441038159113_0003 (state: RUNNING)
15/11/25 16:47:07 INFO yarn.Client: 
	 client token: N/A
	 diagnostics: N/A
	 ApplicationMaster host: 192.168.100.14
	 ApplicationMaster RPC port: 0
	 queue: default
	 start time: 1448441220409
	 final status: UNDEFINED
	 tracking URL: http://hd02:7104/proxy/application_1441038159113_0003/
	 user: hadoop
15/11/25 16:47:07 INFO cluster.YarnClientSchedulerBackend: Application application_1441038159113_0003 has started running.
15/11/25 16:47:07 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 52047.
15/11/25 16:47:07 INFO netty.NettyBlockTransferService: Server created on 52047
15/11/25 16:47:07 INFO storage.BlockManagerMaster: Trying to register BlockManager
15/11/25 16:47:07 INFO storage.BlockManagerMasterEndpoint: Registering block manager 192.168.100.4:52047 with 529.9 MB RAM, BlockManagerId(driver, 192.168.100.4, 52047)
15/11/25 16:47:07 INFO storage.BlockManagerMaster: Registered BlockManager
15/11/25 16:47:07 INFO scheduler.EventLoggingListener: Logging events to file:/home/hadoop/spark/spark-eventlog/application_1441038159113_0003
15/11/25 16:47:17 INFO cluster.YarnClientSchedulerBackend: Registered executor: AkkaRpcEndpointRef(Actor[akka.tcp://sparkExecutor@gzsw-05:55796/user/Executor#-2059071929]) with ID 1
15/11/25 16:47:17 INFO storage.BlockManagerMasterEndpoint: Registering block manager gzsw-05:52897 with 2.1 GB RAM, BlockManagerId(1, gzsw-05, 52897)
15/11/25 16:47:17 INFO cluster.YarnClientSchedulerBackend: Registered executor: AkkaRpcEndpointRef(Actor[akka.tcp://sparkExecutor@gzsw-06:56733/user/Executor#261866940]) with ID 2
15/11/25 16:47:17 INFO cluster.YarnClientSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.8
15/11/25 16:47:17 INFO storage.BlockManagerMasterEndpoint: Registering block manager gzsw-06:38994 with 2.1 GB RAM, BlockManagerId(2, gzsw-06, 38994)
15/11/25 16:47:17 INFO storage.MemoryStore: ensureFreeSpace(228640) called with curMem=0, maxMem=555684986
15/11/25 16:47:17 INFO storage.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 223.3 KB, free 529.7 MB)
15/11/25 16:47:17 INFO storage.MemoryStore: ensureFreeSpace(18166) called with curMem=228640, maxMem=555684986
15/11/25 16:47:17 INFO storage.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 17.7 KB, free 529.7 MB)
15/11/25 16:47:17 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.100.4:52047 (size: 17.7 KB, free: 529.9 MB)
15/11/25 16:47:17 INFO spark.SparkContext: Created broadcast 0 from textFile at JavaWordCount.java:49
15/11/25 16:47:17 INFO mapred.FileInputFormat: Total input paths to process : 1
15/11/25 16:47:17 INFO spark.SparkContext: Starting job: collect at JavaWordCount.java:72
15/11/25 16:47:17 INFO scheduler.DAGScheduler: Registering RDD 3 (mapToPair at JavaWordCount.java:58)
15/11/25 16:47:17 INFO scheduler.DAGScheduler: Got job 0 (collect at JavaWordCount.java:72) with 2 output partitions (allowLocal=false)
15/11/25 16:47:17 INFO scheduler.DAGScheduler: Final stage: ResultStage 1(collect at JavaWordCount.java:72)
15/11/25 16:47:17 INFO scheduler.DAGScheduler: Parents of final stage: List(ShuffleMapStage 0)
15/11/25 16:47:17 INFO scheduler.DAGScheduler: Missing parents: List(ShuffleMapStage 0)
15/11/25 16:47:17 INFO scheduler.DAGScheduler: Submitting ShuffleMapStage 0 (MapPartitionsRDD[3] at mapToPair at JavaWordCount.java:58), which has no missing parents
15/11/25 16:47:18 INFO storage.MemoryStore: ensureFreeSpace(4736) called with curMem=246806, maxMem=555684986
15/11/25 16:47:18 INFO storage.MemoryStore: Block broadcast_1 stored as values in memory (estimated size 4.6 KB, free 529.7 MB)
15/11/25 16:47:18 INFO storage.MemoryStore: ensureFreeSpace(2644) called with curMem=251542, maxMem=555684986
15/11/25 16:47:18 INFO storage.MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 2.6 KB, free 529.7 MB)
15/11/25 16:47:18 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on 192.168.100.4:52047 (size: 2.6 KB, free: 529.9 MB)
15/11/25 16:47:18 INFO spark.SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:874
15/11/25 16:47:18 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ShuffleMapStage 0 (MapPartitionsRDD[3] at mapToPair at JavaWordCount.java:58)
15/11/25 16:47:18 INFO cluster.YarnScheduler: Adding task set 0.0 with 2 tasks
15/11/25 16:47:18 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, gzsw-05, NODE_LOCAL, 1479 bytes)
15/11/25 16:47:18 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, gzsw-05, NODE_LOCAL, 1479 bytes)
15/11/25 16:47:19 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on gzsw-05:52897 (size: 2.6 KB, free: 2.1 GB)
15/11/25 16:47:19 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on gzsw-05:52897 (size: 17.7 KB, free: 2.1 GB)
15/11/25 16:47:20 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 2705 ms on gzsw-05 (1/2)
15/11/25 16:47:20 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 2725 ms on gzsw-05 (2/2)
15/11/25 16:47:20 INFO cluster.YarnScheduler: Removed TaskSet 0.0, whose tasks have all completed, from pool 
15/11/25 16:47:20 INFO scheduler.DAGScheduler: ShuffleMapStage 0 (mapToPair at JavaWordCount.java:58) finished in 2.733 s
15/11/25 16:47:20 INFO scheduler.DAGScheduler: looking for newly runnable stages
15/11/25 16:47:20 INFO scheduler.DAGScheduler: running: Set()
15/11/25 16:47:20 INFO scheduler.DAGScheduler: waiting: Set(ResultStage 1)
15/11/25 16:47:20 INFO scheduler.DAGScheduler: failed: Set()
15/11/25 16:47:20 INFO scheduler.DAGScheduler: Missing parents for ResultStage 1: List()
15/11/25 16:47:20 INFO scheduler.DAGScheduler: Submitting ResultStage 1 (ShuffledRDD[4] at reduceByKey at JavaWordCount.java:65), which is now runnable
15/11/25 16:47:20 INFO storage.MemoryStore: ensureFreeSpace(2408) called with curMem=254186, maxMem=555684986
15/11/25 16:47:20 INFO storage.MemoryStore: Block broadcast_2 stored as values in memory (estimated size 2.4 KB, free 529.7 MB)
15/11/25 16:47:20 INFO storage.MemoryStore: ensureFreeSpace(1459) called with curMem=256594, maxMem=555684986
15/11/25 16:47:20 INFO storage.MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 1459.0 B, free 529.7 MB)
15/11/25 16:47:20 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on 192.168.100.4:52047 (size: 1459.0 B, free: 529.9 MB)
15/11/25 16:47:20 INFO spark.SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:874
15/11/25 16:47:20 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ResultStage 1 (ShuffledRDD[4] at reduceByKey at JavaWordCount.java:65)
15/11/25 16:47:20 INFO cluster.YarnScheduler: Adding task set 1.0 with 2 tasks
15/11/25 16:47:20 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 1.0 (TID 2, gzsw-06, PROCESS_LOCAL, 1246 bytes)
15/11/25 16:47:20 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 1.0 (TID 3, gzsw-05, PROCESS_LOCAL, 1246 bytes)
15/11/25 16:47:20 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on gzsw-05:52897 (size: 1459.0 B, free: 2.1 GB)
15/11/25 16:47:20 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 0 to gzsw-05:55796
15/11/25 16:47:20 INFO spark.MapOutputTrackerMaster: Size of output statuses for shuffle 0 is 147 bytes
15/11/25 16:47:20 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 1.0 (TID 3) in 98 ms on gzsw-05 (1/2)
15/11/25 16:47:22 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on gzsw-06:38994 (size: 1459.0 B, free: 2.1 GB)
15/11/25 16:47:22 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 0 to gzsw-06:56733
15/11/25 16:47:22 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 1.0 (TID 2) in 1748 ms on gzsw-06 (2/2)
15/11/25 16:47:22 INFO scheduler.DAGScheduler: ResultStage 1 (collect at JavaWordCount.java:72) finished in 1.749 s
15/11/25 16:47:22 INFO cluster.YarnScheduler: Removed TaskSet 1.0, whose tasks have all completed, from pool 
15/11/25 16:47:22 INFO scheduler.DAGScheduler: Job 0 finished: collect at JavaWordCount.java:72, took 4.603967 s
total items 14
15/11/25 16:47:22 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/metrics/json,null}
15/11/25 16:47:22 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage/kill,null}
15/11/25 16:47:22 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/api,null}
15/11/25 16:47:22 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/,null}
15/11/25 16:47:22 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/static,null}
15/11/25 16:47:22 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/threadDump/json,null}
15/11/25 16:47:22 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/threadDump,null}
15/11/25 16:47:22 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/json,null}
15/11/25 16:47:22 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors,null}
15/11/25 16:47:22 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/environment/json,null}
15/11/25 16:47:22 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/environment,null}
15/11/25 16:47:22 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/rdd/json,null}
15/11/25 16:47:22 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/rdd,null}
15/11/25 16:47:22 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/json,null}
15/11/25 16:47:22 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage,null}
15/11/25 16:47:22 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/pool/json,null}
15/11/25 16:47:22 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/pool,null}
15/11/25 16:47:22 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage/json,null}
15/11/25 16:47:22 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage,null}
15/11/25 16:47:22 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/json,null}
15/11/25 16:47:22 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages,null}
15/11/25 16:47:22 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/job/json,null}
15/11/25 16:47:22 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/job,null}
15/11/25 16:47:22 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/json,null}
15/11/25 16:47:22 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs,null}
15/11/25 16:47:22 INFO ui.SparkUI: Stopped Spark web UI at http://192.168.100.4:7106
15/11/25 16:47:22 INFO scheduler.DAGScheduler: Stopping DAGScheduler
15/11/25 16:47:22 INFO cluster.YarnClientSchedulerBackend: Shutting down all executors
15/11/25 16:47:22 INFO cluster.YarnClientSchedulerBackend: Interrupting monitor thread
15/11/25 16:47:22 INFO cluster.YarnClientSchedulerBackend: Asking each executor to shut down
15/11/25 16:47:22 INFO cluster.YarnClientSchedulerBackend: Stopped
15/11/25 16:47:22 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
15/11/25 16:47:22 INFO util.Utils: path = /tmp/spark-52cdfa49-3560-40bd-9540-107f059b5d95/blockmgr-a25e4dc5-b8e0-4877-ad63-b0e32880e187, already present as root for deletion.
15/11/25 16:47:22 INFO storage.MemoryStore: MemoryStore cleared
15/11/25 16:47:22 INFO storage.BlockManager: BlockManager stopped
15/11/25 16:47:22 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
15/11/25 16:47:22 INFO spark.SparkContext: Successfully stopped SparkContext
15/11/25 16:47:22 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
15/11/25 16:47:22 INFO util.Utils: Shutdown hook called
15/11/25 16:47:22 INFO remote.RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
15/11/25 16:47:22 INFO util.Utils: Deleting directory /tmp/spark-52cdfa49-3560-40bd-9540-107f059b5d95
15/11/25 16:47:22 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
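To check the task placement without reading the whole log, the "Starting task" lines can be summarized per stage and host. A minimal sketch; the helper name tasks_per_host is hypothetical, and it assumes the TaskSetManager line format shown in the log above.

```shell
#!/bin/sh
# Hypothetical helper: reads driver log lines on stdin and prints "stage host task-count".
tasks_per_host() {
  grep 'TaskSetManager: Starting task' |
    sed -E 's/.*in stage ([0-9.]+) \(TID [0-9]+, ([^,]+),.*/\1 \2/' |
    LC_ALL=C sort | uniq -c | awk '{print $2, $3, $1}'
}

# Four lines copied from the driver log above.
tasks_per_host <<'EOF'
15/11/25 16:47:18 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, gzsw-05, NODE_LOCAL, 1479 bytes)
15/11/25 16:47:18 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, gzsw-05, NODE_LOCAL, 1479 bytes)
15/11/25 16:47:20 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 1.0 (TID 2, gzsw-06, PROCESS_LOCAL, 1246 bytes)
15/11/25 16:47:20 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 1.0 (TID 3, gzsw-05, PROCESS_LOCAL, 1246 bytes)
EOF
```

For these four lines it prints "0.0 gzsw-05 2", "1.0 gzsw-05 1" and "1.0 gzsw-06 1", which matches the placement described above. With a saved log you would run tasks_per_host < driver.log, where driver.log is whatever file you captured the spark-submit output into.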

 
