spark 提交任务参数说明

daizj

浏览: 792263 次
性别:
来自: 广州

最近访客更多访客>>

guwq2014

snowolf

junes_yu

yuanyuan7891

博主相关

博客

微博

相册

留言

关于我

文章分类

社区版块

存档分类

博客分类：

spark

spark spark-submit 提交任务参数说明

1.参数选取

当我们的代码写完，打好jar，就可以通过bin/spark-submit 提交到集群，命令如下：

./bin/spark-submit \
--class <main-class>
--master <master-url> \
--deploy-mode <deploy-mode> \
--conf <key>=<value> \
     ... # other options
<application-jar> \
[application-arguments]
一般情况下使用上面这几个参数就够用了

--class: The entry point for your application (e.g. org.apache.spark.examples.SparkPi)

--master: The master URL for the cluster (e.g. spark://23.195.26.187:7077)

--deploy-mode: Whether to deploy your driver on the worker nodes (cluster) or locally as an external client (client) (default: client) †

--conf: Arbitrary Spark configuration property in key=value format. For values that contain spaces wrap “key=value” in quotes (as shown).

application-jar: Path to a bundled jar including your application and all dependencies. The URL must be globally visible inside of your cluster, for instance, an hdfs:// path or a file:// path that is present on all nodes.

application-arguments: Arguments passed to the main method of your main class, if any

对于不同的集群管理，对spark-submit的提交列举几个简单的例子

# Run application locally on 8 cores

./bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master local[8] \
/path/to/examples.jar \
100

# Run on a Spark standalone cluster in client deploy mode

./bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master spark://207.184.161.138:7077 \
--executor-memory 20G \
--total-executor-cores 100 \
/path/to/examples.jar \
1000

# Run on a Spark standalone cluster in cluster deploy mode with supervise
# make sure that the driver is automatically restarted if it fails with non-zero exit code

./bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master spark://207.184.161.138:7077 \
--deploy-mode cluster
--supervise
--executor-memory 20G \
--total-executor-cores 100 \
/path/to/examples.jar \
   1000

# Run on a YARN cluster export HADOOP_CONF_DIR=XXX

./bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master yarn-cluster \ # can also be `yarn-client` for client mode
--executor-memory 20G \
--num-executors 50 \
/path/to/examples.jar \
1000

# Run a Python application on a Spark standalone cluster

./bin/spark-submit \
--master spark://207.184.161.138:7077 \
examples/src/main/python/pi.py \
1000
2.具体提交步骤

代码实现一个简单的统计

public class SimpleSample {
public static void main(String[] args) {
String logFile = "/home/bigdata/spark-1.5.1/README.md";
SparkConf conf = new SparkConf().setAppName("Simple Application");
JavaSparkContext sc = new JavaSparkContext(conf);
JavaRDD<String> logData = sc.textFile(logFile).cache();

long numAs = logData.filter(new Function<String, Boolean>() {
public Boolean call(String s) {
return s.contains("a");
}
}).count();

long numBs = logData.filter(new Function<String, Boolean>() {
public Boolean call(String s) {
return s.contains("b");
}
}).count();

System.out.println("Lines with a: " + numAs + ", lines with b: " + numBs);
}

}
打成jar

上传命令

./bin/spark-submit --class cs.spark.SimpleSample --master spark://spark1:7077 /home/jar/spark-test-0.0.1-SN

本文转自：https://my.oschina.net/u/2529303/blog/541685

分享到：

ERROR: Couldn't open transport for host. ... | 查询服务器外网IP和根据域名查询外网IP

2017-04-28 14:32
浏览 2973
评论(0)
分类:开源软件
查看更多

发表评论

您还没有登录,请您登录后再发表评论

最近访客更多访客>>

博主相关

文章分类

社区版块

存档分类

最新评论