
In spark-submit, what is the difference between the "yarn", "yarn-cluster", and "yarn-client" deploy modes?

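# Use --master yarn-client instead of yarn-cluster for client mode.
# (A comment cannot follow a trailing backslash: it breaks the line continuation.)
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master yarn-cluster \
  --executor-memory 20G \
  --num-executors 50 \
  /path/to/examples.jar \
  1000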

https://spark.apache.org/docs/1.1.0/submitting-applications.html


Answer


For Spark on YARN, you can specify either yarn-client or yarn-cluster. In yarn-client mode, the driver program runs in the same JVM as spark-submit, while in yarn-cluster mode the Spark driver runs inside a container on one of the cluster's NodeManagers.
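As for plain "yarn": as far as I know, spark-submit 1.x treats it as a YARN master whose mode comes from --deploy-mode, defaulting to client, so "--master yarn" behaves like yarn-client and "--master yarn --deploy-mode cluster" like yarn-cluster. The function below is a hypothetical sketch of that inference (effective_deploy_mode is not part of Spark); it only prints the resulting mode and does not need a cluster.

```shell
#!/usr/bin/env bash
# Hypothetical helper, NOT part of Spark: prints the deploy mode that
# spark-submit (1.x, by my understanding) infers for a YARN master.
# Usage: effective_deploy_mode MASTER [DEPLOY_MODE]
effective_deploy_mode() {
  local master="$1" mode="${2:-}"
  case "$master" in
    yarn-cluster)
      # yarn-cluster already implies cluster mode; asking for client conflicts
      if [ "$mode" = "client" ]; then
        echo "error: client deploy mode conflicts with yarn-cluster" >&2
        return 1
      fi
      echo "cluster" ;;
    yarn-client)
      # yarn-client already implies client mode; asking for cluster conflicts
      if [ "$mode" = "cluster" ]; then
        echo "error: cluster deploy mode conflicts with yarn-client" >&2
        return 1
      fi
      echo "client" ;;
    yarn)
      # Plain "yarn" leaves the choice to --deploy-mode, defaulting to client
      echo "${mode:-client}" ;;
    *)
      echo "error: not a YARN master: $master" >&2
      return 1 ;;
  esac
}

for m in yarn yarn-client yarn-cluster; do
  echo "$m -> $(effective_deploy_mode "$m")"
done
```

Running it prints "yarn -> client", "yarn-client -> client", and "yarn-cluster -> cluster".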

From the documentation (https://spark.apache.org/docs/1.1.0/running-on-yarn.html):

"There are two deploy modes that can be used to launch Spark applications on YARN. In yarn-cluster mode, the Spark driver runs inside an application master process which is managed by YARN on the cluster, and the client can go away after initiating the application. In yarn-client mode, the driver runs in the client process, and the application master is only used for requesting resources from YARN."
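In later Spark releases (2.0 onward, to my knowledge), the combined yarn-client and yarn-cluster master values are deprecated in favor of --master yarn plus an explicit --deploy-mode. The sketch below is a dry run: it builds the question's command in that form for each mode and prints it without executing anything. build_submit_cmd and SUBMIT_CMD are hypothetical names, and the jar path is the question's placeholder.

```shell
#!/usr/bin/env bash
# Dry run only: builds (but does not execute) the question's spark-submit
# command in the --master yarn --deploy-mode form.
# build_submit_cmd / SUBMIT_CMD are hypothetical names for illustration.
build_submit_cmd() {
  local mode="$1"   # "client" or "cluster"
  # Arrays assigned inside a bash function are global unless declared local
  SUBMIT_CMD=(
    ./bin/spark-submit
    --class org.apache.spark.examples.SparkPi
    --master yarn
    --deploy-mode "$mode"
    --executor-memory 20G
    --num-executors 50
    /path/to/examples.jar
    1000
  )
}

for mode in client cluster; do
  build_submit_cmd "$mode"
  printf '%s ' "${SUBMIT_CMD[@]}"
  echo
done
# To actually submit on a configured cluster: "${SUBMIT_CMD[@]}"
```

Keeping the arguments in an array (rather than a single string) preserves each argument exactly, so the same array can be printed for inspection and later executed unchanged.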