In Apache Spark, how does spark-submit work with different deploy modes and different cluster managers? Specifically:

In local deployment mode,

  • Does spark-submit skip calling any cluster manager?
  • Is it correct that there is no need to install a cluster manager on the local machine?

In client or cluster deployment mode,

  • Does spark-submit work with different cluster managers (Spark standalone, YARN, Mesos, Kubernetes)? Do different cluster managers have different interfaces, so that spark-submit has to invoke each of them differently?

  • Does spark-submit present the same interface to programmers regardless of the cluster manager, apart from the --master option, which specifies the cluster manager?

Thanks.

1 Answer

To be clear, you never need to install a separate, external cluster manager to run Spark. Local mode uses no cluster manager at all, and Spark ships with its own standalone cluster manager for client and cluster deploy modes. An external resource manager such as YARN or Mesos only makes resource allocation easier and independent of Spark, and whether to use one is always your choice.

The spark-submit command itself does not need a cluster manager to be present: with a local master, the whole application runs on the machine where you invoke it.

The different ways in which you can use the command are:

1) Local mode (no cluster manager; local[8] runs with 8 worker threads):

./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master local[8] \
  /path/to/examples.jar \
  100

2) Client mode with Spark's built-in standalone cluster manager (no external resource manager):

./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master spark://207.184.161.138:7077 \
  --executor-memory 20G \
  --total-executor-cores 100 \
  /path/to/examples.jar \
  1000

3) Cluster mode with the Spark standalone cluster manager:

./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master spark://207.184.161.138:7077 \
  --deploy-mode cluster \
  --supervise \
  --executor-memory 20G \
  --total-executor-cores 100 \
  /path/to/examples.jar \
  1000

4) Client or cluster mode with an external resource manager (YARN in this example):

# Use --deploy-mode client instead to run the driver on the submitting machine.
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master yarn \
  --deploy-mode cluster \
  --executor-memory 20G \
  --num-executors 50 \
  /path/to/examples.jar \
  1000
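
The question also mentions Kubernetes, so here is a minimal sketch of the same submission against a Kubernetes master. It is written as a dry run that only prints the command. The function name k8s_submit_cmd, the API server address, and the image name are hypothetical placeholders; the k8s:// master prefix and the spark.kubernetes.container.image setting come from Spark's Kubernetes support.

```shell
#!/usr/bin/env bash
# Hypothetical dry-run helper (not part of Spark): prints the spark-submit
# command for a Kubernetes master instead of executing it.
#   $1 = Kubernetes API server URL (placeholder, e.g. https://example.com:6443)
#   $2 = container image that contains Spark and the examples jar (placeholder)
k8s_submit_cmd() {
  local api_server="$1" image="$2"
  echo ./bin/spark-submit \
    --class org.apache.spark.examples.SparkPi \
    --master "k8s://${api_server}" \
    --deploy-mode cluster \
    --conf "spark.executor.instances=5" \
    --conf "spark.kubernetes.container.image=${image}" \
    local:///path/to/examples.jar \
    1000
}

# Example: print the command for a placeholder API server and image.
k8s_submit_cmd https://example.com:6443 my-spark:latest
```

The local:// scheme in this sketch assumes the jar is already present inside the container image. Everything else follows the same shape as the YARN example.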

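As the examples above show, spark-submit presents the same interface whether or not a cluster manager is involved. Only the --master value, the optional --deploy-mode, and a few manager-specific resource options (for example --total-executor-cores versus --num-executors) change. The same holds if you use a resource manager such as YARN or Mesos. You can read more about spark-submit in the Spark documentation.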
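
To illustrate how the --master value alone determines which cluster manager is used, the mapping can be sketched as a small shell function. This is only an illustration of the documented master URL formats, not how Spark parses them internally, and the function name cluster_manager_for is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Spark): report which cluster manager
# a given --master value selects, based on the master URL format.
cluster_manager_for() {
  case "$1" in
    local|local\[*\]) echo "none (local mode)" ;;  # local, local[8], local[*]
    spark://*)        echo "Spark standalone" ;;   # spark://HOST:PORT
    yarn)             echo "YARN" ;;               # cluster located via Hadoop config
    mesos://*)        echo "Mesos" ;;              # mesos://HOST:PORT
    k8s://*)          echo "Kubernetes" ;;         # k8s://https://HOST:PORT
    *)                echo "unknown" ;;
  esac
}

# Classify the master values used in the examples above.
for master in 'local[8]' spark://207.184.161.138:7077 yarn; do
  printf '%s -> %s\n' "$master" "$(cluster_manager_for "$master")"
done
```

Passing one of these values to --master is the only thing the caller changes to switch cluster managers; spark-submit handles the manager-specific communication itself.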