I am running Spark with YARN.
From the link: http://spark.apache.org/docs/latest/running-on-yarn.html
I found an explanation of the different YARN modes, i.e. the values of the --master option with which Spark can run:
"There are two deploy modes that can be used to launch Spark applications on YARN. In yarn-cluster mode, the Spark driver runs inside an application master process which is managed by YARN on the cluster, and the client can go away after initiating the application. In yarn-client mode, the driver runs in the client process, and the application master is only used for requesting resources from YARN."
From this, I understand only that the difference is where the driver runs, but I cannot tell which mode runs faster. Moreover:
- When running spark-submit, --master can be either yarn-client or yarn-cluster
- Correspondingly, spark-shell's --master option can be yarn-client, but it does not support cluster mode
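For concreteness, the invocations I am comparing look roughly like the sketch below. The jar name and main class are placeholders I made up for illustration, and the snippet only prints the command lines rather than running them. I am using the yarn-client / yarn-cluster master names from the documentation quoted above; as far as I know, the same choice can also be written as --master yarn together with --deploy-mode client or --deploy-mode cluster.

```shell
# Placeholders (assumptions, not my real application)
APP_JAR="my-app.jar"
MAIN_CLASS="com.example.MyApp"

# spark-submit, driver runs in the local client process
CLIENT_CMD="spark-submit --master yarn-client --class $MAIN_CLASS $APP_JAR"

# spark-submit, driver runs inside the YARN application master
CLUSTER_CMD="spark-submit --master yarn-cluster --class $MAIN_CLASS $APP_JAR"

# spark-shell, interactive, client mode only
SHELL_CMD="spark-shell --master yarn-client"

# Dry run: print the commands instead of executing them
echo "$CLIENT_CMD"
echo "$CLUSTER_CMD"
echo "$SHELL_CMD"
```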
So I do not know how to choose: when to use spark-shell, when to use spark-submit, and especially when to use client mode versus cluster mode.