I am new to Spark and learning the architecture. I understand that Spark supports three cluster managers: YARN, Standalone, and Mesos.

In YARN cluster mode, the Spark driver resides in the Resource Manager and the executors run in YARN containers on the Node Managers.

In Standalone cluster mode, the Spark driver resides in the master process and the executors run in the slave (worker) processes.

If my understanding is correct, is it required to install Spark on all the Node Managers of a YARN cluster and on all the slave nodes of a Standalone cluster?


1 Answer


If you use YARN as the cluster manager on a cluster with multiple nodes, you do not need to install Spark on each node. YARN will distribute the Spark binaries to the nodes when a job is submitted.
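As a minimal sketch, a PySpark application can be launched against YARN from a single client machine, assuming Spark is installed only on that machine and HADOOP_CONF_DIR points at the cluster's configuration; the application name below is just a placeholder.

    from pyspark.sql import SparkSession

    # Launch against YARN; no Spark installation is assumed on the worker nodes.
    spark = (
        SparkSession.builder
        .appName("yarn-install-check")   # hypothetical application name
        .master("yarn")                  # let YARN allocate the executors
        .getOrCreate()
    )

    # The executors run inside YARN containers on the Node Managers; the Spark
    # runtime jars reach them through YARN's distributed cache, not through a
    # local Spark installation on each node.
    print(spark.range(1000).count())

    spark.stop()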

https://spark.apache.org/docs/latest/running-on-yarn.html

Running Spark on YARN requires a binary distribution of Spark which is built with YARN support. Binary distributions can be downloaded from the downloads page of the project website. To build Spark yourself, refer to Building Spark.

To make Spark runtime jars accessible from YARN side, you can specify spark.yarn.archive or spark.yarn.jars. For details please refer to Spark Properties. If neither spark.yarn.archive nor spark.yarn.jars is specified, Spark will create a zip file with all jars under $SPARK_HOME/jars and upload it to the distributed cache.
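To avoid that zip-and-upload step on every submission, you can stage the archive of jars once and point spark.yarn.archive at it. A minimal sketch, where the HDFS path is an assumption rather than a value from the question (the same property can also be set in spark-defaults.conf or via spark-submit --conf):

    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("yarn-archive-example")   # hypothetical application name
        .master("yarn")
        # Assumed HDFS location of a pre-staged archive of $SPARK_HOME/jars;
        # with this set, YARN ships this archive to the executors instead of
        # re-uploading the jars for every job.
        .config("spark.yarn.archive", "hdfs:///spark/spark-jars.zip")
        .getOrCreate()
    )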