Let's say we have a Spark job running in cluster mode, where the cluster manager is YARN.
In cluster mode:
- A user submits a pre-compiled JAR or a Python script to the cluster manager. The cluster manager then tells a specific Node Manager to launch the Application Master.
- The Spark Driver then runs inside the Application Master. The driver converts the user's code, containing transformations and actions, into a logical plan called the DAG, which is then converted into a physical execution plan of stages and tasks (see the first sketch after this list).
- The Application Master then communicates with the cluster manager to negotiate resources, requesting things like the number of containers, the memory and cores per container, and preferred executor locations (see the second sketch after this list).
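To make the transformations/actions point concrete, here is a minimal Scala sketch of the kind of user code the driver plans (the HDFS paths are hypothetical): the transformations are lazy and only build up the DAG, and it is the final action that triggers the driver to break the DAG into stages and tasks.

```scala
import org.apache.spark.sql.SparkSession

object WordCount {
  def main(args: Array[String]): Unit = {
    // In cluster mode the master (--master yarn) is supplied by spark-submit,
    // so it is not hard-coded here.
    val spark = SparkSession.builder().appName("WordCount").getOrCreate()

    // Hypothetical input path.
    val lines = spark.sparkContext.textFile("hdfs:///data/input.txt")

    // Transformations: lazy, they only extend the DAG (logical plan).
    val counts = lines
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _) // the shuffle here becomes a stage boundary in the physical plan

    // Action: this is what makes the driver turn the DAG into stages and tasks
    // and schedule them on the executors. Hypothetical output path.
    counts.saveAsTextFile("hdfs:///data/output")

    spark.stop()
  }
}
```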
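And here is a minimal sketch of how the resource requests in that negotiation are usually expressed. The configuration keys below are real Spark-on-YARN settings; the values are illustrative only:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

// Fragment, e.g. inside your application's main method.
// Real Spark-on-YARN configuration keys; the values are illustrative only.
val conf = new SparkConf()
  .setAppName("WordCount")
  .set("spark.executor.instances", "4") // executor containers to request from YARN
  .set("spark.executor.memory", "4g")   // heap per executor container
  .set("spark.executor.cores", "2")     // vcores per executor container

val spark = SparkSession.builder().config(conf).getOrCreate()
```

These are more often passed on the spark-submit command line as --num-executors, --executor-memory, and --executor-cores. Note that the container hosting the Application Master (and, in cluster mode, the driver) is sized by spark.driver.memory, which has to be supplied at submit time, since that container is launched before any user code runs.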
At this point in the flow, does the cluster manager allocate the YARN containers, or does the Application Master allocate them? And does the cluster manager create the Spark executors as well, or does the Application Master do this?