2 votes

How does Spark choose nodes to run executors? (Spark on YARN)

We run Spark in YARN mode on a cluster of 120 nodes. Yesterday one Spark job created 200 executors: 11 executors landed on node1, 10 on node2, and the rest were distributed evenly across the other nodes.

Because so many executors were packed onto node1 and node2, the job ran slowly.

How does Spark select the nodes on which to run executors? Is this decided by the YARN ResourceManager?


3 Answers

1 vote

Since you are running Spark on YARN: YARN chooses the executor nodes for a Spark job based on the availability of cluster resources. Check your YARN queue configuration and dynamic allocation settings. A good reference: https://blog.cloudera.com/blog/2016/01/untangling-apache-hadoop-yarn-part-3/
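As an illustration, here is a minimal sketch of pinning a Spark application to a named YARN queue. The queue name `analytics` and the app name are placeholders; this assumes such a queue exists in your scheduler configuration:

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch of submitting to a named YARN queue.
// "analytics" is a placeholder; use a queue that actually exists
// in your capacity-scheduler (or fair-scheduler) configuration.
val spark = SparkSession.builder()
  .appName("queue-placement-example")
  .master("yarn")
  .config("spark.yarn.queue", "analytics")
  .getOrCreate()
```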

0 votes

The cluster manager allocates resources across all running applications. I think the issue here is a poorly optimized configuration: you should enable Spark dynamic allocation. Spark will then scale the number of executors up and down based on the workload and the available cluster resources.

You can find more about Spark resource allocation and how to configure it here: http://site.clairvoyantsoft.com/understanding-resource-allocation-configurations-spark-application/
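For example, a minimal sketch of enabling dynamic allocation on YARN; the executor bounds and app name are illustrative values, not recommendations:

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch: turning on dynamic allocation on YARN.
// The executor bounds are illustrative; tune them to your cluster.
val spark = SparkSession.builder()
  .appName("dynamic-allocation-example")
  .master("yarn")
  .config("spark.dynamicAllocation.enabled", "true")
  // The external shuffle service must be running on each NodeManager
  // so shuffle files survive when idle executors are removed.
  .config("spark.shuffle.service.enabled", "true")
  .config("spark.dynamicAllocation.minExecutors", "2")
  .config("spark.dynamicAllocation.maxExecutors", "100")
  .getOrCreate()
```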

0 votes

Do all 120 nodes have identical capacity?

Moreover, containers are placed on a suitable NodeManager based on that NodeManager's health and resource availability.

To optimize a Spark job, you can use dynamic resource allocation, so you do not need to fix the number of executors for the job up front. By default the application starts with the configured minimum CPU and memory, then acquires additional resources from the cluster as tasks demand them. Once the work completes and an executor stays idle past the configured idle timeout, its resources are released back to the cluster manager; they are reacquired when the job becomes active again.
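A sketch of the idle-timeout behaviour described above; the timeout and executor values are illustrative assumptions:

```scala
import org.apache.spark.sql.SparkSession

// Sketch of the idle-timeout behaviour described above; the values
// are illustrative, not recommendations.
val spark = SparkSession.builder()
  .appName("idle-timeout-example")
  .master("yarn")
  .config("spark.dynamicAllocation.enabled", "true")
  .config("spark.shuffle.service.enabled", "true")
  // Start small; Spark requests more executors as pending tasks build up.
  .config("spark.dynamicAllocation.initialExecutors", "2")
  // Executors idle for longer than this are released back to YARN.
  .config("spark.dynamicAllocation.executorIdleTimeout", "120s")
  .getOrCreate()
```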