
I am running a number of Spark jobs in parallel on a YARN cluster. I am finding that YARN is starting up a number of these jobs in parallel, but only allocating one container for the driver and no executors. This means that these Spark jobs are effectively sitting idle waiting for an executor to join, when this processing power could be better utilised by allocating executors to other jobs.

I would like to configure YARN to allocate a minimum of two containers (one driver + one executor) to a job, and if that's not available to keep it in the queue. How can I configure YARN in this way?

(I am running on an AWS EMR cluster with nearly all of the default settings.)


1 Answer


If your YARN cluster uses the FairScheduler, you can limit how many applications run concurrently in a queue, and what fraction of a queue's resources may be used by ApplicationMasters (leaving the rest for executors):

maxRunningApps: limit the number of apps from the queue to run at once

maxAMShare: limit the fraction of the queue's fair share that can be used to run ApplicationMasters. This property can only be set on leaf queues. For example, if set to 1.0f, AMs in the leaf queue can take up to 100% of both the memory and CPU fair share. A value of -1.0f disables this check entirely. The default is 0.5f.
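As a rough illustration, both properties are set per queue in the scheduler's allocation file (`fair-scheduler.xml`). The queue name and values below are made up for the example; tune them to your cluster:

```xml
<?xml version="1.0"?>
<allocations>
  <!-- Hypothetical leaf queue for the parallel Spark jobs -->
  <queue name="spark_jobs">
    <!-- At most 4 applications from this queue run at once;
         the rest wait in the queue until a slot frees up -->
    <maxRunningApps>4</maxRunningApps>
    <!-- AMs (Spark drivers) may use at most 20% of the queue's
         fair share, keeping the remainder free for executors -->
    <maxAMShare>0.2</maxAMShare>
  </queue>
</allocations>
```

Note that EMR uses the CapacityScheduler by default, so this only applies if you have switched the cluster to the FairScheduler; the CapacityScheduler's rough equivalent of `maxAMShare` is `yarn.scheduler.capacity.maximum-am-resource-percent`.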