I seem to have just answered a very similar question.
Think about a use case where you don't want to wait for all the resources to become available and would rather start as soon as enough of them have registered to run tasks on.
That's why Spark on YARN has an extra check (aka minRegisteredRatio) that by default waits for a minimum of 80% of the requested executors to register before the application starts executing tasks.
Since you want to have all the resources available before your Spark application starts, set the spark.scheduler.minRegisteredResourcesRatio
Spark property to 1.0.
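For example, assuming the application is submitted with spark-submit (the application class and jar name below are illustrative), the ratio can be set on the command line:

```shell
# Wait for 100% of the requested executors to register before scheduling begins.
# maxRegisteredResourcesWaitingTime caps the wait so the app does not hang
# indefinitely if YARN cannot allocate every executor.
spark-submit \
  --master yarn \
  --conf spark.scheduler.minRegisteredResourcesRatio=1.0 \
  --conf spark.scheduler.maxRegisteredResourcesWaitingTime=3m \
  --class com.example.MyApp \
  my-app.jar
```

The same properties can equally be set in spark-defaults.conf or on SparkConf before the SparkContext is created.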
Quoting the official Spark documentation (highlighting mine):
spark.scheduler.minRegisteredResourcesRatio
0.8 for YARN mode
The minimum ratio of registered resources (registered resources / total expected resources) (resources are executors in yarn mode, CPU cores in standalone mode and Mesos coarse-grained mode ['spark.cores.max' value is total expected resources for Mesos coarse-grained mode]) to wait for before scheduling begins. Specified as a double between 0.0 and 1.0. Regardless of whether the minimum ratio of resources has been reached, the maximum amount of time it will wait before scheduling begins is controlled by config spark.scheduler.maxRegisteredResourcesWaitingTime.