3 votes

I deploy a spark job in the cluster mode with the following

Driver core - 1
Executor cores - 2
Number of executors - 2.

My understanding is that this application should occupy 5 cores in the cluster (4 executor cores and 1 driver core), but I don't observe this in the RM and Spark UIs.

  1. On the Resource Manager UI, I see only 4 cores used for this application.
  2. Even in the Spark UI (after clicking the ApplicationMaster URL from the RM), under the Executors tab, the driver's core count is shown as zero.

Am I missing something?

The cluster manager is YARN.

I suggest reading this answer regarding how Spark's cluster mode works. Also, here is the link to the official docs on cluster mode. Reading both of those will give you a better understanding of the architecture. – Jeremy
What about memory? How much did you request? – Jacek Laskowski
This depends on your YARN configuration too; SPARK-6050 has a lot of discussion about it. – vanza

1 Answer

5 votes

My understanding is that this application should occupy 5 cores in the cluster (4 executor cores and 1 driver core)

That would be the ideal situation, where YARN gives you all 5 cores from the CPUs it manages.
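For concreteness, here is a minimal sketch of how such a request could be expressed as Spark properties (the object and application names are made up, and in cluster mode the driver settings would normally be supplied at submit time, e.g. via --driver-cores, --executor-cores and --num-executors):

```scala
import org.apache.spark.sql.SparkSession

// Sketch of the resource request described in the question.
// Equivalent submit-time flags (cluster mode):
//   spark-submit --deploy-mode cluster --driver-cores 1 \
//                --executor-cores 2 --num-executors 2 ...
object FiveCoreApp {                              // hypothetical object name
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("five-core-app")                   // hypothetical application name
      .config("spark.driver.cores", "1")          // 1 core for the driver
      .config("spark.executor.cores", "2")        // 2 cores per executor
      .config("spark.executor.instances", "2")    // 2 executors
      .getOrCreate()

    // Expected footprint: 2 executors * 2 cores + 1 driver core = 5 cores.
    spark.stop()
  }
}
```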

but I don't observe this in the RM and Spark UIs.

Since that ideal situation does not occur very often, it is usually better to accept as many cores as YARN can currently give, so the Spark application can start at all.

Spark could just wait indefinitely for the requested cores, but that would not always be to your liking, would it?

That's why Spark on YARN has an extra check (a.k.a. minRegisteredRatio): by default, at least 80% of the requested resources must have registered before the application starts executing tasks. You can use the spark.scheduler.minRegisteredResourcesRatio Spark property to control that ratio. That would explain why you see fewer cores in use than requested.

Quoting the official Spark documentation (highlighting mine):

spark.scheduler.minRegisteredResourcesRatio

Default: 0.8 for YARN mode

Meaning: The minimum ratio of registered resources (registered resources / total expected resources) (resources are executors in yarn mode, CPU cores in standalone mode and Mesos coarse-grained mode ['spark.cores.max' value is total expected resources for Mesos coarse-grained mode]) to wait for before scheduling begins. Specified as a double between 0.0 and 1.0. Regardless of whether the minimum ratio of resources has been reached, the maximum amount of time it will wait before scheduling begins is controlled by config spark.scheduler.maxRegisteredResourcesWaitingTime.
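If you wanted the opposite behaviour, i.e. not to start scheduling until every requested executor has registered, you could raise the ratio. A minimal sketch, with illustrative values and a made-up application name (the same properties can also be passed as --conf flags to spark-submit):

```scala
import org.apache.spark.sql.SparkSession

// Wait for 100% of the requested executors before scheduling tasks,
// but never wait longer than 60 seconds (values are illustrative).
val spark = SparkSession.builder()
  .appName("strict-resource-wait")                                    // hypothetical name
  .config("spark.scheduler.minRegisteredResourcesRatio", "1.0")       // default is 0.8 on YARN
  .config("spark.scheduler.maxRegisteredResourcesWaitingTime", "60s") // default is 30s
  .getOrCreate()
```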