By default, Dataproc is supposed to fit two executors per worker (i.e., per YARN NodeManager), with each executor getting half the cores and half the memory. And out of the box it does work that way.
However, if we override a setting, say spark.yarn.executor.memoryOverhead=4096, then only one executor is created per worker, and half of the cluster's cores and memory go unused. No matter how we play around with spark.executor.memory or spark.executor.cores, it still doesn't spin up enough executors to use all of the cluster's resources.
How can we make Dataproc still create two executors per worker? As I understand it, the YARN overhead is deducted out of the executor memory, so two executors should still fit, shouldn't they?
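
To make the arithmetic concrete, here is a small sketch with made-up numbers (the node memory, executor memory, and minimum allocation below are illustrative, not the real Dataproc defaults). The two models correspond to the overhead being carved out of, versus added on top of, spark.executor.memory:

```python
# Sketch of the packing arithmetic (illustrative numbers only -- the node memory,
# executor memory, and minimum allocation below are made up, not Dataproc's defaults).

node_mem_mb = 12 * 1024      # yarn.nodemanager.resource.memory-mb on one worker (assumed)
executor_mem_mb = 5 * 1024   # spark.executor.memory as chosen by Dataproc (assumed)
overhead_mb = 4096           # our spark.yarn.executor.memoryOverhead override

def executors_per_node(container_mb: int, node_mb: int, min_alloc_mb: int = 1024) -> int:
    """YARN rounds each container request up to a multiple of
    yarn.scheduler.minimum-allocation-mb and then packs whole containers."""
    rounded = -(-container_mb // min_alloc_mb) * min_alloc_mb  # ceil to a multiple
    return node_mb // rounded

# Model A: the overhead is carved out of spark.executor.memory, so the container
# request stays at executor_mem_mb and two executors still fit per node.
print(executors_per_node(executor_mem_mb, node_mem_mb))                # -> 2

# Model B: the overhead is added on top of spark.executor.memory, so the container
# request grows past half the node's memory and only one executor fits per node.
print(executors_per_node(executor_mem_mb + overhead_mb, node_mem_mb))  # -> 1
```

Which of these two models is the one YARN actually applies here, and what do we have to change so that two executors per worker fit again?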