We are observing some strange behaviour concerning the number of executors and tasks on Dataproc. Our understanding is that, in theory, the number of cores available in the cluster limits the number of tasks that can run in parallel: 32 cores means at most 32 concurrent tasks. On Dataproc, however, we often observe roughly double that theoretical maximum. Here is an example:
We are running a Dataproc cluster of 12 worker machines plus 1 master, all n1-standard-4 (4 vCPUs and 15 GB RAM per machine), which gives 48 worker vCPUs in total. We launch our Spark app with
spark.executor.cores = 4
... which should give us 12 executors, each able to run 4 tasks in parallel, i.e. 48 parallel tasks in total, while underutilising memory, since Dataproc automatically assigns spark.executor.memory = 5586m.
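For reference, this is roughly how we create the session; the app name is just a placeholder, and we set nothing beyond executor cores, leaving the rest to Dataproc's defaults:

    import org.apache.spark.sql.SparkSession

    // Sketch of our setup: only executor cores are set explicitly;
    // Dataproc fills in spark.executor.memory (5586m here) on its own.
    val spark = SparkSession.builder()
      .appName("executor-count-test")  // placeholder name
      .config("spark.executor.cores", "4")
      .getOrCreate()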
However, what actually happens is that we seem to end up with 24 executors, running a total of 92 tasks in parallel, and hence finishing (almost) twice as fast. We don't understand why.
The YARN monitor also reports 24 containers, although there should only be 12 (with 4 cores each).
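To cross-check these numbers from inside the application, we can also ask Spark's status tracker how many executors have registered (depending on the Spark version, the returned list may include the driver as one extra entry):

    // Cross-check against the YARN UI: count registered executors.
    // Note: getExecutorInfos may list the driver as one extra entry.
    val executors = spark.sparkContext.statusTracker.getExecutorInfos
    println(s"Registered executors (possibly incl. driver): ${executors.length}")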