I have a GCP Dataproc cluster with 50 workers (n1-standard-16: 16 vCores, 64 GB RAM each).
The cluster uses the Capacity Scheduler with the DefaultResourceCalculator.
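For reference, this is roughly how I confirmed that on the master node (the config path is the standard Hadoop one on Dataproc; I haven't customized the scheduler myself):

```
# On the Dataproc master node: check which resource calculator the
# Capacity Scheduler is configured with.
grep -A 2 'resource-calculator' /etc/hadoop/conf/capacity-scheduler.xml
# It shows:
#   <name>yarn.scheduler.capacity.resource-calculator</name>
#   <value>org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator</value>
```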
My Spark job has the following configuration (the submit command is sketched below the list):
- spark.executor.cores=5
- spark.executor.memory=18G
- spark.yarn.executor.memoryOverhead=2G
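
This is roughly how the job is submitted (the class name and jar are placeholders; the properties are exactly the ones listed above):

```
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --conf spark.executor.cores=5 \
  --conf spark.executor.memory=18G \
  --conf spark.yarn.executor.memoryOverhead=2G \
  --class com.example.MyJob \
  my-job.jar
```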
Now, when I look at the YARN UI, it shows that each node has 2 containers running with 1 vCore and 20 GB RAM each, which makes it look like spark.executor.cores
is not being applied. To cross-check, I looked at the Spark UI and, to my surprise, every executor showed 5 cores. This is confusing to me.
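In case it helps, this is roughly how I can reproduce both views from the master node (the node id and application id are placeholders; I'm assuming the Spark REST endpoint is reachable through the YARN proxy):

```
# YARN's view: per-node report showing 2 running containers and ~40 GB memory used
yarn node -status <node-id>

# Spark's view: totalCores reported per executor for the running application
curl -s http://<master>:8088/proxy/<application-id>/api/v1/applications/<application-id>/executors
```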
The job completion time (26 minutes) also suggests that each executor really is using 5 vCores, not just 5 threads inside 1 core (this is just my understanding; I might be completely wrong here).
Can anyone help me understand this?