I have been using Spark and YARN for quite a while, and mostly have a handle on all the spark-submit parameters. I am currently using a 5-node EMR cluster (1 master and 4 workers), all m3.xlarge, which is spec'ed at 4 vCPUs. (Actually, when I ssh into the machines and check, there are only 3 cores.)
However, when I spark-submit a job to the EMR cluster:
spark-submit --master yarn --class myclass --num-executors 9 --executor-cores 2 --executor-memory 500M my.jar
the YARN console always shows 32 vCores total, 4 vCores used, and 4 active nodes.
So the total vCore count is a real mystery. How can there be 32 vCores? Even counting the master node, that is only 5 * 4 = 20 vCores. Not counting the master, the 4 active worker nodes would give a total of 16 vCores, not 32. Can anyone explain this?
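For reference, the arithmetic I expected can be sketched in shell. The config check in the trailing comment is an assumption: YARN advertises whatever `yarn.nodemanager.resource.cpu-vcores` is set to rather than the physical core count, and `/etc/hadoop/conf/yarn-site.xml` is the usual EMR location for that setting (paths may differ by EMR version):

```shell
# Totals I expected, given 4 worker nodes at the 4 vCPUs an m3.xlarge is spec'ed for:
workers=4
vcpus_per_node=4
echo "expected: $((workers * vcpus_per_node)) vCores"   # expected: 16 vCores

# What each NodeManager actually reports to the ResourceManager is a config
# value, not the hardware count. To inspect it on a worker node (assumed path):
#   grep -A1 yarn.nodemanager.resource.cpu-vcores /etc/hadoop/conf/yarn-site.xml
```

If EMR configures that property to 8 per node on this instance type, 4 workers would advertise exactly the 32 vCores the console shows, but I have not confirmed that value.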