
I have been using Spark and YARN for quite a while, and mostly have a handle on all the spark-submit parameters. I am currently using a 5-node EMR cluster, 1 master and 4 workers, all m3.xlarge, which is spec'd at 4 vCores. (Although when I SSH into the machines and check, I only see 3 cores.)

However, when I spark-submit a job to the EMR cluster:

spark-submit --master yarn --class myclass --num-executors 9 --executor-cores 2 --executor-memory 500M my.jar 

the YARN console always shows 32 vCores total, 4 vCores used, and 4 active nodes.

This total number of vCores is a real mystery. How can there be 32 vCores? Even counting the master node, there are only 5 * 4 vCores = 20. Not counting the master, the 4 active worker nodes would give a total of 4 * 4 = 16 vCores, not 32. Can anyone explain this?
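For anyone wanting to double-check the numbers, YARN itself can report the per-node capacity it thinks it has. This is a rough sketch assuming a standard EMR/Hadoop layout; the node ID below is a placeholder that you would replace with one printed by the first command:

# List the NodeManagers YARN knows about (run on the master node)
yarn node -list

# Show the total resources (memory, vCores) that one node advertises
yarn node -status ip-10-0-0-1.ec2.internal:8041

# The configured vCore capacity on a worker comes from this property
grep -A1 "yarn.nodemanager.resource.cpu-vcores" /etc/hadoop/conf/yarn-site.xml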


1 Answer


The hardware you are running on uses hyper-threading, which lets each physical core present itself as two virtual cores. Each of your four worker machines has 4 physical cores, which corresponds to 8 virtual cores, so YARN sees 4 nodes * 8 vCores = 32 vCores in total.

See: https://aws.amazon.com/ec2/instance-types/

and https://en.wikipedia.org/wiki/Hyper-threading
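If you want to verify this on the instances themselves, standard Linux tools distinguish physical cores from hardware threads. A quick check, assuming a typical Amazon Linux AMI with lscpu available:

# Logical processors visible to the OS (hardware threads)
nproc

# Physical layout: "Thread(s) per core: 2" means hyper-threading is active
lscpu | grep -E "Thread\(s\) per core|Core\(s\) per socket|Socket\(s\)"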