
I have been using Spark and YARN for quite a while, and mostly have a handle on all the spark-submit parameters. I am currently using a 5-node EMR cluster, 1 master and 4 workers, all m3.xlarge, which is spec'd at 4 vCores. (Although when I SSH into the machines and check, I only see 3 cores.)

However, when I spark-submit a job to the EMR cluster:

spark-submit --master yarn --class myclass --num-executors 9 --executor-cores 2 --executor-memory 500M my.jar 

the YARN console always shows 32 vCores total, 4 vCores used, and 4 active nodes.

This total number of vCores is a real mystery. How can there be 32 vCores? Even counting the master node, there are only 5 * 4 vCores = 20. Not counting the master, the 4 active worker nodes would give a total of 4 * 4 = 16 vCores, not 32. Can anyone explain this?
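For anyone wanting to double-check the numbers, YARN itself can report the per-node capacity it thinks it has. This is a rough sketch assuming a standard EMR/Hadoop layout; the node ID below is a placeholder that you would replace with one printed by the first command:

# List the NodeManagers YARN knows about (run on the master node)
yarn node -list

# Show the total resources (memory, vCores) that one node advertises
yarn node -status ip-10-0-0-1.ec2.internal:8041

# The configured vCore capacity on a worker comes from this property
grep -A1 "yarn.nodemanager.resource.cpu-vcores" /etc/hadoop/conf/yarn-site.xml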


1 Answer


The hardware you are running on uses hyper-threading, which lets each physical core present itself as two virtual cores. Each of your four worker machines has 4 physical cores, which corresponds to 8 virtual cores, so YARN sees 4 nodes * 8 vCores = 32 vCores in total.

See: https://aws.amazon.com/ec2/instance-types/

and https://en.wikipedia.org/wiki/Hyper-threading
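If you want to verify this on the instances themselves, standard Linux tools distinguish physical cores from hardware threads. A quick check, assuming a typical Amazon Linux AMI with lscpu available:

# Logical processors visible to the OS (hardware threads)
nproc

# Physical layout: "Thread(s) per core: 2" means hyper-threading is active
lscpu | grep -E "Thread\(s\) per core|Core\(s\) per socket|Socket\(s\)"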