I am using EMR 4.1.0 + Spark 1.5.0 + YARN to process big data. I am trying to utilize the full cluster, but somehow YARN is not allocating all the resources.
- Using 4 x c3.8xlarge EC2 slave nodes (each with 60.0 GB of memory and 32 cores)
- According to this article, I have set the following parameters on the EMR cluster:
- yarn.nodemanager.resource.memory-mb -> 53856
- yarn.nodemanager.resource.cpu-vcores -> 26
- yarn.scheduler.capacity.resource-calculator -> org.apache.hadoop.yarn.util.resource.DominantResourceCalculator (so YARN can manage both memory and cores)
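For anyone reproducing this: the properties were set through EMR 4.x configuration classifications. A rough Python sketch of the structure (the same content as JSON can be passed to aws emr create-cluster --configurations; the variable name is just illustrative):

```python
# Sketch of the EMR 4.x configuration classifications for the properties above.
# Note: the resource-calculator property belongs to capacity-scheduler.xml,
# not yarn-site.xml, hence the separate classification.
configurations = [
    {
        "Classification": "yarn-site",
        "Properties": {
            "yarn.nodemanager.resource.memory-mb": "53856",
            "yarn.nodemanager.resource.cpu-vcores": "26",
        },
    },
    {
        "Classification": "capacity-scheduler",
        "Properties": {
            "yarn.scheduler.capacity.resource-calculator":
                "org.apache.hadoop.yarn.util.resource.DominantResourceCalculator",
        },
    },
]
```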
Then I started pyspark with:

```
pyspark --master yarn-client --num-executors 24 --executor-memory 8347m --executor-cores 4
```
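My sizing reasoning as a quick sanity check (note this deliberately ignores per-container extras such as spark.yarn.executor.memoryOverhead, which Spark 1.5 defaults to max(384 MB, 10% of executor memory), and the application master container):

```python
# Rough totals for the request vs. what the 4 NodeManagers advertise.
# Overhead and the AM container are ignored, so this is an upper-bound check.
num_executors = 24
executor_memory_mb = 8347
executor_cores = 4

requested_mb = num_executors * executor_memory_mb   # 200,328 MB
requested_cores = num_executors * executor_cores    # 96 cores

cluster_mb = 4 * 53856                              # 215,424 MB
cluster_cores = 4 * 26                              # 104 cores

print(requested_mb <= cluster_mb, requested_cores <= cluster_cores)  # True True
```

So on paper the request fits within the cluster's advertised resources.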
But the RM UI shows the following:
It allocates only 21 containers instead of the requested 24. The 27 GB of reserved memory and 12 reserved cores should be enough to allocate 3 more containers, right?
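The arithmetic behind that "right?" (again ignoring any per-container memory overhead, which may well be the flaw in my reasoning):

```python
# Can the reserved resources shown in the RM UI cover 3 more executor
# containers of 8347 MB / 4 vcores each? Overhead ignored.
reserved_mb = 27 * 1024     # 27 GB reserved memory per the RM UI
reserved_cores = 12         # reserved vcores per the RM UI

needed_mb = 3 * 8347        # 25,041 MB
needed_cores = 3 * 4        # 12

print(needed_mb <= reserved_mb, needed_cores <= reserved_cores)  # True True
```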
What am I missing here?
Thank you!