
I am using EMR 4.1.0 + Spark 1.5.0 + YARN to process big data. I am trying to utilize the full cluster, but somehow YARN is not allocating all of the resources.

  • Using 4 × c3.8xlarge EC2 slave nodes (each with 60.0 GB memory and 32 cores)
  • According to this article, I have set the following parameters on the EMR cluster:

yarn.nodemanager.resource.memory-mb -> 53856
yarn.nodemanager.resource.cpu-vcores -> 26
yarn.scheduler.capacity.resource-calculator -> org.apache.hadoop.yarn.util.resource.DominantResourceCalculator (so YARN can manage both memory and cores)

Then I started pyspark with:

pyspark --master yarn-client --num-executors 24 --executor-memory 8347m --executor-cores 4

But the RM UI shows the following:

[screenshot of the ResourceManager UI showing the allocated containers and reserved resources]

It allocates only 21 containers vs. the 24 requested. Shouldn't the 27 GB of reserved memory and 12 reserved cores be enough to allocate 3 more containers?

What am I missing here?

Thank You!


1 Answer


From here, it looks like your base should be 53248M per node. Additionally, there is a ~10% memory overhead that must be accounted for (spark.yarn.executor.memoryOverhead). 53248 × 0.9 = 47923M can be allocated to executors on each node. If you allocate 8347M for each executor, each node can only fit 5 of them: 47923 − 5 × 8347 = 6188M, which is not enough free memory to launch a 6th executor. The remaining requested executors never launch because there is not enough memory left on any node. If you want 24 containers, launch with --executor-memory 7987M.
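The arithmetic above can be sketched in a few lines of Python. This is a rough model under the answer's assumptions: a 53248M base per node, a flat 10% overhead, and no rounding to yarn.scheduler.minimum-allocation-mb multiples (which real YARN applies):

```python
import math

NODE_MEM_MB = 53248        # YARN memory per node, per the answer above
OVERHEAD_FRACTION = 0.10   # spark.yarn.executor.memoryOverhead rule of thumb

def executors_per_node(executor_mem_mb):
    # Each container needs the executor memory plus the ~10% overhead.
    container_mb = math.ceil(executor_mem_mb * (1 + OVERHEAD_FRACTION))
    return NODE_MEM_MB // container_mb

print(executors_per_node(8347))  # 5 per node -> only 20 executors on 4 nodes
print(executors_per_node(7987))  # 6 per node -> the requested 24 executors
```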

Note that you will have 6 unused cores per node if you use this configuration. This spreadsheet could help you find the best configuration for any type/size of cluster:

https://docs.google.com/spreadsheets/d/1VH7Qly308hoRPu5VoLIg0ceolrzen-nBktRFkXHRrY4/edit#gid=1524766257
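As a rough stand-in for the spreadsheet, the same rule of thumb can be inverted to suggest flag values for a target number of executors per node. The constants are assumptions taken from this thread, and the 0.9 multiplier mirrors the answer's overhead estimate:

```python
import math

NODE_MEM_MB = 53248   # YARN memory per node (assumed, as in the answer)
NODE_VCORES = 26      # yarn.nodemanager.resource.cpu-vcores from the question

def suggest(executors_per_node):
    """Largest --executor-memory (MB) and --executor-cores that fit."""
    usable = NODE_MEM_MB * 0.9  # leave ~10% for memoryOverhead
    mem_mb = math.floor(usable / executors_per_node)
    cores = NODE_VCORES // executors_per_node
    return mem_mb, cores

for n in (4, 5, 6):
    print(n, suggest(n))
# 6 executors per node -> (7987, 4), matching --executor-memory 7987M above
```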