
I am using EMR 4.1.0 + Spark 1.5.0 + YARN to process big data. I am trying to utilize the full cluster, but somehow YARN is not allocating all of the resources.

  • Using 4 × c3.8xlarge EC2 slave nodes (each with 60.0 GB memory and 32 cores)
  • According to this article, I have set the following parameters on the EMR cluster:

yarn.nodemanager.resource.memory-mb -> 53856
yarn.nodemanager.resource.cpu-vcores -> 26
yarn.scheduler.capacity.resource-calculator -> org.apache.hadoop.yarn.util.resource.DominantResourceCalculator (so YARN can manage both memory and cores)

Then I started pyspark with:

pyspark --master yarn-client --num-executors 24 --executor-memory 8347m --executor-cores 4

But the RM UI shows the following:

[screenshot of the ResourceManager UI showing the allocated containers and reserved resources]

It allocates only 21 containers vs. the 24 requested. Shouldn't the 27 GB of reserved memory and 12 reserved cores be enough to allocate 3 more containers?

What am I missing here?

Thank You!


1 Answer


From here, it looks like your base should be 53248M per node. Additionally, there is a ~10% memory overhead that must be accounted for (spark.yarn.executor.memoryOverhead). 53248 × 0.9 = 47923M can be allocated to executors on each node. If you allocate 8347M for each executor, each node can only fit 5 of them: 47923 − 5 × 8347 = 6188M, which is not enough free memory to launch a 6th executor. The remaining requested executors never launch because there is not enough memory left on any node. If you want 24 containers, launch with --executor-memory 7987M.
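The arithmetic above can be sketched in a few lines of Python. This is a rough model under the answer's assumptions: a 53248M base per node, a flat 10% overhead, and no rounding to yarn.scheduler.minimum-allocation-mb multiples (which real YARN applies):

```python
import math

NODE_MEM_MB = 53248        # YARN memory per node, per the answer above
OVERHEAD_FRACTION = 0.10   # spark.yarn.executor.memoryOverhead rule of thumb

def executors_per_node(executor_mem_mb):
    # Each container needs the executor memory plus the ~10% overhead.
    container_mb = math.ceil(executor_mem_mb * (1 + OVERHEAD_FRACTION))
    return NODE_MEM_MB // container_mb

print(executors_per_node(8347))  # 5 per node -> only 20 executors on 4 nodes
print(executors_per_node(7987))  # 6 per node -> the requested 24 executors
```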

Note that you will have 6 unused cores per node if you use this configuration. This spreadsheet could help you find the best configuration for any type/size of cluster:

https://docs.google.com/spreadsheets/d/1VH7Qly308hoRPu5VoLIg0ceolrzen-nBktRFkXHRrY4/edit#gid=1524766257
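As a rough stand-in for the spreadsheet, the same rule of thumb can be inverted to suggest flag values for a target number of executors per node. The constants are assumptions taken from this thread, and the 0.9 multiplier mirrors the answer's overhead estimate:

```python
import math

NODE_MEM_MB = 53248   # YARN memory per node (assumed, as in the answer)
NODE_VCORES = 26      # yarn.nodemanager.resource.cpu-vcores from the question

def suggest(executors_per_node):
    """Largest --executor-memory (MB) and --executor-cores that fit."""
    usable = NODE_MEM_MB * 0.9  # leave ~10% for memoryOverhead
    mem_mb = math.floor(usable / executors_per_node)
    cores = NODE_VCORES // executors_per_node
    return mem_mb, cores

for n in (4, 5, 6):
    print(n, suggest(n))
# 6 executors per node -> (7987, 4), matching --executor-memory 7987M above
```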