22 votes

I have a Hadoop cluster with 5 nodes, each of which has 12 cores and 32 GB of memory. I use YARN as the MapReduce framework, so I have the following settings for YARN (the corresponding yarn-site.xml entries are sketched after the list):

  • yarn.nodemanager.resource.cpu-vcores=10
  • yarn.nodemanager.resource.memory-mb=26100
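For reference, this is a minimal sketch of how those two properties would appear in yarn-site.xml on each NodeManager; the values are the ones listed above, and the rest of the file is omitted:

    <!-- yarn-site.xml (sketch): resources each NodeManager advertises to the ResourceManager -->
    <property>
        <name>yarn.nodemanager.resource.cpu-vcores</name>
        <value>10</value>
    </property>
    <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>26100</value>
    </property>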

Then the cluster metrics on my YARN cluster page (http://myhost:8088/cluster/apps) showed VCores Total as 40. That looks right!

Then I installed Spark on top of it and used spark-shell in yarn-client mode.

I ran one Spark job with the following configuration:

  • --driver-memory 20480m
  • --executor-memory 20000m
  • --num-executors 4
  • --executor-cores 10
  • --conf spark.yarn.am.cores=2
  • --conf spark.yarn.executor.memoryOverhead=5600

Since I set --executor-cores to 10 and --num-executors to 4, there should logically be 40 VCores Used in total. However, when I checked the same YARN cluster page after the Spark job started running, it showed only 4 VCores Used and 4 VCores Total.

I also found a parameter in capacity-scheduler.xml called yarn.scheduler.capacity.resource-calculator:

"The ResourceCalculator implementation to be used to compare Resources in the scheduler. The default i.e. DefaultResourceCalculator only uses Memory while DominantResourceCalculator uses dominant-resource to compare multi-dimensional resources such as Memory, CPU etc."

I then changed that value to DominantResourceCalculator.

But when I restarted YARN and ran the same Spark application, I still got the same result: the cluster metrics still reported 4 VCores Used. I also checked the CPU and memory usage on each node with the htop command and found that none of the nodes had all 10 CPU cores fully occupied. What can be the reason?

I also tried running the same Spark job in a fine-grained way, i.e. with --num-executors 40 --executor-cores 1. When I checked the CPU status on each worker node again, all CPU cores were fully occupied.

3
Could you check on the Spark UI (Environment tab) that all the config options were really propagated to the Spark app? You can also check the YARN ResourceManager logs for any problems with allocation. – vanekjar
Have you ever solved this issue? I'm running into the same problem right now. – Mestre San

3 Answers

3 votes

I was wondering the same thing, but changing the resource-calculator worked for me.
This is how I set the property:

    <property>
        <name>yarn.scheduler.capacity.resource-calculator</name>      
        <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>       
    </property>

Check in the YARN UI how many containers and vcores are assigned to the application. With the change, the number of containers should be executors + 1, and the vcores should be (executor-cores * num-executors) + 1.

2 votes

Without setting the YARN scheduler to FairScheduler, I saw the same thing. The Spark UI showed the right number of tasks, though, suggesting nothing was wrong. My cluster showed close to 100% CPU usage, which confirmed this.

After setting FairScheduler, the YARN Resources looked correct.
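For completeness, a minimal sketch of one way to select the Fair Scheduler in yarn-site.xml (any queue definitions in fair-scheduler.xml are assumed to be configured separately):

    <!-- yarn-site.xml (sketch): switch the ResourceManager to the Fair Scheduler -->
    <property>
        <name>yarn.resourcemanager.scheduler.class</name>
        <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
    </property>

A ResourceManager restart is needed for the scheduler change to take effect.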

0 votes

The executors take 10 cores each and the Application Master takes 2 cores, so that's 4 * 10 + 2 = 42 cores requested when you only have 40 vCores in total.

Reduce --executor-cores to 8 and make sure to restart each NodeManager.

Also modify yarn-site.xml and set these properties (example values are sketched after the list):

  • yarn.scheduler.minimum-allocation-mb
  • yarn.scheduler.maximum-allocation-mb
  • yarn.scheduler.minimum-allocation-vcores
  • yarn.scheduler.maximum-allocation-vcores
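As a rough sketch only: the values below are assumptions for a 12-core / 32 GB node (minimums left at the common defaults, maximums aligned with the NodeManager resources above) and should be tuned to your cluster:

    <!-- yarn-site.xml (sketch): per-container allocation bounds; example values, adjust as needed -->
    <property>
        <name>yarn.scheduler.minimum-allocation-mb</name>
        <value>1024</value>
    </property>
    <property>
        <name>yarn.scheduler.maximum-allocation-mb</name>
        <value>26100</value>
    </property>
    <property>
        <name>yarn.scheduler.minimum-allocation-vcores</name>
        <value>1</value>
    </property>
    <property>
        <name>yarn.scheduler.maximum-allocation-vcores</name>
        <value>10</value>
    </property>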