6 votes

I have 10 nodes with 32 GB RAM in my Hadoop cluster and one node with 64 GB.

On those 10 nodes the yarn.nodemanager.resource.memory-mb limit is set to 26 GB, and on the 64 GB node it is set to 52 GB (some jobs need 50 GB for a single reducer, and they run on that node).

The problem is that when I run basic jobs that need, say, 8 GB per mapper, the 32 GB nodes spawn 3 mappers in parallel (26 / 8 = 3), while the 64 GB node spawns 6. That node usually finishes last because of the CPU load.

I'd like to limit the job's container resources programmatically, e.g. set a 26 GB limit for most of the jobs. How can this be done?


2 Answers

2 votes

First of all, yarn.nodemanager.resource.memory-mb (memory) and yarn.nodemanager.resource.cpu-vcores (vcores) are NodeManager daemon/service configuration properties and cannot be overridden from YARN client applications. You need to restart the NodeManager service if you change these properties.

Since CPU is the bottleneck in your case, my recommendation is to switch the YARN scheduling strategy to the Fair Scheduler with the DRF (Dominant Resource Fairness) scheduling policy at the cluster level. That gives you the flexibility to specify the application container size in terms of both memory and CPU cores, and the number of application containers (mappers/reducers/AM) running in parallel will also be bounded by the vcores you define.

The scheduling policy can be set at the Fair Scheduler queue/pool level.

schedulingPolicy: sets the scheduling policy of a queue. The allowed values are "fifo", "fair", and "drf".

See the Apache Fair Scheduler documentation for more details.

Once you have created a new Fair Scheduler queue/pool with the DRF scheduling policy, both memory and CPU cores can be set in the program as follows.


How to define the container size in a MapReduce application:

Configuration conf = new Configuration();

// container memory per map/reduce task, in MB
conf.set("mapreduce.map.memory.mb", "4096");
conf.set("mapreduce.reduce.memory.mb", "4096");

// vcores per map/reduce task
conf.set("mapreduce.map.cpu.vcores", "1");
conf.set("mapreduce.reduce.cpu.vcores", "1");

Reference - https://hadoop.apache.org/docs/r2.7.2/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml

The default cpu.vcores allocation for a mapper/reducer is 1. You can increase this value for a CPU-intensive application, but remember that if you increase it, the number of mapper/reducer tasks running in parallel is also reduced.
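For illustration, here is a minimal client-side sketch of what this could look like once such a queue exists. The queue name "drf_pool" is a placeholder, the 8 GB figure mirrors the mappers from the question, and the 2-vcore request is only there to show the parallelism effect described above:

Configuration conf = new Configuration();

// route the job into the Fair Scheduler queue/pool created earlier ("drf_pool" is a placeholder name)
conf.set("mapreduce.job.queuename", "drf_pool");

// per-task container memory (8 GB mappers, as in the question)
conf.set("mapreduce.map.memory.mb", "8192");
conf.set("mapreduce.reduce.memory.mb", "8192");

// asking for 2 vcores per mapper roughly halves how many mappers
// a node runs in parallel compared to the default of 1
conf.set("mapreduce.map.cpu.vcores", "2");
conf.set("mapreduce.reduce.cpu.vcores", "1");

Job job = Job.getInstance(conf);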

0 votes

You have to set the configuration on the job like this. Try this:

// create a configuration
Configuration conf = new Configuration();
// create a new job based on the configuration
Job job = Job.getInstance(conf);
// here you have to put your mapper class
job.setMapperClass(Mapper.class);
// here you have to put your reducer class
job.setReducerClass(Reducer.class);
// here you have to set the jar which contains your
// map/reduce classes, so the cluster can load them
job.setJarByClass(Mapper.class);
// key/value types of your reducer output
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
// this sets the format of your input, can also be TextInputFormat
job.setInputFormatClass(SequenceFileInputFormat.class);
// same for the output
job.setOutputFormatClass(TextOutputFormat.class);
// here you can set the path of your input
SequenceFileInputFormat.addInputPath(job, new Path("files/toMap/"));
// this deletes a possibly existing output path to prevent job failures
FileSystem fs = FileSystem.get(conf);
Path out = new Path("files/out/processed/");
fs.delete(out, true);
// finally set the (now empty) output path
TextOutputFormat.setOutputPath(job, out);

// this waits until the job completes and prints debug output to STDOUT or whatever
// has been configured in your log4j properties.
job.waitForCompletion(true);

For YARN, the following configurations need to be set.

// this should match what is defined in your yarn-site.xml
conf.set("yarn.resourcemanager.address", "yarn-manager.com:50001");

// to set the limit to 26 GB (26624 MB)
conf.set("yarn.nodemanager.resource.memory-mb", "26624");

// the framework is now "yarn"; it should be defined like this in mapred-site.xml
conf.set("mapreduce.framework.name", "yarn");

// like defined in core-site.xml
conf.set("fs.default.name", "hdfs://namenode.com:9000");