In MRv1 we had the following two configuration parameters to set the number of map and reduce slots per node:

mapred.tasktracker.map.tasks.maximum
mapred.tasktracker.reduce.tasks.maximum

It was also advisable to have slightly more map slots than reduce slots. The ideal number of reducers for a MapReduce job would be equal to or greater than the number of reduce slots available in the cluster.
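To see why the reducer count is compared with the reduce slot count, it may help to do the arithmetic. In MRv1 the cluster's reduce capacity is the number of TaskTracker nodes times `mapred.tasktracker.reduce.tasks.maximum`, and reduce tasks beyond that capacity wait for a free slot, so they run in later "waves". The sketch below is illustrative only: the node count and per-node maximum are made-up values, and it ignores failed or speculative tasks and other jobs sharing the cluster.

```python
import math


def cluster_reduce_slots(num_nodes: int, reduce_slots_per_node: int) -> int:
    """Total reduce slots: nodes x mapred.tasktracker.reduce.tasks.maximum."""
    return num_nodes * reduce_slots_per_node


def reduce_waves(num_reducers: int, total_reduce_slots: int) -> int:
    """Rounds needed if every reduce task occupies exactly one slot."""
    if total_reduce_slots <= 0:
        raise ValueError("cluster must have at least one reduce slot")
    return math.ceil(num_reducers / total_reduce_slots)


# Hypothetical cluster: 10 TaskTrackers with 2 reduce slots each.
slots = cluster_reduce_slots(num_nodes=10, reduce_slots_per_node=2)
print(slots)                    # 20
print(reduce_waves(20, slots))  # 1: every reducer can start at once
print(reduce_waves(35, slots))  # 2: 15 reducers wait for a free slot
```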

Please correct me if my understanding of MRv1 above is wrong.

In MRv2 we don't have the concept of slots anymore; instead, containers provide the memory and CPU required for map/reduce task execution.

So here is my question: how do I decide on the number of reducers for a MapReduce job in MRv2?

Thanks

1 Answer

mapred.tasktracker.reduce.tasks.maximum is replaced by

mapreduce.tasktracker.reduce.tasks.maximum

This property sets the maximum number of reduce slots on a given TaskTracker node, i.e. how many reduce tasks that node can run concurrently.

mapred.tasktracker.map.tasks.maximum is replaced by

mapreduce.tasktracker.map.tasks.maximum

This property sets the maximum number of map slots on a given TaskTracker node, i.e. how many map tasks that node can run concurrently.

With YARN and MapReduce 2, there are no longer pre-configured static slots for map and reduce tasks. The entire cluster is available for dynamic resource allocation of map and reduce tasks as the job needs them.

But if you want to set the number of reducers for your job, you can still do so by specifying the following property in your MapReduce job:

mapreduce.job.reduces
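For choosing a value, the Apache Hadoop MapReduce tutorial gives a rule of thumb: 0.95 or 1.75 multiplied by (number of nodes × maximum number of containers per node). With 0.95 all reduces can launch immediately; with 1.75 the faster nodes finish a first round and launch a second wave, which balances load better. The sketch below applies that rule. Estimating containers per node as `yarn.nodemanager.resource.memory-mb` divided by `mapreduce.reduce.memory.mb` is an assumption of this sketch: it looks at memory only and ignores vcores, the ApplicationMaster's container, and map tasks still running. The factor is passed as an integer percentage (95 or 175) to keep the arithmetic exact; the cluster numbers are hypothetical.

```python
def containers_per_node(node_memory_mb: int, reduce_memory_mb: int) -> int:
    """Memory-only estimate (assumption): node memory // reduce container memory."""
    if reduce_memory_mb <= 0:
        raise ValueError("reduce container size must be positive")
    return node_memory_mb // reduce_memory_mb


def suggested_reduces(num_nodes: int, max_containers_per_node: int,
                      factor_percent: int) -> int:
    """Tutorial rule of thumb: factor * nodes * containers, rounded down, at least 1.

    factor_percent is 95 (single wave) or 175 (about two waves).
    """
    return max(1, factor_percent * num_nodes * max_containers_per_node // 100)


# Hypothetical cluster: 10 NodeManagers with 8192 MB each, 1024 MB reduce containers.
per_node = containers_per_node(node_memory_mb=8192, reduce_memory_mb=1024)
print(per_node)                            # 8
print(suggested_reduces(10, per_node, 95))   # 76
print(suggested_reduces(10, per_node, 175))  # 140
```

The result would then be supplied as `mapreduce.job.reduces`, for example with `-D mapreduce.job.reduces=76` on the command line when the driver runs through `ToolRunner`, or with `job.setNumReduceTasks(76)` in a Java driver.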

Please see this link to learn more about it.

The number of mappers is basically determined by the number of input splits of your data. Suppose you are processing a 1 GB data set, the HDFS block size is 128 MB, and you have not specified any split size in your job: then 1 GB / 128 MB = 8 splits are computed and 8 mappers are allocated to the job. If instead you have specified a split size of 512 MB in your code, then 1 GB / 512 MB = 2 splits are computed and 2 mappers are allocated to the job.
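The split arithmetic above can be sketched as follows. This is modelled on my understanding of Hadoop's `FileInputFormat`: the split size is `max(minSize, min(maxSize, blockSize))`, and splits are cut while the remaining bytes exceed 1.1 × the split size (so the last split may be up to 10% larger). Getting a 512 MB split with 128 MB blocks means raising `mapreduce.input.fileinputformat.split.minsize`. Assumptions: a single non-empty, splittable file (e.g. uncompressed text) and one map task per split; the function names are hypothetical.

```python
MB = 1024 * 1024
LONG_MAX = 2**63 - 1
SPLIT_SLOP = 1.1  # last split may be up to 10% larger than the split size


def compute_split_size(block_size: int, min_size: int = 1,
                       max_size: int = LONG_MAX) -> int:
    """max(minSize, min(maxSize, blockSize)), mirroring FileInputFormat."""
    return max(min_size, min(max_size, block_size))


def count_splits(file_size: int, split_size: int) -> int:
    """Number of splits (hence map tasks) for one non-empty splittable file."""
    splits = 0
    remaining = file_size
    while remaining / split_size > SPLIT_SLOP:
        splits += 1
        remaining -= split_size
    if remaining > 0:
        splits += 1  # leftover bytes form the final split
    return splits


block = 128 * MB
print(count_splits(1024 * MB, compute_split_size(block)))                    # 8
print(count_splits(1024 * MB, compute_split_size(block, min_size=512 * MB)))  # 2
print(count_splits(140 * MB, compute_split_size(block)))                     # 1
```

The last line shows the slop: a 140 MB file is only about 1.09 blocks, so it becomes a single split rather than a 128 MB split plus a 12 MB one.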

Please see this link to understand more about it.