Hadoop MapReduce: Clarification on number of reducers

Question

In the MapReduce framework, one reducer is used for each key generated by the mapper.

So you would think that specifying the number of Reducers in Hadoop MapReduce wouldn't make any sense because it's dependent on the program. However, Hadoop allows you to specify the number of reducers to use (-D mapred.reduce.tasks=# of reducers).

What does this mean? Is the parameter value for number of reducers specifying how many machine resources go to the reducers instead of the number of actual reducers used?

Judge Mental · Accepted Answer · 2014-03-12T19:13:14

"one reducer is used for each key generated by the mapper"

This statement is not correct. The framework makes one call to the reduce() method for each key, where keys are grouped by the grouping comparator. A reducer (task) is a process that handles zero or more calls to reduce(); the partitioner decides which reducer task receives each key. The property you refer to sets the number of reducer tasks, not the number of reduce() calls.
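
To make the distinction concrete, here is a minimal sketch in plain Java (no Hadoop dependency) of how distinct keys could be spread over a fixed number of reducer tasks. It assumes the default HashPartitioner formula, (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks, and uses String keys as a stand-in for Hadoop's Text, whose hashCode() differs. The class and method names (ReducerTaskSketch, partitionFor, assignKeys) are hypothetical and exist only for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration class; not part of Hadoop.
class ReducerTaskSketch {

    // Mirrors the formula used by Hadoop's default HashPartitioner.
    // Assumption: String.hashCode() stands in for the real key type's hashCode().
    static int partitionFor(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Groups distinct keys by the reducer task that would receive them.
    // Each key in a task's list corresponds to one reduce() call in that task.
    static Map<Integer, List<String>> assignKeys(List<String> keys, int numReduceTasks) {
        Map<Integer, List<String>> byTask = new TreeMap<>();
        for (int task = 0; task < numReduceTasks; task++) {
            byTask.put(task, new ArrayList<>()); // a task may end up with zero keys
        }
        for (String key : keys) {
            byTask.get(partitionFor(key, numReduceTasks)).add(key);
        }
        return byTask;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList(
                "apple", "banana", "cherry", "date", "elderberry", "fig", "grape");
        int numReduceTasks = 3; // what -D mapred.reduce.tasks=3 would request

        assignKeys(keys, numReduceTasks).forEach((task, assigned) ->
                System.out.println("reducer task " + task + ": "
                        + assigned.size() + " reduce() call(s) for keys " + assigned));
    }
}
```

Seven distinct keys still mean seven reduce() calls in total; changing numReduceTasks only changes how those calls are distributed across tasks, and a task can receive no keys at all.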

