2 votes

Because of the nature of Map/Reduce applications, the reduce function may be called more than once, so the input and output key/value types must be the same, just like in MongoDB's Map/Reduce implementation. I wonder why it is different in the Hadoop implementation (or rather, why it is allowed to be different)?

org.apache.hadoop.mapreduce.Reducer<KEYIN,VALUEIN,KEYOUT,VALUEOUT>

Second question: how does Hadoop know whether the output of the reduce function should be fed back into reduce in the next run or written to HDFS? For example:

public class MyReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        for (IntWritable value : values) {
            context.write(key, value); /* will this key/value be fed back to reduce in the next run, or written to HDFS? */
        }
    }
}

1 Answer

2 votes

Consider an example where the input is a document name (as the key) and the document's lines (as the values), and the result is the STDDEV (standard deviation) of the line lengths.
To generalize: the type of the aggregation does not have to match the type of the input data, so Hadoop leaves that freedom to the developer.
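For instance, a reducer for that STDDEV example could look roughly like the sketch below (my own illustration, not code from the question; the class and variable names are made up). Its input values are lines of text, while its output value is a DoubleWritable, so the input and output value types differ:

import java.io.IOException;

import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Input: document name -> the lines of that document.
// Output: document name -> standard deviation of the line lengths.
public class LineLengthStdDevReducer extends Reducer<Text, Text, Text, DoubleWritable> {
    @Override
    public void reduce(Text docName, Iterable<Text> lines, Context context)
            throws IOException, InterruptedException {
        long count = 0;
        double sum = 0.0;
        double sumOfSquares = 0.0;
        for (Text line : lines) {
            int length = line.toString().length();
            count++;
            sum += length;
            sumOfSquares += (double) length * length;
        }
        if (count == 0) {
            return; // nothing to emit for an empty group
        }
        double mean = sum / count;
        double variance = sumOfSquares / count - mean * mean; // population variance
        context.write(docName, new DoubleWritable(Math.sqrt(variance)));
    }
}

Note that a reducer whose output types differ from its input types cannot also be used as a combiner, since a combiner's output must match the reducer's input types; that is one trade-off of the freedom Hadoop gives you here.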
As for your second question: Hadoop does not have a mechanism similar to MongoDB's incremental MapReduce, so the results of the reducer are always written to HDFS (or another DFS) and are never fed back into reduce.
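To make that concrete, here is a minimal driver sketch (my own illustration; MyDriver and MyMapper are hypothetical names, while MyReducer is the class from your question). The only destination for the reducer's context.write() calls is the job's configured output path on HDFS:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MyDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "my reduce job");
        job.setJarByClass(MyDriver.class);
        job.setMapperClass(MyMapper.class);   // hypothetical mapper, not shown in the question
        job.setReducerClass(MyReducer.class); // the reducer from the question
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        // Everything the reducer emits via context.write() is written under this
        // output path on the DFS; it is never handed back to reduce().
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}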