2 votes

Because of the nature of Map/Reduce applications, the reduce function may be called more than once, so the input and output key/value types must be the same, just like in MongoDB's Map/Reduce implementation. I wonder why it is different in the Hadoop implementation (or rather, why it is allowed to be different)?

org.apache.hadoop.mapreduce.Reducer<KEYIN,VALUEIN,KEYOUT,VALUEOUT>

Second question: how does Hadoop know whether the output of the reduce function should be fed back into reduce in the next run or written to HDFS? For example:

public class MyReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        for (IntWritable value : values) {
            context.write(key, value); /* will this key/value be fed back to reduce in the next run, or written to HDFS? */
        }
    }
}

1 Answer

2 votes

Consider an example where the input is a document name (as the key) and the document's lines (as the values), and the result is the STDDEV (standard deviation) of the line lengths.
To generalize: the type of the aggregation does not have to match the type of the input data, so Hadoop leaves that freedom to the developer.
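For instance, a reducer for that STDDEV example could look roughly like the sketch below (my own illustration, not code from the question; the class and variable names are made up). Its input values are lines of text, while its output value is a DoubleWritable, so the input and output value types differ:

import java.io.IOException;

import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Input: document name -> the lines of that document.
// Output: document name -> standard deviation of the line lengths.
public class LineLengthStdDevReducer extends Reducer<Text, Text, Text, DoubleWritable> {
    @Override
    public void reduce(Text docName, Iterable<Text> lines, Context context)
            throws IOException, InterruptedException {
        long count = 0;
        double sum = 0.0;
        double sumOfSquares = 0.0;
        for (Text line : lines) {
            int length = line.toString().length();
            count++;
            sum += length;
            sumOfSquares += (double) length * length;
        }
        if (count == 0) {
            return; // nothing to emit for an empty group
        }
        double mean = sum / count;
        double variance = sumOfSquares / count - mean * mean; // population variance
        context.write(docName, new DoubleWritable(Math.sqrt(variance)));
    }
}

Note that a reducer whose output types differ from its input types cannot also be used as a combiner, since a combiner's output must match the reducer's input types; that is one trade-off of the freedom Hadoop gives you here.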
As for your second question: Hadoop does not have a mechanism similar to MongoDB's incremental MapReduce, so the results of the reducer are always written to HDFS (or another DFS) and are never fed back into reduce.
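To make that concrete, here is a minimal driver sketch (my own illustration; MyDriver and MyMapper are hypothetical names, while MyReducer is the class from your question). The only destination for the reducer's context.write() calls is the job's configured output path on HDFS:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MyDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "my reduce job");
        job.setJarByClass(MyDriver.class);
        job.setMapperClass(MyMapper.class);   // hypothetical mapper, not shown in the question
        job.setReducerClass(MyReducer.class); // the reducer from the question
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        // Everything the reducer emits via context.write() is written under this
        // output path on the DFS; it is never handed back to reduce().
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}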