I am implementing a chain of MapReduce jobs in Hadoop 2.2.0 using ControlledJob. The basic schema is this:
mapper1 -> reducer1 -> mapper2 -> reducer2
However, mapper2 is just the identity mapper. Is there an easy way to have reducer1 emit key-value pairs and pass them directly to reducer2?
Right now, the job output for both rounds is configured as follows:
// set intermediate/mapper output
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(Text.class);
// set reducer output
job.setOutputKeyClass(NullWritable.class);
job.setOutputValueClass(Text.class);
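
To make the problem concrete, here is a minimal plain-Java sketch (no Hadoop dependency) of what happens to a record between the two rounds. It assumes the rounds exchange data as tab-separated text lines, which is how Hadoop's default TextOutputFormat writes records; that is an assumption, since the output format is not shown in the configuration above. The class name KeyValueLineSketch and the helpers toLine and fromLine are hypothetical and only mimic that behaviour. They are not Hadoop APIs.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

public class KeyValueLineSketch {

    // Hypothetical helper: mimics writing one record as a text line.
    // A null key stands in for NullWritable, in which case only the value is written.
    static String toLine(String key, String value) {
        return key == null ? value : key + "\t" + value;
    }

    // Hypothetical helper: mimics reading a text line back as a key-value pair
    // by splitting at the first tab. Without a tab, the whole line becomes the key.
    static Map.Entry<String, String> fromLine(String line) {
        int tab = line.indexOf('\t');
        if (tab < 0) {
            return new SimpleEntry<>(line, "");
        }
        return new SimpleEntry<>(line.substring(0, tab), line.substring(tab + 1));
    }

    public static void main(String[] args) {
        // Null key, as with NullWritable: the key is not part of the line.
        String valueOnly = toLine(null, "payload");
        // Real key: the pair survives the round trip through the text line.
        Map.Entry<String, String> pair = fromLine(toLine("k1", "payload"));

        System.out.println("value-only line: " + valueOnly);
        System.out.println("recovered key: " + pair.getKey());
        System.out.println("recovered value: " + pair.getValue());
    }
}
```

Under that assumption, a NullWritable output key means the line written by reducer1 carries only the value, so the second round has to recover the key from the value text itself. If reducer1 emitted a real key instead, an input format that splits each line at the first tab (Hadoop ships KeyValueTextInputFormat for this) could hand the pair to an identity mapper2 unchanged.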