1
votes

I am implementing a chain of MapReduce jobs in Hadoop 2.2.0 using ControlledJob. The basic schema is this:

mapper1 -> reducer1 -> mapper2 -> reducer2

However, mapper2 is just the identity mapper. Is there an easy way to have reducer1 emit key-value pairs and pass them directly to reducer2?

Right now, the job output for both rounds is configured as follows:

// set intermediate/mapper output
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(Text.class);

// set reducer output
job.setOutputKeyClass(NullWritable.class);
job.setOutputValueClass(Text.class);
1
Even if you don't set mapper2 at all, an identity mapper will be executed by default. - USB

1 Answer

0
votes

To my knowledge, the identity mapper is still a mapper and you can't bypass it. However, I believe that in some cases you can refactor mapper1, reducer1 and reducer2 into a single job, so that the chain becomes: mapper1 -> reducer1. That depends entirely on your use case and on what data you are trying to reduce (twice) here.
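
If the two rounds have to stay separate, one way to keep the identity step cheap is to have the first job write its key-value pairs as a SequenceFile and let the second job read them back without setting a mapper, so that the default (identity) Mapper is used. The following is only a sketch under assumptions: the class name TwoRoundChain, the build method, the job names and the three path parameters are made up for illustration, and the mapper and reducer classes are passed in because the question does not show them. It also assumes reducer1 emits Text keys and Text values.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob;
import org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

// Hypothetical driver class; all names here are illustrative.
public class TwoRoundChain {

    @SuppressWarnings("rawtypes")
    public static JobControl build(Configuration conf,
                                   Class<? extends Mapper> mapper1,
                                   Class<? extends Reducer> reducer1,
                                   Class<? extends Reducer> reducer2,
                                   Path input, Path intermediate, Path output)
            throws IOException {
        // Round 1: mapper1 -> reducer1, writing (Text, Text) pairs as a SequenceFile.
        Job job1 = Job.getInstance(conf, "round-1");
        job1.setJarByClass(TwoRoundChain.class);
        job1.setMapperClass(mapper1);
        job1.setReducerClass(reducer1);
        job1.setMapOutputKeyClass(Text.class);
        job1.setMapOutputValueClass(Text.class);
        job1.setOutputKeyClass(Text.class);   // assumption: reducer1 keeps its keys
        job1.setOutputValueClass(Text.class);
        job1.setOutputFormatClass(SequenceFileOutputFormat.class);
        FileInputFormat.addInputPath(job1, input);
        FileOutputFormat.setOutputPath(job1, intermediate);

        // Round 2: no setMapperClass call, so the base Mapper (identity) runs.
        Job job2 = Job.getInstance(conf, "round-2");
        job2.setJarByClass(TwoRoundChain.class);
        job2.setInputFormatClass(SequenceFileInputFormat.class);
        job2.setReducerClass(reducer2);
        job2.setMapOutputKeyClass(Text.class);
        job2.setMapOutputValueClass(Text.class);
        job2.setOutputKeyClass(NullWritable.class);
        job2.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job2, intermediate);
        FileOutputFormat.setOutputPath(job2, output);

        // Wire the dependency so round 2 starts only after round 1 succeeds.
        ControlledJob first = new ControlledJob(conf);
        first.setJob(job1);
        ControlledJob second = new ControlledJob(conf);
        second.setJob(job2);
        second.addDependingJob(first);

        JobControl control = new JobControl("two-round-chain");
        control.addJob(first);
        control.addJob(second);
        return control;
    }
}
```

The returned JobControl is a Runnable, so it is typically started in its own thread and polled with allFinished(). Note that the identity mapper still runs in round 2; this only avoids writing and re-parsing text between the rounds.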