I have a Hadoop program in which I want to chain two jobs, so that the data flows input -> mapper1 -> reducer1 -> mapper2 -> reducer2 -> output. The first job works fine and produces the correct intermediate output. The problem is in the second job: I get a type mismatch, and I suspect that its mapper is for some reason not being paired with the right reducer. Here is the code from main where I set up the jobs:
//JOB 1
Path input1 = new Path(otherArgs.get(0));
Path output1 = new Path("/tempBinaryPath");
Job job1 = Job.getInstance(conf);
job1.setJarByClass(BinaryPathRefined.class);
job1.setJobName("BinaryPathR1");
FileInputFormat.addInputPath(job1, input1);
FileOutputFormat.setOutputPath(job1, output1);
job1.setMapperClass(MyMapper.class);
//job.setCombinerClass(MyReducer.class);
job1.setReducerClass(MyReducer.class);
job1.setInputFormatClass(TextInputFormat.class);
job1.setOutputKeyClass(Text.class);
job1.setOutputValueClass(Text.class);
job1.waitForCompletion(true);
// JOB 2
Path input2 = new Path("/tempBinaryPath/part-r-00000");
Path output2 = new Path(otherArgs.get(1));
Job job2 = Job.getInstance(conf2);
job2.setJarByClass(BinaryPathRefined.class);
job2.setJobName("BinaryPathR2");
FileInputFormat.addInputPath(job2, input2);
FileOutputFormat.setOutputPath(job2, output2);
job2.setMapperClass(MyMapper2.class);
//job.setCombinerClass(MyReducer.class);
job2.setReducerClass(MyReducer2.class);
job2.setInputFormatClass(TextInputFormat.class);
job2.setOutputKeyClass(Text.class);
job2.setOutputValueClass(Text.class);
job2.waitForCompletion(true);
The mappers and reducers are declared as follows:
public static class MyMapper extends Mapper<LongWritable, Text, Text, Text>{
...
}
public static class MyReducer extends Reducer<Text, Text, Text, Text>{
...
}
public static class MyMapper2 extends Mapper<LongWritable, Text, Text, IntWritable>{
...
}
public static class MyReducer2 extends Reducer<Text, IntWritable, Text, Text>{
...
}
The first job runs fine, but the second one fails with this error:
Type mismatch in value from map: expected org.apache.hadoop.io.Text, received org.apache.hadoop.io.IntWritable
Any ideas?
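For reference, one likely explanation of the message: when a job does not declare its map output classes, Hadoop uses the classes passed to setOutputKeyClass and setOutputValueClass for the map output as well. job2 declares Text as its output value, while MyMapper2 emits IntWritable, which matches the "expected Text, received IntWritable" wording. Below is a minimal sketch of the job2 setup with the intermediate types declared explicitly. It is an untested fragment that assumes this is the cause and that the classes are exactly as declared above:

```java
// Sketch only: fragment of the job2 setup in main, not a complete program.
job2.setMapperClass(MyMapper2.class);
job2.setReducerClass(MyReducer2.class);

// Intermediate types, as emitted by MyMapper2
// (Mapper<LongWritable, Text, Text, IntWritable>).
job2.setMapOutputKeyClass(Text.class);
job2.setMapOutputValueClass(IntWritable.class);

// Final types, as emitted by MyReducer2
// (Reducer<Text, IntWritable, Text, Text>).
job2.setOutputKeyClass(Text.class);
job2.setOutputValueClass(Text.class);
```

job1 would not need the extra calls under this reading, because MyMapper and MyReducer both emit Text keys and Text values, so the defaults already match.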