
I have a Hadoop program in which I want to chain two jobs, so that the flow is input -> mapper1 -> reducer1 -> mapper2 -> reducer2 -> output. The first half works fine and I get correct intermediate output. The problem lies in the second job: I believe that, for some reason, the second mapper's output is not matched up with the right reducer, since I get a type mismatch. Here is the code of the main method where I set up the jobs:

    //JOB 1
    Path input1 = new Path(otherArgs.get(0));
    Path output1 = new Path("/tempBinaryPath");

    Job job1 = Job.getInstance(conf);
    job1.setJarByClass(BinaryPathRefined.class);
    job1.setJobName("BinaryPathR1");

    FileInputFormat.addInputPath(job1, input1);
    FileOutputFormat.setOutputPath(job1, output1);

    job1.setMapperClass(MyMapper.class);
    //job.setCombinerClass(MyReducer.class);
    job1.setReducerClass(MyReducer.class);

    job1.setInputFormatClass(TextInputFormat.class);

    job1.setOutputKeyClass(Text.class);
    job1.setOutputValueClass(Text.class);

    job1.waitForCompletion(true);


    // JOB 2
    Path input2 = new Path("/tempBinaryPath/part-r-00000");
    Path output2 = new Path(otherArgs.get(1));

    Job job2 = Job.getInstance(conf2);
    job2.setJarByClass(BinaryPathRefined.class);
    job2.setJobName("BinaryPathR2");

    FileInputFormat.addInputPath(job2, input2);
    FileOutputFormat.setOutputPath(job2, output2);

    job2.setMapperClass(MyMapper2.class);
    //job.setCombinerClass(MyReducer.class);
    job2.setReducerClass(MyReducer2.class);

    job2.setInputFormatClass(TextInputFormat.class);

    job2.setOutputKeyClass(Text.class);
    job2.setOutputValueClass(Text.class);

    job2.waitForCompletion(true);

The mappers and the reducers are of the form:

public static class MyMapper extends Mapper<LongWritable, Text, Text, Text>{
...
}

public static class MyReducer extends Reducer<Text, Text, Text, Text>{
...
}

public static class MyMapper2 extends Mapper<LongWritable, Text, Text, IntWritable>{
...
}

public static class MyReducer2 extends Reducer<Text, IntWritable, Text, Text>{
...
}

The first job runs fine, while in the second I get the error:

Type mismatch in value from map: expected org.apache.hadoop.io.Text, received org.apache.hadoop.io.IntWritable

Any ideas?

1 Answer

When you call only setOutputKeyClass and setOutputValueClass, Hadoop assumes that the Mapper and the Reducer produce the same output types. That holds in your first job, but in the second job MyMapper2 emits IntWritable values while the job's output value class is Text. You need to declare the Mapper's output types explicitly:

job2.setMapOutputKeyClass(Text.class);
job2.setMapOutputValueClass(IntWritable.class);
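
For completeness, here is a minimal sketch of the corrected job2 setup, reusing the names from your question (it assumes conf2, input2 and output2 are defined as in your driver):

    Job job2 = Job.getInstance(conf2);
    job2.setJarByClass(BinaryPathRefined.class);
    job2.setJobName("BinaryPathR2");

    FileInputFormat.addInputPath(job2, input2);
    FileOutputFormat.setOutputPath(job2, output2);

    job2.setMapperClass(MyMapper2.class);
    job2.setReducerClass(MyReducer2.class);
    job2.setInputFormatClass(TextInputFormat.class);

    // Map output types: MyMapper2 is a Mapper<LongWritable, Text, Text, IntWritable>
    job2.setMapOutputKeyClass(Text.class);
    job2.setMapOutputValueClass(IntWritable.class);

    // Final (reducer) output types: MyReducer2 is a Reducer<Text, IntWritable, Text, Text>
    job2.setOutputKeyClass(Text.class);
    job2.setOutputValueClass(Text.class);

    job2.waitForCompletion(true);

As a side note, you can point input2 at the /tempBinaryPath directory instead of the single part-r-00000 file; FileInputFormat reads every (non-hidden) file in the directory, so the second job keeps working even if the first one runs with more than one reducer.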