
My input file, which is 10 GB in size, is at

/user/cloudera/inputfiles/records.txt

Here is my Driver class code:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class WordCountMain {

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        Path inputFilePath = new Path(args[0]);
        Path outputFilePath = new Path(args[1]);

        Job job = new Job(conf, "word count");
        job.getConfiguration().set("mapred.job.queue.name", "omega");

        job.setJarByClass(WordCountMain.class);

        FileInputFormat.addInputPath(job, inputFilePath);
        FileOutputFormat.setOutputPath(job, outputFilePath);

        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);

        job.setMapperClass(WordCountMapper.class);
        job.setCombinerClass(WordCountCombiner.class);
        job.setNumReduceTasks(0); // map-only job: no reduce phase
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

I have code for the Mapper and the Combiner, and I have set the number of reducers to zero.

Here is my Mapper code:

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordCountMapper extends Mapper<Object, Text, Text, IntWritable> {

    public static IntWritable one = new IntWritable(1);

    @Override
    protected void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        // Split each record on the pipe character and emit (word, 1)
        StringTokenizer st = new StringTokenizer(line, "|");
        while (st.hasMoreTokens()) {
            context.write(new Text(st.nextToken()), one);
        }
    }
}
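Since the mapper splits on "|" rather than whitespace, the tokenizing step can be checked in plain Java, with no Hadoop dependency. This is only a sketch of the split logic (the class and method names are made up for illustration), not the mapper itself:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class TokenizeDemo {

    // Mirrors the mapper's split: each pipe-delimited field becomes one token
    static List<String> tokens(String line) {
        List<String> out = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(line, "|");
        while (st.hasMoreTokens()) {
            out.add(st.nextToken());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("apple|banana|apple"));
    }
}
```

Each token would be emitted as a (Text, 1) pair by the real mapper.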

I have written my own Combiner.

Here is my Combiner code:

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCountCombiner extends Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int count = 0;
        for (IntWritable i : values) {
            count += i.get();
        }
        context.write(key, new IntWritable(count));
    }
}
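Stripped of the Hadoop types, the combiner's reduce logic just sums the counts seen for one key. A plain-Java sketch of that accumulation (IntWritable swapped for int, names made up for illustration):

```java
import java.util.Arrays;
import java.util.List;

public class CombineDemo {

    // Same accumulation the combiner performs over its Iterable<IntWritable>
    static int sum(List<Integer> values) {
        int count = 0;
        for (int i : values) {
            count += i;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(sum(Arrays.asList(1, 1, 1)));
    }
}
```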

My question here is: which output will get stored?

The output of the Mapper, or the output of the Combiner?

Or will the Combiner get executed only if there is a reduce phase?

Please help.


2 Answers


You cannot be sure how many times the combiner function will run, or whether it will run at all; running the combiner also does not depend on whether you specify a reducer for your job. In your case it will simply produce 160 output files, one per input split (10240 / 64 = 160).
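The 160 figure assumes each map task reads one 64 MB block of the 10 GB file, so a map-only job writes one output file per split. A sketch of that arithmetic (the 64 MB block size is an assumption; many clusters use 128 MB):

```java
public class SplitCount {

    // ceil(inputMb / blockMb) map tasks, hence that many output files in a map-only job
    static long mapTasks(long inputMb, long blockMb) {
        return (inputMb + blockMb - 1) / blockMb;
    }

    public static void main(String[] args) {
        System.out.println(mapTasks(10240, 64)); // 10 GB input, 64 MB blocks
    }
}
```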


If you skip setting the mapper and reducer, Hadoop will move forward with its defaults. For example:

  1. IdentityMapper.class is the default mapper.

  2. The default input format is TextInputFormat.

  3. The default partitioner is HashPartitioner.

  4. By default, there is a single reducer, and therefore a single partition.

  5. The default reducer is Reducer, again a generic type.

  6. The default output format is TextOutputFormat, which writes out records, one per line, by converting keys and values to strings and separating them with a tab character.
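To see the shape of those TextOutputFormat records, here is a plain-Java sketch of the key-tab-value line format (no Hadoop dependency; the class and method names are made up for illustration):

```java
public class TextOutputDemo {

    // TextOutputFormat's default record shape: key, tab character, value
    static String formatRecord(String key, int value) {
        return key + "\t" + value;
    }

    public static void main(String[] args) {
        System.out.println(formatRecord("hadoop", 5));
    }
}
```

In a map-only job like the one above, lines in this shape are written directly from the map output, one file per map task.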