
My input file, which is 10 GB in size, is at

/user/cloudera/inputfiles/records.txt

Here is my Driver class code:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class WordCountMain {

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        Path inputFilePath = new Path(args[0]);
        Path outputFilePath = new Path(args[1]);

        Job job = new Job(conf, "word count");
        job.getConfiguration().set("mapred.job.queue.name", "omega");

        job.setJarByClass(WordCountMain.class);

        FileInputFormat.addInputPath(job, inputFilePath);
        FileOutputFormat.setOutputPath(job, outputFilePath);

        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);

        job.setMapperClass(WordCountMapper.class);
        job.setCombinerClass(WordCountCombiner.class);
        job.setNumReduceTasks(0); // map-only job: no reduce phase
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

I have code for the Mapper and the Combiner, and I have set the number of reducers to zero.

Here is my Mapper code:

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordCountMapper extends Mapper<Object, Text, Text, IntWritable> {

    public static IntWritable one = new IntWritable(1);

    @Override
    protected void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        // Split each record on the pipe character and emit (word, 1)
        StringTokenizer st = new StringTokenizer(line, "|");
        while (st.hasMoreTokens()) {
            context.write(new Text(st.nextToken()), one);
        }
    }
}
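Since the mapper splits on "|" rather than whitespace, the tokenizing step can be checked in plain Java, with no Hadoop dependency. This is only a sketch of the split logic (the class and method names are made up for illustration), not the mapper itself:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class TokenizeDemo {

    // Mirrors the mapper's split: each pipe-delimited field becomes one token
    static List<String> tokens(String line) {
        List<String> out = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(line, "|");
        while (st.hasMoreTokens()) {
            out.add(st.nextToken());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("apple|banana|apple"));
    }
}
```

Each token would be emitted as a (Text, 1) pair by the real mapper.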

I have written my own Combiner.

Here is my Combiner code:

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCountCombiner extends Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int count = 0;
        for (IntWritable i : values) {
            count += i.get();
        }
        context.write(key, new IntWritable(count));
    }
}
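Stripped of the Hadoop types, the combiner's reduce logic just sums the counts seen for one key. A plain-Java sketch of that accumulation (IntWritable swapped for int, names made up for illustration):

```java
import java.util.Arrays;
import java.util.List;

public class CombineDemo {

    // Same accumulation the combiner performs over its Iterable<IntWritable>
    static int sum(List<Integer> values) {
        int count = 0;
        for (int i : values) {
            count += i;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(sum(Arrays.asList(1, 1, 1)));
    }
}
```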

My question here is: which output will get stored?

The output of the Mapper, or the output of the Combiner?

Or will the Combiner get executed only if there is a reduce phase?

Please help.


2 Answers


You cannot be sure how many times the combiner function will run, or whether it will run at all; running the combiner also does not depend on whether you specify a reducer for your job. In your case it will simply produce 160 output files, one per input split (10240 / 64 = 160).
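The 160 figure assumes each map task reads one 64 MB block of the 10 GB file, so a map-only job writes one output file per split. A sketch of that arithmetic (the 64 MB block size is an assumption; many clusters use 128 MB):

```java
public class SplitCount {

    // ceil(inputMb / blockMb) map tasks, hence that many output files in a map-only job
    static long mapTasks(long inputMb, long blockMb) {
        return (inputMb + blockMb - 1) / blockMb;
    }

    public static void main(String[] args) {
        System.out.println(mapTasks(10240, 64)); // 10 GB input, 64 MB blocks
    }
}
```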


If you skip setting the mapper and reducer, Hadoop will move forward with its defaults. For example:

  1. IdentityMapper.class is the default mapper.

  2. The default input format is TextInputFormat.

  3. The default partitioner is HashPartitioner.

  4. By default, there is a single reducer, and therefore a single partition.

  5. The default reducer is Reducer, again a generic type.

  6. The default output format is TextOutputFormat, which writes out records, one per line, by converting keys and values to strings and separating them with a tab character.
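To see the shape of those TextOutputFormat records, here is a plain-Java sketch of the key-tab-value line format (no Hadoop dependency; the class and method names are made up for illustration):

```java
public class TextOutputDemo {

    // TextOutputFormat's default record shape: key, tab character, value
    static String formatRecord(String key, int value) {
        return key + "\t" + value;
    }

    public static void main(String[] args) {
        System.out.println(formatRecord("hadoop", 5));
    }
}
```

In a map-only job like the one above, lines in this shape are written directly from the map output, one file per map task.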