My input file, which is 10 GB in size, is at
/user/cloudera/inputfiles/records.txt
Here is my Driver class code:
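import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class WordCountMain {
    /**
     * @param args args[0] is the input path, args[1] is the output path
     */
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path inputFilePath = new Path(args[0]);
        Path outputFilePath = new Path(args[1]);
        Job job = new Job(conf, "word count");
        job.getConfiguration().set("mapred.job.queue.name", "omega");
        job.setJarByClass(WordCountMain.class);
        FileInputFormat.addInputPath(job, inputFilePath);
        FileOutputFormat.setOutputPath(job, outputFilePath);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        job.setMapperClass(WordCountMapper.class);
        job.setCombinerClass(WordCountCombiner.class);
        // No reduce tasks are configured for this job.
        job.setNumReduceTasks(0);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}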
I have code for the Mapper and the Combiner, and I have set the number of reducers to zero.
Here is my Mapper code:
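import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordCountMapper extends Mapper<Object, Text, Text, IntWritable> {
    public static final IntWritable one = new IntWritable(1);

    @Override
    protected void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        // Fields in each record are separated by '|'.
        StringTokenizer st = new StringTokenizer(line, "|");
        while (st.hasMoreTokens()) {
            String eachWord = st.nextToken();
            // Emit (word, 1) for every token on the line.
            context.write(new Text(eachWord), one);
        }
    }
}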
I have written my own Combiner.
Here is my Combiner code:
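import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCountCombiner extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // Sum the counts seen for this key.
        int count = 0;
        for (IntWritable i : values) {
            count += i.get();
        }
        context.write(key, new IntWritable(count));
    }
}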
My question is: which output will be stored in the output directory, the output of the Mapper or the output of the Combiner?
Or is the Combiner executed only if there is a reduce phase?
Please help.
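To make the question concrete, here is a minimal plain-Java sketch of the two outputs I am choosing between for a single input line. It uses no Hadoop classes. The class name CombinerQuestionSketch, the methods mapOnly and mapThenCombine, and the sample line "a|b|a" are all made up for illustration, and the sketch does not model how Hadoop actually decides whether to run a combiner.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;

// Illustration only: hypothetical names, no Hadoop dependencies.
public class CombinerQuestionSketch {

    // Mimics WordCountMapper: one "token=1" pair per '|'-separated token.
    static List<String> mapOnly(String line) {
        List<String> pairs = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(line, "|");
        while (st.hasMoreTokens()) {
            pairs.add(st.nextToken() + "=1");
        }
        return pairs;
    }

    // Mimics WordCountCombiner applied to that map output:
    // the counts are summed per token.
    static Map<String, Integer> mapThenCombine(String line) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        StringTokenizer st = new StringTokenizer(line, "|");
        while (st.hasMoreTokens()) {
            counts.merge(st.nextToken(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String line = "a|b|a";
        System.out.println(mapOnly(line));        // prints [a=1, b=1, a=1]
        System.out.println(mapThenCombine(line)); // prints {a=2, b=1}
    }
}
```

In other words, for a record like a|b|a, I want to know whether the files in the output directory would look like the first result (one pair per token) or the second (counts already summed).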