I am trying to write a MapReduce job that analyses some information from a YouTube data set. I believe I have correctly set the map output key classes in the driver class, but I am still getting a "Type mismatch in key from map" error. I am posting the code and the exception below.
The Mapper class
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class YouTubeDataMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable one = new IntWritable(1);
    private Text category = new Text();

    public void mapper(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        String[] str = value.toString().split("\t");
        category.set(str[3]); // the fourth tab-separated field is the video category
        context.write(category, one);
    }
}
The Reducer class
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class YouTubeDataReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable count : values) {
            sum += count.get();
        }
        context.write(key, new IntWritable(sum)); // total count per category
    }
}
The Driver class
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class YouTubeDataDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        @SuppressWarnings("deprecation")
        Job job = new Job(conf, "categories");
        job.setJarByClass(YouTubeDataDriver.class);

        // map output key/value classes
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        // job.setNumReduceTasks(0);
        // final output key/value classes (here I have set the output keys)
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        job.setMapperClass(YouTubeDataMapper.class);
        job.setReducerClass(YouTubeDataReducer.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // remove the output directory if it already exists
        Path out = new Path(args[1]);
        out.getFileSystem(conf).delete(out);

        job.waitForCompletion(true);
    }
}
The exception I got:
java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.Text, received org.apache.hadoop.io.LongWritable
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1069)
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:712)
    at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
    at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
    at org.apache.hadoop.mapreduce.Mapper.map(Mapper.java:124)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
I have already set the output key and value classes in the driver class:
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
But why am I still getting the error? Please help, I am new to MapReduce.
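One thing I notice in the stack trace is that the write goes through org.apache.hadoop.mapreduce.Mapper.map (the base class) rather than through YouTubeDataMapper, so I am wondering whether the framework ever calls my mapper(...) method at all. My guess is that the method has to be named map for the new API to pick it up; below is a minimal sketch of what I think the expected signature looks like (the @Override annotation is only there so the compiler would confirm the override):

@Override
public void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
    // same body as my mapper(...) above
    String[] str = value.toString().split("\t");
    category.set(str[3]);
    context.write(category, one);
}

Is that the actual cause of the type mismatch, or is something else wrong in the driver configuration?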