I am using the new MapReduce API on our YARN cluster. I need to read files in two different formats from two different directories, so I decided to use MultipleInputs to specify a separate mapper class for each input. The following is my job driver:
Job job = new Job(new Configuration(), "Daily Report");
job.setJarByClass(MyDailyJob.class);
MultipleInputs.addInputPath(job, new Path(args[0]), TextInputFormat.class, Record1ParsingMapper.class);
MultipleInputs.addInputPath(job, new Path(args[1]), TextInputFormat.class, Record2ParsingMapper.class);
FileOutputFormat.setOutputPath(job, new Path(args[2]));
job.setReducerClass(ReportParsingReducer.class);
job.setNumReduceTasks(10);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(NullWritable.class);
return (job.waitForCompletion(true) ? 0 : 1);
And my mappers have the following definition (input types match TextInputFormat, output types match the job's output classes):

public class Record1ParsingMapper extends Mapper<LongWritable, Text, Text, NullWritable>
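For context, the class body looks roughly like this (a trimmed sketch; the actual record-handling logic is more involved, but the json-simple usage is the relevant part):

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.json.simple.JSONObject;
import org.json.simple.parser.JSONParser;
import org.json.simple.parser.ParseException;

public class Record1ParsingMapper extends Mapper<LongWritable, Text, Text, NullWritable> {

    private final JSONParser parser = new JSONParser();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        try {
            // json-simple does the parsing -- this is where the dependency on
            // org.json.simple.parser.ParseException comes from
            JSONObject record = (JSONObject) parser.parse(value.toString());
            context.write(new Text(record.toJSONString()), NullWritable.get());
        } catch (ParseException e) {
            // skip malformed records
        }
    }
}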
When I run this job, I get the following exception:
Exception in thread "main" java.lang.NoClassDefFoundError: org/json/simple/parser/ParseException
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:270)
at org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:1986)
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1951)
at org.apache.hadoop.mapreduce.lib.input.MultipleInputs.getMapperTypeMap(MultipleInputs.java:141)
at org.apache.hadoop.mapreduce.lib.input.DelegatingInputFormat.getSplits(DelegatingInputFormat.java:60)
at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:498)
at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:515)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:399)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1295)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1292)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1292)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1313)
at main.java.com.adnear.mr.jobs.MyDailyJob.run(MyDailyJob.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at main.java.com.adnear.mr.jobs.MyDailyJob.main(MyDailyJob.java:226)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Caused by: java.lang.ClassNotFoundException: org.json.simple.parser.ParseException
It looks like submission is failing inside the getClassByName() method of the Configuration class, at the following statement:

clazz = Class.forName(name, true, classLoader);
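For reference, here is my understanding of the code path in that stack trace (a paraphrased sketch of MultipleInputs.getMapperTypeMap(), not the exact Hadoop source):

// addInputPath() serializes each mapping as "path;mapperClassName" under the
// configuration key "mapreduce.input.multipleinputs.dir.mappers"; at submit
// time the client splits that string apart and resolves each class by name.
for (String dirMapper : conf.get("mapreduce.input.multipleinputs.dir.mappers").split(",")) {
    String[] split = dirMapper.split(";");
    // Loading the mapper class here forces the client JVM to also resolve
    // everything the mapper references, including
    // org.json.simple.parser.ParseException
    Class<? extends Mapper> mapperClass =
            (Class<? extends Mapper>) conf.getClassByName(split[1]);
}

So, if I read this correctly, the mapper classes have to be resolvable on the client at submit time, not just inside the task containers.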
I am specifying the paths to my Mapper classes correctly. Can someone please explain to me what is causing this class loading exception?
Thanks, Dev
Is Record1ParsingMapper an inner class? - Aleksei Shestakov