I am using the new MapReduce API on our YARN cluster. I need to read files in two different formats from two different directories, so I decided to use MultipleInputs to specify a separate mapper class for each input. The following is my job driver:
Job job = new Job(new Configuration(), "Daily Report");
job.setJarByClass(MyDailyJob.class);
MultipleInputs.addInputPath(job, new Path(args[0]), TextInputFormat.class, Record1ParsingMapper.class);
MultipleInputs.addInputPath(job, new Path(args[1]), TextInputFormat.class, Record2ParsingMapper.class);
FileOutputFormat.setOutputPath(job, new Path(args[2]));
job.setReducerClass(ReportParsingReducer.class);
job.setNumReduceTasks(10);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(NullWritable.class);
return (job.waitForCompletion(true) ? 0 : 1);
And my mappers have the following definition (input types match TextInputFormat, output types match the job's output classes):

public class Record1ParsingMapper extends Mapper<LongWritable, Text, Text, NullWritable>
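For context, the class body looks roughly like this (a trimmed sketch; the actual record-handling logic is more involved, but the json-simple usage is the relevant part):

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.json.simple.JSONObject;
import org.json.simple.parser.JSONParser;
import org.json.simple.parser.ParseException;

public class Record1ParsingMapper extends Mapper<LongWritable, Text, Text, NullWritable> {

    private final JSONParser parser = new JSONParser();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        try {
            // json-simple does the parsing -- this is where the dependency on
            // org.json.simple.parser.ParseException comes from
            JSONObject record = (JSONObject) parser.parse(value.toString());
            context.write(new Text(record.toJSONString()), NullWritable.get());
        } catch (ParseException e) {
            // skip malformed records
        }
    }
}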
When I run this job, I get the following exception:
Exception in thread "main" java.lang.NoClassDefFoundError: org/json/simple/parser/ParseException
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:270)
at org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:1986)
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1951)
at org.apache.hadoop.mapreduce.lib.input.MultipleInputs.getMapperTypeMap(MultipleInputs.java:141)
at org.apache.hadoop.mapreduce.lib.input.DelegatingInputFormat.getSplits(DelegatingInputFormat.java:60)
at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:498)
at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:515)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:399)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1295)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1292)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1292)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1313)
at main.java.com.adnear.mr.jobs.MyDailyJob.run(MyDailyJob.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at main.java.com.adnear.mr.jobs.MyDailyJob.main(MyDailyJob.java:226)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Caused by: java.lang.ClassNotFoundException: org.json.simple.parser.ParseException
It looks like submission is failing inside the getClassByName() method of the Configuration class, at the following statement:

clazz = Class.forName(name, true, classLoader);
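For reference, here is my understanding of the code path in that stack trace (a paraphrased sketch of MultipleInputs.getMapperTypeMap(), not the exact Hadoop source):

// addInputPath() serializes each mapping as "path;mapperClassName" under the
// configuration key "mapreduce.input.multipleinputs.dir.mappers"; at submit
// time the client splits that string apart and resolves each class by name.
for (String dirMapper : conf.get("mapreduce.input.multipleinputs.dir.mappers").split(",")) {
    String[] split = dirMapper.split(";");
    // Loading the mapper class here forces the client JVM to also resolve
    // everything the mapper references, including
    // org.json.simple.parser.ParseException
    Class<? extends Mapper> mapperClass =
            (Class<? extends Mapper>) conf.getClassByName(split[1]);
}

So, if I read this correctly, the mapper classes have to be resolvable on the client at submit time, not just inside the task containers.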
I am specifying the paths to my Mapper classes correctly. Can someone please explain to me what is causing this class loading exception?
Thanks, Dev
Is Record1ParsingMapper an inner class? - Aleksei Shestakov