Running a Hadoop Job From another Java Program

Question

I am writing a program that receives the source code of the mapper/reducers, dynamically compiles the mappers/reducers and makes a JAR file out of them. It then has to run this JAR file on a hadoop cluster.

For the last part, I setup all the required parameters dynamically through my code. However, the problem I am facing now is that the code requires the compiled mapper and reducer classes at the time of compiling. But at the time of compiling, I do not have these classes and they will later be received during the run time (e.g. through a message received from a remote node). I would appreciate any idea/suggestion on how to pass this problem?

Here's below you can find the code for my last part with the problem being at job.setMapperClass(Mapper_Class.class) and job.setReducerClass(Reducer_Class.class) requiring the classes (Mapper_Class.class and Reducer_Class.class) files to be present at the time of compiling:

    private boolean run_Hadoop_Job(String className){
try{
    System.out.println("Starting to run the code on Hadoop...");
    String[] argsTemp = { "project_test/input", "project_test/output" };
    // create a configuration
    Configuration conf = new Configuration();
    conf.set("fs.default.name", "hdfs://localhost:54310");
    conf.set("mapred.job.tracker", "localhost:54311");
    conf.set("mapred.jar", jar_Output_Folder+ java.io.File.separator 
                            + className+".jar");
    conf.set("mapreduce.map.class", "Mapper_Reducer_Classes$Mapper_Class.class");
    conf.set("mapreduce.reduce.class", "Mapper_Reducer_Classes$Reducer_Class.class");
    // create a new job based on the configuration
    Job job = new Job(conf, "Hadoop Example for dynamically and programmatically compiling-running a job");
    job.setJarByClass(Platform.class);
    //job.setMapperClass(Mapper_Class.class);
    //job.setReducerClass(Reducer_Class.class);

    // key/value of your reducer output
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);

    FileInputFormat.addInputPath(job, new Path(argsTemp[0]));
    // this deletes possible output paths to prevent job failures
    FileSystem fs = FileSystem.get(conf);
    Path out = new Path(argsTemp[1]);
    fs.delete(out, true);
    // finally set the empty out path
    FileOutputFormat.setOutputPath(job, new Path(argsTemp[1]));

    //job.submit();
    System.exit(job.waitForCompletion(true) ? 0 : 1); 
    System.out.println("Job Finished!");        
} catch (Exception e) { return false; }
return true;
}

Revised: So I revised the code to specify the mapper and reducers using conf.set("mapreduce.map.class, "my mapper.class"). Now the code compiles correctly but when it is executed it throws the following error:

ec 24, 2012 6:49:43 AM org.apache.hadoop.mapred.JobClient monitorAndPrintJob INFO: Task Id : attempt_201212240511_0006_m_000001_2, Status : FAILED java.lang.RuntimeException: java.lang.ClassNotFoundException: Mapper_Reducer_Classes$Mapper_Class.class at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:809) at org.apache.hadoop.mapreduce.JobContext.getMapperClass(JobContext.java:157) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:569) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305) at org.apache.hadoop.mapred.Child.main(Child.java:170)

Thomas Jungblut Thomas Jungblut · Accepted Answer · 2012-12-23T13:41:24

If you don't have them at compile time, then directly set the name in the configuration like this:

conf.set("mapreduce.map.class", "org.what.ever.ClassName");
conf.set("mapreduce.reduce.class", "org.what.ever.ClassName");

Running a Hadoop Job From another Java Program

3 Answers