Running hadoop job without creating a jar file

Question

I have wriiten a simple hadoop job. Now I want to run it without creating the jar file as opposed to lots of tutorials found on net.

I am calling it from a shell script on ubuntu platform which runs a cloudera CHD4 distribution of hadoop(2.0.0+91).

I can't create the jar file of the job because it depends on several other third party jars and configuration files which are already centrally deployed over my machine and are not accessible at the time of creating the jar. Hence I am looking out for a way where I can include these custom jar files and configuration files.

I also can't use -libjars and DistributedCache options because they only affect map/reduce phase but my driver class also is using these jar and configuration files. My job uses several in house utility code which internally uses these third party libraries and configuration files which I have only access to read from a centrally deployed location.

Here is how I am calling it from a shell script.

sudo -u hdfs hadoop x.y.z.MyJob /input /output

It shows me a

Caused by: java.lang.ClassNotFoundException: x.y.z.MyJob
    at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:247)

My calling shell script successfully sets the hadoop classpath and contains all my required third party libraries and configuration files from a centrally deployed location.

I am sure that my class x.y.z.MyJob and all required libraries and configuration files are found in both the $CLASSPATH and $HADOOP_CLASSPATH environment varibales which I am setting before calling the hadoop job

Why at the time of running the script my program is not able to find the class. Can't I run the job as a normal java class? All my other normal java programs are using the same classpath and they can always find the classes and configuration files without any problem.

Please let me know how can I access centrally deployed haddop job code and execute it.

EDIT: Here is my code to set the classpath

CLASSES_DIR=$BASE_DIR/classes/current
BIN_DIR=$BASE_DIR/bin/current
LIB_DIR=$BASE_DIR/lib/current
CONFIG_DIR=$BASE_DIR/config/current
DATA_DIR=$BASE_DIR/data/current
CLASSPATH=./
CLASSPATH=$CLASSPATH:$CLASSES_DIR
CLASSPATH=$CLASSPATH:$BIN_DIR
CLASSPATH=$CLASSPATH:$CONFIG_DIR
CLASSPATH=$CLASSPATH:$DATA_DIR
LIBPATH=`$BIN_DIR/lib.sh $LIB_DIR`
CLASSPATH=$CLASSPATH:$LIBPATH
export HADOOP_CLASSPATH=$CLASSPATH

lib.sh is the file to concatenate all third party files to a : separated format and CLASSES_DIR contains my job code x.y.z.MyJob class. All my configuration files are unders CONFIG_DIR

When I print my CLASSPATH and HADOOP_CLASSPATH it shows me correct values. However whenever I call hadoop classpath just before executing the job, it shows me following output.

$ hadoop classpath

/etc/hadoop/conf:/usr/lib/hadoop/lib/*:/usr/lib/hadoop/.//*:myname:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/*:/usr/lib/hadoop-hdfs/.//*:/usr/lib/hadoop-yarn/lib/*:/usr/lib/hadoop-yarn/.//*:/usr/lib/hadoop-0.20-mapreduce/./:/usr/lib/hadoop-0.20-mapreduce/lib/*:/usr/lib/hadoop-0.20-mapreduce/.//*

$

It obviously does not have any of those previously set $CLASSPATH and $HADOOP_CLASSPATH varibales appended. Where are these environment varibales.

Can you add the relevant part of the classpath , the part where your class is mentioned — Razvan
@Razvan Here is the code to set CLASSPATH and HADOOP_CLASSPATH — ahsan_cse2004
You can't format the comments, but you can edit your question or add a link to a pastebin :) — VoronoiPotato

ahsan_cse2004 ahsan_cse2004 · Accepted Answer · 2012-08-11T10:24:49

Inside my shell script I was running the hadoop jar command with Cloudera's hdfs user

sudo -u hdfs hadoop jar x.y.z.MyJob /input /output

This code was actually being called from the script with a regular ubuntu user which was setting the CLASSPATH and HADOOP_CLASSPATH varibles as mentioned above. And at the time of execution the hadoop jar command was not called using the same regular ubuntu user. Hence there was an exception indicating that the class was not found.

So you have to run the job with the same user who is actually setting the CLASSPATH and HADOOP_CLASSPATH environment variables.

Thanks all for your time.

Running hadoop job without creating a jar file

1 Answers