I have wriiten a simple hadoop job. Now I want to run it without creating the jar file as opposed to lots of tutorials found on net.
I am calling it from a shell script on ubuntu platform which runs a cloudera CHD4 distribution of hadoop(2.0.0+91).
I can't create the jar file of the job because it depends on several other third party jars and configuration files which are already centrally deployed over my machine and are not accessible at the time of creating the jar. Hence I am looking out for a way where I can include these custom jar files and configuration files.
I also can't use -libjars and DistributedCache options because they only affect map/reduce phase but my driver class also is using these jar and configuration files. My job uses several in house utility code which internally uses these third party libraries and configuration files which I have only access to read from a centrally deployed location.
Here is how I am calling it from a shell script.
sudo -u hdfs hadoop x.y.z.MyJob /input /output
It shows me a
Caused by: java.lang.ClassNotFoundException: x.y.z.MyJob
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
My calling shell script successfully sets the hadoop classpath and contains all my required third party libraries and configuration files from a centrally deployed location.
I am sure that my class x.y.z.MyJob and all required libraries and configuration files are found in both the $CLASSPATH and $HADOOP_CLASSPATH environment varibales which I am setting before calling the hadoop job
Why at the time of running the script my program is not able to find the class. Can't I run the job as a normal java class? All my other normal java programs are using the same classpath and they can always find the classes and configuration files without any problem.
Please let me know how can I access centrally deployed haddop job code and execute it.
EDIT: Here is my code to set the classpath
CLASSES_DIR=$BASE_DIR/classes/current
BIN_DIR=$BASE_DIR/bin/current
LIB_DIR=$BASE_DIR/lib/current
CONFIG_DIR=$BASE_DIR/config/current
DATA_DIR=$BASE_DIR/data/current
CLASSPATH=./
CLASSPATH=$CLASSPATH:$CLASSES_DIR
CLASSPATH=$CLASSPATH:$BIN_DIR
CLASSPATH=$CLASSPATH:$CONFIG_DIR
CLASSPATH=$CLASSPATH:$DATA_DIR
LIBPATH=`$BIN_DIR/lib.sh $LIB_DIR`
CLASSPATH=$CLASSPATH:$LIBPATH
export HADOOP_CLASSPATH=$CLASSPATH
lib.sh is the file to concatenate all third party files to a : separated format and CLASSES_DIR contains my job code x.y.z.MyJob class. All my configuration files are unders CONFIG_DIR
When I print my CLASSPATH and HADOOP_CLASSPATH it shows me correct values. However whenever I call hadoop classpath just before executing the job, it shows me following output.
$ hadoop classpath
/etc/hadoop/conf:/usr/lib/hadoop/lib/*:/usr/lib/hadoop/.//*:myname:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/*:/usr/lib/hadoop-hdfs/.//*:/usr/lib/hadoop-yarn/lib/*:/usr/lib/hadoop-yarn/.//*:/usr/lib/hadoop-0.20-mapreduce/./:/usr/lib/hadoop-0.20-mapreduce/lib/*:/usr/lib/hadoop-0.20-mapreduce/.//*
$
It obviously does not have any of those previously set $CLASSPATH and $HADOOP_CLASSPATH varibales appended. Where are these environment varibales.