5 votes

I am trying to figure out how to set a classpath that references HDFS, but I cannot find any documentation on it.

 java -cp "how to reference to HDFS?" com.MyProgram 

If I cannot reference the Hadoop file system, then I have to copy all the referenced third-party libs/jars somewhere under $HADOOP_HOME on each Hadoop machine, but I want to avoid this by putting the files on the Hadoop file system. Is this possible?

Example hadoop command line for the program to run (this is what I expect it to look like; maybe I am wrong):

hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming-1.0.3.jar -input inputfileDir -output outputfileDir -mapper /home/nanshi/myprog.java -reducer NONE -file /home/nanshi/myprog.java

However, within the command line above, how do I add the Java classpath? Something like -cp "/home/nanshi/wiki/Lucene/lib/lucene-core-3.6.0.jar:/home/nanshi/Lucene/bin"


3 Answers

11 votes

What I suppose you are trying to do is include third-party libraries in your distributed program. There are several options:

Option 1) The easiest option I have found is to put all the jars in the $HADOOP_HOME/lib directory (e.g. /usr/local/hadoop-0.22.0/lib) on all nodes and restart your jobtracker and tasktracker.

Option 2) Use the -libjars option; the command for this is hadoop jar <your-jar> -libjars comma_separated_jars

Option 3) Include the jars in the lib directory of your job jar. You will have to do that while creating the jar.

Option 4) Install all the jars on your machines and include their location in the classpath.

Option 5) You can try putting those jars in the distributed cache; a sketch follows below.
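
For option 5, here is a minimal sketch of putting a jar that already lives on HDFS onto the task classpath via the distributed cache, using the Hadoop 1.x API. The class name and helper method are hypothetical, not part of any Hadoop API:

 import java.io.IOException;

 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.filecache.DistributedCache;
 import org.apache.hadoop.fs.Path;

 public class CacheClasspathSetup {
     // Returns a Configuration that will place the given HDFS jar on the
     // classpath of every map/reduce task of a job submitted with it.
     public static Configuration confWithJar(String hdfsJarPath) throws IOException {
         Configuration conf = new Configuration();
         // hdfsJarPath must already exist on HDFS, e.g. "/libs/lucene-core-3.6.0.jar"
         DistributedCache.addFileToClassPath(new Path(hdfsJarPath), conf);
         return conf;
     }
 }

Pass the returned Configuration (or a JobConf built from it) to your job before submitting it.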

3 votes

You cannot add an HDFS path to your classpath. The java executable wouldn't be able to interpret something like:

hdfs://path/to/your/file

But adding third-party libraries to the classpath of each task that needs them can be done using the -libjars option. This means you need a so-called driver class (implementing Tool) which sets up and starts your job, and you pass the -libjars option on the command line when running that driver class. The Tool, in turn, uses GenericOptionsParser to parse your command-line arguments (including -libjars) and, with the help of the JobClient, does all the necessary work to ship your libs to all the machines that need them and to put them on the classpath of those machines.
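
A minimal sketch of such a driver (class and job names are placeholders, not taken from the question); ToolRunner wires up GenericOptionsParser, so -libjars is stripped from the arguments and handled for you:

 import org.apache.hadoop.conf.Configured;
 import org.apache.hadoop.fs.Path;
 import org.apache.hadoop.mapreduce.Job;
 import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
 import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
 import org.apache.hadoop.util.Tool;
 import org.apache.hadoop.util.ToolRunner;

 public class MyDriver extends Configured implements Tool {

     @Override
     public int run(String[] args) throws Exception {
         // By the time run() is called, generic options such as -libjars
         // have already been folded into getConf().
         Job job = new Job(getConf(), "my job");
         job.setJarByClass(MyDriver.class);
         // set mapper, reducer and output types here ...
         FileInputFormat.addInputPath(job, new Path(args[0]));
         FileOutputFormat.setOutputPath(job, new Path(args[1]));
         return job.waitForCompletion(true) ? 0 : 1;
     }

     public static void main(String[] args) throws Exception {
         System.exit(ToolRunner.run(new MyDriver(), args));
     }
 }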

Besides that, in order to run an MR job you should use the hadoop script located in the bin/ directory of your distribution.

Here is an example (using a jar containing your job and the driver class):

 hadoop jar jarfilename.jar DriverClassInTheJar -libjars comma-separated-list-of-libs <input> <output>
2 votes

You can specify the jar path as -libjars hdfs://namenode/path_to_jar. I have used this with Hive.