
I have a VirtualBox VM running HBase and Hadoop in pseudo-distributed mode. I have modified some simple MapReduce code to count the number of rows in a given HBase table (the HBase MapReduce RowCounter code). When I compile the modified code into a jar file, transfer it to the VM, and run it normally via the hadoop command line, everything is great. However, what I want to be able to do is run it from my Java client on my Windows machine (from the Java code, not via an SSH command that executes hadoop jar on the VM). When I try to run it from the Windows side (the Java client), all the necessary connections to Hadoop and HBase on the VM are made, but I receive a ClassNotFoundException because Hadoop cannot find my Mapper class.

I have manually copied the jar file onto HDFS and tried to point the Java client to that location by setting the corresponding configuration option (conf.set("mapred.jar", "hdfs:///RowCountTest.jar");). However, it still cannot locate the class (I don't know if it is even looking for the jar).
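For reference, when a job is submitted from a remote client, mapred.jar is typically pointed at the jar on the client's own local disk rather than at an HDFS URI; the job client then uploads it to the cluster's staging area itself. The following is only a minimal sketch of such a client, assuming a hypothetical VM hostname vm-host, the default Hadoop 1.x ports, and placeholder table and jar names:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
    import org.apache.hadoop.hbase.mapreduce.TableMapper;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

    public class RemoteRowCount {

        // Minimal mapper in the spirit of HBase's RowCounter: tally rows in a counter.
        static class RowCountMapper
                extends TableMapper<ImmutableBytesWritable, NullWritable> {
            @Override
            protected void map(ImmutableBytesWritable row, Result value, Context context) {
                context.getCounter("rowcount", "rows").increment(1);
            }
        }

        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            conf.set("hbase.zookeeper.quorum", "vm-host");      // placeholder hostname
            conf.set("fs.default.name", "hdfs://vm-host:9000"); // default pseudo-distributed ports
            conf.set("mapred.job.tracker", "vm-host:9001");
            // Point at the jar on the client's local disk; the job client
            // uploads it to the cluster during submission.
            conf.set("mapred.jar", "C:/jars/RowCountTest.jar"); // placeholder path

            Job job = new Job(conf, "remote row count");
            TableMapReduceUtil.initTableMapperJob("mytable", new Scan(),
                    RowCountMapper.class, ImmutableBytesWritable.class,
                    NullWritable.class, job);
            job.setNumReduceTasks(0);
            job.setOutputFormatClass(NullOutputFormat.class);   // no file output needed
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }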

First, do you know what needs to be done for Hadoop to recognize the class files in a jar stored on HDFS when running a job from a remote client?

Second, do you know if there is any way to “pass” the necessary class files along with the job to the cluster without pre-loading the jar file?
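For context, Hadoop's distributed cache is the standard mechanism for shipping extra jars along with a job, and HBase wraps it in TableMapReduceUtil.addDependencyJars. A minimal sketch of both calls, where the HDFS path is a placeholder and job is a configured but not yet submitted Job:

    import java.io.IOException;
    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
    import org.apache.hadoop.mapreduce.Job;

    public class JarShipping {
        /** Attach dependency jars to a job before it is submitted. */
        public static void shipJars(Job job) throws IOException {
            // HBase helper: ships the jars containing the job's HBase/ZooKeeper
            // dependencies to the cluster via the distributed cache ("tmpjars").
            TableMapReduceUtil.addDependencyJars(job);
            // Alternatively, put a jar that is already on HDFS onto the task classpath:
            DistributedCache.addFileToClassPath(new Path("/libs/RowCountTest.jar"),
                    job.getConfiguration());
        }
    }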

Comment: Could you please show me your code and the complete error? – Tariq

1 Answer


You have to copy the jar file to a location on the local filesystem (not into HDFS) and set the HADOOP_CLASSPATH variable in the hadoop-env.sh file to point to it.
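For example, a line like the following in hadoop-env.sh, where the path is a placeholder for wherever the jar was copied on the node:

    export HADOOP_CLASSPATH=/home/hadoop/jars/RowCountTest.jar:$HADOOP_CLASSPATH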

After changing the hadoop-env.sh file, the MapReduce services have to be restarted: the JobTracker and the TaskTracker.
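On a Hadoop 1.x pseudo-distributed setup, for instance, the MapReduce daemons can be bounced with the bundled scripts (assuming HADOOP_HOME points at the install directory):

    $HADOOP_HOME/bin/stop-mapred.sh
    $HADOOP_HOME/bin/start-mapred.sh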

Note: The MapReduce job will look for the classes (jars) in the locations specified in the HADOOP_CLASSPATH variable.