
My goal is to run a simple MapReduce job on a Cloudera cluster that reads from a dummy HBase database and writes its output to an HDFS file.

Some important notes:

- I've successfully run MapReduce jobs on this cluster before that took an HDFS file as input and wrote an HDFS file as output.
- I've already replaced the libraries used for compiling the project from the "pure" HBase jars to the HBase-Cloudera jars.
- When I've run into this kind of issue before, I simply copied a lib into the distributed cache (this worked for me with Google Guice):

  JobConf conf = new JobConf(getConf(), ParseJobConfig.class);
  DistributedCache.addCacheFile(new URI("/user/hduser/lib/3.0/guice-multibindings-3.0.jar"), conf);

  But that doesn't work now, because the HBaseConfiguration class is needed to create the configuration in the first place, i.e. the failure happens before the configuration even exists (see the sketch after this list).
- The Cloudera version is 5.3.1, the Hadoop version is 2.5.0.
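For what it's worth, TableMapReduceUtil has a built-in replacement for that manual distributed-cache trick. Below is a minimal sketch (ShipHbaseJars is a hypothetical class name, and the job name is just copied from my driver); note that addDependencyJars only ships jars to the map/reduce tasks, so it cannot fix an error that is thrown in the driver itself:

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.mapreduce.Job;

public class ShipHbaseJars {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(HBaseConfiguration.create(), "ExampleSummaryToFile");
        // Adds the HBase jars (and their transitive dependencies) to the
        // job's distributed cache so the map/reduce tasks can load them.
        TableMapReduceUtil.addDependencyJars(job);
    }
}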

This is my driver code:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class HbaseJobDriver {
    public static void main(String[] args) throws Exception {
        // This is the line that throws the NoClassDefFoundError below.
        Configuration conf = HBaseConfiguration.create();
        Job job = new Job(conf, "ExampleSummaryToFile");
        job.setJarByClass(HbaseJobDriver.class);

        Scan scan = new Scan();
        scan.setCaching(500);       // larger scanner caching is recommended for MR jobs
        scan.setCacheBlocks(false); // don't fill the region servers' block cache

        TableMapReduceUtil.initTableMapperJob(
                "Metrics",             // input HBase table
                scan,
                HbaseJobMapper.class,
                Text.class,            // mapper output key
                IntWritable.class,     // mapper output value
                job);

        job.setReducerClass(HbaseJobReducer.class);
        job.setNumReduceTasks(1);
        FileOutputFormat.setOutputPath(job, new Path(args[0]));

        // Actually submit the job and wait for it to finish.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

I am not sure whether the mapper/reducer classes are even relevant to this issue.
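For completeness, here is a minimal sketch of what a mapper passed to initTableMapperJob has to look like. I have not posted the real HbaseJobMapper, so the body below is just a hypothetical placeholder:

import java.io.IOException;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;

// An HBase table mapper must extend TableMapper<KEYOUT, VALUEOUT> so that
// its input types (row key + Result) match what initTableMapperJob wires up.
public class HbaseJobMapper extends TableMapper<Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);

    @Override
    protected void map(ImmutableBytesWritable rowKey, Result row, Context context)
            throws IOException, InterruptedException {
        // Placeholder logic: emit a count of 1 per scanned row.
        context.write(new Text(rowKey.get()), ONE);
    }
}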

The exception that I am getting is:

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/HBaseConfiguration
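Since the exception is thrown in main, it happens in the local JVM that launches the job, before anything reaches the cluster. A quick way to check what that JVM actually has on its classpath (ClasspathCheck is just a throwaway helper, not part of the project) is:

import java.io.File;

public class ClasspathCheck {
    public static void main(String[] args) {
        // Print every classpath entry of the current JVM; an HBase jar must
        // appear here for HBaseConfiguration to be loadable.
        for (String entry : System.getProperty("java.class.path").split(File.pathSeparator)) {
            System.out.println(entry);
        }
    }
}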


3 Answers

1 vote

My colleague and I have just solved this; in our case, we needed to update the .bashrc file:

  1. Open the file: nano ~/.bashrc
  2. Add the libraries to the classpath like this:

HBASE_PATH=/opt/cloudera/parcels/CDH/jars

export HADOOP_CLASSPATH=${HBASE_PATH}/hbase-common-0.98.6-cdh5.3.1.jar:<ANY_OTHER_JARS_REQUIRED>

  3. Don't forget to reload .bashrc:

. ~/.bashrc

0 votes

Try this.

export HADOOP_CLASSPATH="/usr/lib/hbase/hbase.jar:$HADOOP_CLASSPATH"

Add the line above to your /etc/hadoop/conf/hadoop-env.sh file, or set it from the command line.

0 votes

The error Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/HBaseConfiguration is due to the HBase jar missing from the classpath.

If what @sravan said did not work, try importing HBaseConfiguration in your driver code's import section, like this:

import org.apache.hadoop.hbase.HBaseConfiguration;