My goal is to run a simple MapReduce job on a Cloudera cluster that reads from a dummy HBase database and writes its output to an HDFS file.
Some important notes:

- I've successfully run MapReduce jobs on this cluster before that took an HDFS file as input and wrote to an HDFS file as output.
- I've already replaced the libraries used for compiling the project from the plain (upstream) HBase jars to the Cloudera HBase jars.
- When I ran into this kind of issue before, I simply copied the library into the distributed cache (this worked for me with Google Guice):

      JobConf conf = new JobConf(getConf(), ParseJobConfig.class);
      DistributedCache.addCacheFile(new URI("/user/hduser/lib/3.0/guice-multibindings-3.0.jar"), conf);

  That doesn't work now, because HBaseConfiguration is the class used to create the configuration in the first place, i.e. it is needed before the configuration even exists. A simplified sketch of that earlier approach is shown after this list.
- Cloudera version is 5.3.1, Hadoop version is 2.5.0.
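To make the distributed-cache point concrete, here is a simplified sketch of how the earlier (non-HBase) driver wired the jar in. It is only illustrative: I've substituted the driver class itself for ParseJobConfig so the snippet stands alone, and left out the mapper/reducer/input/output setup.

    import java.net.URI;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    public class ParseJobDriver extends Configured implements Tool {

        @Override
        public int run(String[] args) throws Exception {
            // The JobConf exists before any third-party class is touched,
            // so the extra jar can be attached to it via the distributed cache.
            JobConf conf = new JobConf(getConf(), ParseJobDriver.class);
            DistributedCache.addCacheFile(
                    new URI("/user/hduser/lib/3.0/guice-multibindings-3.0.jar"), conf);
            // ... mapper/reducer/input/output setup omitted ...
            return 0;
        }

        public static void main(String[] args) throws Exception {
            System.exit(ToolRunner.run(new Configuration(), new ParseJobDriver(), args));
        }
    }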
This is my driver code:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class HbaseJobDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = new Job(conf, "ExampleSummaryToFile");
        job.setJarByClass(HbaseJobDriver.class);

        Scan scan = new Scan();
        scan.setCaching(500);        // larger scanner caching for MR jobs
        scan.setCacheBlocks(false);  // don't pollute the block cache from MR scans

        TableMapReduceUtil.initTableMapperJob("Metrics",
                scan,
                HbaseJobMapper.class,
                Text.class,
                IntWritable.class,
                job);

        job.setReducerClass(HbaseJobReducer.class);
        job.setNumReduceTasks(1);
        FileOutputFormat.setOutputPath(job, new Path(args[0]));

        // submit the job and wait for it to finish
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
I am not sure whether the mapper/reducer classes are relevant to this issue, but their skeletons are below just in case.
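These are stripped-down skeletons only: I've removed the bodies and simplified the reducer output types, since I don't think they matter for this error. The mapper extends TableMapper because that is what initTableMapperJob expects.

    import java.io.IOException;

    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.TableMapper;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    // Reads rows of the "Metrics" table and emits <Text, IntWritable>,
    // matching the output classes passed to initTableMapperJob.
    public class HbaseJobMapper extends TableMapper<Text, IntWritable> {
        @Override
        protected void map(ImmutableBytesWritable row, Result value, Context context)
                throws IOException, InterruptedException {
            // ... extract a key/count from the row and write it ...
        }
    }

    // Aggregates the mapper output; the real output types may differ.
    class HbaseJobReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            // ... combine the values for each key and write the result ...
        }
    }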
The exception that I am getting is:

    Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/HBaseConfiguration