
I have a Spark application that runs successfully on my local machine. I use an HBase Docker container, from which I load the data into my Spark app. Now I have created an EMR cluster with Spark and HBase installed, but when I try to submit my JAR file I get the following exception:

    java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/HBaseConfiguration

When running my app locally, I was able to avoid this kind of error by adding the --jars flag to spark-submit, giving Spark the path to all the HBase JARs.
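For reference, this is roughly what my local submit command looks like (the main class, application JAR name, and HBase library path are placeholders for my actual values):

    # Local submit: pass the HBase client JARs to Spark explicitly.
    # /path/to/hbase/lib is wherever the HBase JARs live on my machine.
    spark-submit \
      --class com.example.MyHBaseApp \
      --master local[*] \
      --jars $(echo /path/to/hbase/lib/*.jar | tr ' ' ',') \
      my-spark-app.jar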

  1. How could I overcome this error when running on EMR?

  2. Should I point Spark to the HBase JARs on EMR as well? Where are those JARs located on the EMR cluster?

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.spark.api.java.JavaRDD;

    // Configure the HBase connection and the table to scan.
    Configuration hBaseConf = HBaseConfiguration.create();
    hBaseConf.set(TableInputFormat.INPUT_TABLE, "MyTable");

    // Load the table as an RDD of (row key, row) pairs,
    // then keep only the row keys as strings.
    JavaRDD<String> myStrings = sparkContext.newAPIHadoopRDD(
            hBaseConf, TableInputFormat.class,
            ImmutableBytesWritable.class, Result.class)
        .keys()
        .map(key -> Bytes.toString(key.get()));
    .
    .
    .

1 Answer


I was able to locate the JARs on the EMR master node with the hbase classpath command. Then I took the path to the HBase JARs from that output and added it to spark-submit with the --jars flag.
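A sketch of what that looks like on the EMR master node (the main class and application JAR name are placeholders; on my cluster the JARs reported by hbase classpath lived under /usr/lib/hbase/lib, but verify the path in your own output):

    # Print the HBase classpath to find where the JARs are installed.
    hbase classpath

    # Submit the app, passing the HBase JARs as a comma-separated list.
    # /usr/lib/hbase/lib is the location reported on my cluster.
    spark-submit \
      --class com.example.MyHBaseApp \
      --jars $(echo /usr/lib/hbase/lib/*.jar | tr ' ' ',') \
      my-spark-app.jar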