I have a Spark application that runs successfully on my local machine. I use an HBase Docker container, from which I load data into my Spark app. Now I have created an EMR cluster with Spark and HBase installed, but when I try to submit my JAR file I get the following exception:
java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/HBaseConfiguration
When running my app locally, I was able to avoid this kind of error by adding the --jars flag to spark-submit, giving Spark the path to all of the HBase JARs.
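For reference, this is roughly what my local submit command looks like (the main class, JAR names, and the /opt/hbase/lib path are just placeholders for my local Docker setup):

spark-submit \
  --class com.example.MyApp \
  --master local[*] \
  --jars /opt/hbase/lib/hbase-client.jar,/opt/hbase/lib/hbase-common.jar,/opt/hbase/lib/hbase-server.jar,/opt/hbase/lib/hbase-protocol.jar \
  my-spark-app.jar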
How can I overcome this error when running on EMR?
Should I point Spark at the HBase JARs on EMR as well? If so, where are those JARs located on the EMR cluster?
For context, here is the code that loads the data from HBase:

// Point the HBase input format at the table to read
Configuration hBaseConf = HBaseConfiguration.create();
hBaseConf.set(TableInputFormat.INPUT_TABLE, "MyTable");

// Load the rows as (key, value) pairs and keep only the string form of each row key
JavaRDD<String> myStrings = sparkContext
        .newAPIHadoopRDD(hBaseConf, TableInputFormat.class,
                ImmutableBytesWritable.class, Result.class)
        .keys()
        .map(key -> Bytes.toString(key.get()));
. . .