I've set up an AWS EMR cluster that includes Spark 2.3.2, Hive 2.3.3, and HBase 1.4.7. How can I configure Spark to access Hive tables?
I've taken the following steps, but they result in this error:

```
java.lang.ClassNotFoundException: java.lang.NoClassDefFoundError: org/apache/tez/dag/api/SessionNotRunning when creating Hive client using classpath:
Please make sure that jars for your version of hive and hadoop are included in the paths passed to spark.sql.hive.metastore.jars
```
Steps:

1. Copied the Hive config into Spark's conf directory:

   ```
   cp /usr/lib/hive/conf/hive-site.xml /usr/lib/spark/conf
   ```

2. In `/usr/lib/spark/conf/spark-defaults.conf`, added:

   ```
   spark.sql.hive.metastore.jars /usr/lib/hadoop/lib/*:/usr/lib/hive/lib/*
   ```
In Zeppelin I then create a Spark session:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("clue").enableHiveSupport().getOrCreate()
import spark.implicits._
```
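For reference, the `spark-defaults.conf` entries I'm aiming at are sketched below. The `spark.sql.hive.metastore.version` line is my assumption based on the Spark SQL docs, which say it should match the Hive version of the jars on that path (and must be a version your Spark release supports); only the jars line is actually in my config so far:

```
# /usr/lib/spark/conf/spark-defaults.conf
spark.sql.hive.metastore.jars     /usr/lib/hadoop/lib/*:/usr/lib/hive/lib/*

# Assumption: should match the Hive jars above (Hive 2.3.3 here);
# verify the value is in the range your Spark build supports.
spark.sql.hive.metastore.version  2.3.3
```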