I would like to use the Apache Phoenix framework with Spark. The problem is that I keep getting an exception telling me that the class HBaseConfiguration can't be found. Here is the code I want to use:
import org.apache.spark.SparkContext
import org.apache.spark.sql._
import org.apache.phoenix.spark._

// Load INPUT_TABLE through the Phoenix Spark data source
object MainTest2 extends App {
  val sc = new SparkContext("local", "phoenix-test")
  val sqlContext = new SQLContext(sc)
  val df = sqlContext.load("org.apache.phoenix.spark",
    Map("table" -> "INPUT_TABLE", "zkUrl" -> "localhost:3888"))
}
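As an aside, since SQLContext.load is deprecated, I assume the equivalent call through the DataFrameReader API would look like the snippet below (my own rewrite, untested; the stack trace suggests SQLContext.load delegates to DataFrameReader.load anyway):

// The same read expressed via the non-deprecated DataFrameReader API
// (my rewrite; "table" and "zkUrl" values are unchanged from the code above)
val df = sqlContext.read
  .format("org.apache.phoenix.spark")
  .options(Map("table" -> "INPUT_TABLE", "zkUrl" -> "localhost:3888"))
  .load()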
Here is the build.sbt I'm using:
name := "spark-to-hbase"
version := "1.0"
scalaVersion := "2.11.8"
libraryDependencies ++= Seq(
  "org.apache.hadoop" % "hadoop-mapreduce-client-core" % "2.3.0",
  "org.apache.phoenix" % "phoenix-core" % "4.11.0-HBase-1.3",
  "org.apache.spark" % "spark-core_2.11" % "2.1.1",
  "org.apache.spark" % "spark-sql_2.11" % "2.1.1",
  "org.apache.phoenix" % "phoenix-spark" % "4.11.0-HBase-1.3"
)
And here is the exception:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/HBaseConfiguration
    at org.apache.phoenix.query.ConfigurationFactory$ConfigurationFactoryImpl$1.call(ConfigurationFactory.java:49)
    at org.apache.phoenix.query.ConfigurationFactory$ConfigurationFactoryImpl$1.call(ConfigurationFactory.java:46)
    at org.apache.phoenix.util.PhoenixContextExecutor.call(PhoenixContextExecutor.java:76)
    at org.apache.phoenix.util.PhoenixContextExecutor.callWithoutPropagation(PhoenixContextExecutor.java:91)
    at org.apache.phoenix.query.ConfigurationFactory$ConfigurationFactoryImpl.getConfiguration(ConfigurationFactory.java:46)
    at org.apache.phoenix.jdbc.PhoenixDriver.initializeConnectionCache(PhoenixDriver.java:151)
    at org.apache.phoenix.jdbc.PhoenixDriver.<init>(PhoenixDriver.java:142)
    at org.apache.phoenix.jdbc.PhoenixDriver.<clinit>(PhoenixDriver.java:69)
    at org.apache.phoenix.spark.PhoenixRDD.<init>(PhoenixRDD.scala:43)
    at org.apache.phoenix.spark.PhoenixRelation.schema(PhoenixRelation.scala:52)
    at org.apache.spark.sql.execution.datasources.LogicalRelation.<init>(LogicalRelation.scala:40)
    at org.apache.spark.sql.SparkSession.baseRelationToDataFrame(SparkSession.scala:389)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:146)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:125)
    at org.apache.spark.sql.SQLContext.load(SQLContext.scala:965)
    at MainTest2$.delayedEndpoint$MainTest2$1(MainTest2.scala:9)
    at MainTest2$delayedInit$body.apply(MainTest2.scala:6)
    at scala.Function0$class.apply$mcV$sp(Function0.scala:34)
    at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
    at scala.App$$anonfun$main$1.apply(App.scala:76)
    at scala.App$$anonfun$main$1.apply(App.scala:76)
    at scala.collection.immutable.List.foreach(List.scala:381)
    at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:35)
    at scala.App$class.main(App.scala:76)
    at MainTest2$.main(MainTest2.scala:6)
    at MainTest2.main(MainTest2.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.HBaseConfiguration
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 26 more
I've already tried changing the HADOOP_CLASSPATH in hadoop-env.sh, as suggested in this previous post.
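I also suspect the HBase client jars might simply be missing from the classpath: as far as I know, org.apache.hadoop.hbase.HBaseConfiguration lives in hbase-common. If so, would adding something like the following to build.sbt be the right fix? (The 1.3.1 version is my guess, chosen to match the HBase-1.3 build of Phoenix; hbase-server may or may not be needed.)

// Hypothetical fix: pull in the HBase client artifacts explicitly
// (1.3.1 is an assumed version matching the 4.11.0-HBase-1.3 Phoenix artifacts)
libraryDependencies ++= Seq(
  "org.apache.hbase" % "hbase-common" % "1.3.1",
  "org.apache.hbase" % "hbase-client" % "1.3.1",
  "org.apache.hbase" % "hbase-server" % "1.3.1"
)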
What can I do to overcome this problem?