I'am trying to run a simple program that copies the content of an rdd into a Hbase table. I'am using spark-hbase-connector by nerdammer https://github.com/nerdammer/spark-hbase-connector. I'am running the code using spark-submit on a local cluster on my machine. Spark version is 2.1. this is the code i'am trying tu run :
import org.apache.spark.{SparkConf, SparkContext}
import it.nerdammer.spark.hbase._
object HbaseConnect {
def main(args: Array[String]) {
val sparkConf = new SparkConf()
sparkConf.set("spark.hbase.host", "hostname")
sparkConf.set("zookeeper.znode.parent", "/hbase-unsecure")
val sc = new SparkContext(sparkConf)
val rdd = sc.parallelize(1 to 100)
.map(i => (i.toString, i+1, "Hello"))
rdd.toHBaseTable("mytable").toColumns("column1", "column2")
.inColumnFamily("mycf")
.save()
sc.stop
}}
Here is my build.sbt:
name := "HbaseConnect"
version := "0.1"
scalaVersion := "2.11.8"
assemblyMergeStrategy in assembly := {
case PathList("META-INF", xs @ _*) => MergeStrategy.discard
case x => MergeStrategy.first}
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % "2.1.0" % "provided",
"it.nerdammer.bigdata" % "spark-hbase-connector_2.10" % "1.0.3")
the execution gets stuck showing the following info:
17/11/22 10:20:34 INFO ZooKeeperRegistry: ClusterId read in ZooKeeper is null
17/11/22 10:20:34 INFO TableOutputFormat: Created table instance for mytable
I am unable to indentify the problem with zookeeper. The HBase clients will discover the running HBase cluster using the following two properties:
1.hbase.zookeeper.quorum: is used to connect to the zookeeper cluster
2.zookeeper.znode.parent. tells which znode keeps the data (and address for HMaster) for the cluster
I overridden these two properties in the code. with
sparkConf.set("spark.hbase.host", "hostname")
sparkConf.set("zookeeper.znode.parent", "/hbase-unsecure")
Another question is that there is no spark-hbase-connector_2.11. can the provided version spark-hbase-connector_2.10 support scala 2.11 ?