I'm trying to run a simple program that copies the content of an RDD into an HBase table. I'm using spark-hbase-connector by nerdammer (https://github.com/nerdammer/spark-hbase-connector). I'm running the code using spark-submit on a local cluster on my machine. The Spark version is 2.1. This is the code I'm trying to run:

    import org.apache.spark.{SparkConf, SparkContext}
    import it.nerdammer.spark.hbase._

    object HbaseConnect {

      def main(args: Array[String]): Unit = {
        val sparkConf = new SparkConf()

        // Override the HBase connection settings
        sparkConf.set("spark.hbase.host", "hostname")
        sparkConf.set("zookeeper.znode.parent", "/hbase-unsecure")

        val sc = new SparkContext(sparkConf)

        // Each tuple is (row key, column1, column2)
        val rdd = sc.parallelize(1 to 100)
          .map(i => (i.toString, i + 1, "Hello"))

        rdd.toHBaseTable("mytable").toColumns("column1", "column2")
      }
    }


Here is my build.sbt:

    name := "HbaseConnect"
    version := "0.1"
    scalaVersion := "2.11.8"

    assemblyMergeStrategy in assembly := {
      case PathList("META-INF", xs @ _*) => MergeStrategy.discard
      case x => MergeStrategy.first
    }

    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-core" % "2.1.0" % "provided",
      "it.nerdammer.bigdata" % "spark-hbase-connector_2.10" % "1.0.3"
    )

The execution gets stuck after logging the following:

    17/11/22 10:20:34 INFO ZooKeeperRegistry: ClusterId read in ZooKeeper is null
    17/11/22 10:20:34 INFO TableOutputFormat: Created table instance for mytable

I am unable to identify the problem with ZooKeeper. HBase clients discover the running HBase cluster using the following two properties:

1. hbase.zookeeper.quorum: used to connect to the ZooKeeper cluster

2. zookeeper.znode.parent: tells which znode keeps the data (and the address of the HMaster) for the cluster

I overrode these two properties in the code with:

       sparkConf.set("spark.hbase.host", "hostname")
       sparkConf.set("zookeeper.znode.parent", "/hbase-unsecure")

Another question: there is no spark-hbase-connector_2.11. Can the provided spark-hbase-connector_2.10 artifact be used with Scala 2.11?


1 Answer


The problem is solved. I had to override the HMaster port to 16000 (which is my HMaster port number; I'm using Ambari). The default value that sparkConf uses is 60000.

    sparkConf.set("hbase.master", "hostname" + ":16000")
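
If it is unclear which port the HMaster is listening on, a plain TCP check can help before changing the configuration. The following is only a minimal sketch using the JDK's socket API: `PortCheck` and `isOpen` are hypothetical names introduced here, not part of Spark, HBase or the connector, and an open port only shows that something is listening there, not that it is the HMaster.

```scala
import java.io.IOException
import java.net.{InetSocketAddress, Socket}

// Hypothetical helper (not part of any library): probes a TCP port.
object PortCheck {

  // Returns true if a TCP connection to host:port succeeds within timeoutMs.
  def isOpen(host: String, port: Int, timeoutMs: Int = 2000): Boolean = {
    val socket = new Socket()
    try {
      socket.connect(new InetSocketAddress(host, port), timeoutMs)
      true
    } catch {
      // Covers refused connections, timeouts and unresolvable hosts.
      case _: IOException => false
    } finally {
      socket.close()
    }
  }

  def main(args: Array[String]): Unit = {
    // Assumption: the host is passed as the first argument, else localhost.
    val host = if (args.nonEmpty) args(0) else "localhost"
    // 16000 is the port that worked above; 60000 is the value it replaced.
    Seq(16000, 60000).foreach { port =>
      println(s"$host:$port open = ${isOpen(host, port)}")
    }
  }
}
```

Running it against the HBase host shows which of the two ports accepts connections, so the value passed to `hbase.master` can be chosen accordingly.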