
I can read and write HBase data with the Java API provided by the HBase project. But that way, the read runs in the Spark driver program, which does not seem like a clever approach. Is there a Spark way to read data from HBase so that the read is spread across the workers to improve performance?


1 Answer

Is there some spark way to read data from HBASE

Yes

  • You can use Apache Phoenix on top of HBase.
  • Phoenix provides an SQL-like layer on top of HBase.
  • An HBase table can be loaded into Spark as a DataFrame through SQLContext.
  • Just include the Phoenix client jar on the Spark classpath (the Databricks spark-csv jar is only needed if you also read CSV files).

Spark code to read an HBase table:

================================================================

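// In spark-shell (Spark 1.x) sqlContext is predefined; enter this block with :paste.
val df = sqlContext.read
  .format("org.apache.phoenix.spark")
  .option("table", "HBase_table_name")           // table to read
  .option("zkUrl", "Master_node_DNS_name:2181")  // ZooKeeper host:port
  .load()
// Register the DataFrame so it can be queried with sqlContext.sql(...)
df.registerTempTable("tempTblName")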
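If the same connection settings are reused across several reads, they can be collected into a Map and passed with DataFrameReader.options instead of chaining .option calls. The helper below is a hypothetical sketch: PhoenixReadOptions is not part of Phoenix or Spark, and it assumes the two options used above ("table" and "zkUrl") are enough for your setup, with ZooKeeper on its default port 2181.

```scala
// Hypothetical helper, not part of the phoenix-spark connector:
// gathers the "table" and "zkUrl" reader options into one Map.
object PhoenixReadOptions {
  def apply(table: String, zkQuorum: String, zkPort: Int = 2181): Map[String, String] = {
    require(table.nonEmpty, "table name must not be empty")
    require(zkQuorum.nonEmpty, "ZooKeeper host must not be empty")
    Map(
      "table" -> table,
      // zkQuorum may be a single host or a comma-separated host list
      "zkUrl" -> s"$zkQuorum:$zkPort"
    )
  }
}
```

With the helper in scope, the read becomes sqlContext.read.format("org.apache.phoenix.spark").options(PhoenixReadOptions("HBase_table_name", "Master_node_DNS_name")).load(). As I understand the connector, load() only defines the DataFrame; the HBase scans run later as Spark tasks on the executors, one per input split, rather than in the driver.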