I can read/write data from HBase using the Java API provided by the HBase project. But this way the read operation is processed in the Spark driver program, which does not seem like a clever approach. Is there a Spark way to read data from HBase so that the read can be distributed across different workers to improve performance?
Is there some Spark way to read data from HBase?
Yes.
- You can use Apache Phoenix on top of HBase.
- Phoenix provides an SQL-like layer on top of HBase.
- It is possible to load an HBase table in Spark by using SQLContext.
- Just include the Phoenix client jar and the Databricks spark-csv jar on the classpath.
Spark code to read an HBase table
================================================================
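// Load the HBase table (through Phoenix) as a DataFrame via the phoenix-spark data source
val df = sqlContext.read
  .format("org.apache.phoenix.spark")
  .option("table", "HBase_table_name")           // Phoenix table backed by HBase
  .option("zkUrl", "Master_node_DNS_name:2181")  // ZooKeeper quorum host:port
  .load()

// Register it so it can be queried with SQL (Spark 1.x API)
df.registerTempTable("tempTblName")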
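On Spark 2.x and later, SQLContext and registerTempTable are superseded by SparkSession and createOrReplaceTempView, so the same read might look roughly like the sketch below. It assumes the Phoenix client jar is on both the driver and executor classpath and that your Phoenix connector version still registers the `org.apache.phoenix.spark` format (newer connector releases may use a different format name). The object and helper names (`PhoenixReadExample`, `phoenixOptions`, `loadTable`) are hypothetical, and the table name and ZooKeeper address are the placeholders from the answer. Printing the partition count is a quick way to see that the read is split into tasks that run on the executors rather than in the driver.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Sketch only: assumes a Phoenix connector that registers "org.apache.phoenix.spark".
object PhoenixReadExample {

  // Hypothetical helper: the two options used by the answer (Phoenix table name, ZooKeeper URL).
  def phoenixOptions(table: String, zkUrl: String): Map[String, String] =
    Map("table" -> table, "zkUrl" -> zkUrl)

  // Build a DataFrame backed by the Phoenix table; rows are only read when an action runs.
  def loadTable(spark: SparkSession, table: String, zkUrl: String): DataFrame =
    spark.read
      .format("org.apache.phoenix.spark")
      .options(phoenixOptions(table, zkUrl))
      .load()

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("phoenix-read-sketch").getOrCreate()

    // Placeholders from the answer: replace with your table name and ZooKeeper host:port.
    val df = loadTable(spark, "HBase_table_name", "Master_node_DNS_name:2181")

    // Each partition is scanned by its own task, so the read runs on the executors.
    println(s"partitions: ${df.rdd.getNumPartitions}")

    // createOrReplaceTempView replaces the deprecated registerTempTable.
    df.createOrReplaceTempView("tempTblName")
    spark.sql("SELECT COUNT(*) FROM tempTblName").show()

    spark.stop()
  }
}
```

Keeping the options in a small helper is just a convenience so the same settings can be reused for other tables; passing them inline with `.option(...)` as in the answer works equally well.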