0
votes

I am looking for a way to read data from an HBase table using Spark Streaming and write it to another HBase table.

I found numerous samples on the internet that create a DStream to read data from HDFS files and similar sources, but I was unable to find any examples that read data from HBase tables.

For example, suppose I have an HBase table 'SAMPLE' with the columns 'name' and 'activeStatus'. How can I retrieve data (new data) from the table SAMPLE based on the activeStatus column using Spark Streaming?

Any examples of retrieving data from an HBase table using Spark Streaming are welcome.

Regards, Adarsh K S

2 Answers

2
votes

You can connect to HBase from Spark in multiple ways.

Hortonworks SHC reads HBase directly into a DataFrame using a user-defined catalog, whereas hbase-rdd reads it as an RDD, which can be converted to a DataFrame with the toDF method. hbase-rdd also has a bulk write option (writing HFiles directly), which is preferred for massive data writes.
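To make the "user-defined catalog" idea concrete, here is a minimal PySpark sketch of the SHC approach. Several things in it are assumptions, not taken from the question: the column family name "cf", the target table name "SAMPLE_COPY", the active flag value "Y", and the helper names make_catalog and copy_active_rows. It also assumes the SHC jar is on the Spark classpath and that the target table already exists. Note that this reads the table as a batch DataFrame; it is not a streaming source.

```python
import json

# Data source name used by SHC (taken from the SHC README).
SHC_FORMAT = "org.apache.spark.sql.execution.datasources.hbase"


def make_catalog(table, column_family, columns, namespace="default"):
    """Build an SHC catalog: a JSON string mapping HBase cells to DataFrame columns."""
    # The row key is exposed as a DataFrame column named "key".
    mapping = {"key": {"cf": "rowkey", "col": "key", "type": "string"}}
    for col in columns:
        mapping[col] = {"cf": column_family, "col": col, "type": "string"}
    return json.dumps({
        "table": {"namespace": namespace, "name": table},
        "rowkey": "key",
        "columns": mapping,
    })


def copy_active_rows(spark, source_catalog, target_catalog, active_value="Y"):
    """Read the source table, keep rows whose activeStatus matches, write them out."""
    df = spark.read.options(catalog=source_catalog).format(SHC_FORMAT).load()
    active = df.filter(df["activeStatus"] == active_value)
    active.write.options(catalog=target_catalog).format(SHC_FORMAT).save()
    return active


# "cf" and "SAMPLE_COPY" are hypothetical; replace them with your own names.
source_catalog = make_catalog("SAMPLE", "cf", ["name", "activeStatus"])
target_catalog = make_catalog("SAMPLE_COPY", "cf", ["name", "activeStatus"])
```

With a SparkSession named spark available, calling copy_active_rows(spark, source_catalog, target_catalog) would perform the copy. The catalog is built separately from the Spark calls so the column mapping can be inspected or reused without a running cluster.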

1
votes

What you need is a library that enables Spark to interact with HBase. Hortonworks' SHC (Spark HBase Connector) is such an extension:

https://github.com/hortonworks-spark/shc