
So my project flow is Kafka -> Spark Streaming -> HBase.

Now I want to read the data back from HBase: a second job should go over the table created by the previous job, do some aggregation, and store the result in another table with a different column format.

Kafka -> Spark Streaming (2 ms) -> HBase -> Spark Streaming (10 ms) -> HBase

Now I don't know how to read data from HBase using Spark Streaming. I found the Cloudera Labs project SparkOnHBase (http://blog.cloudera.com/blog/2014/12/new-in-cloudera-labs-sparkonhbase/), but I can't figure out how to get an InputDStream from HBase for stream processing. Please provide any pointers or library links that would help me do this.


2 Answers


You can create a DStream from a queue of RDDs using StreamingContext's queueStream method:

import java.util.LinkedList;
import java.util.Queue;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import scala.Tuple2;
import com.cloudera.spark.hbase.JavaHBaseContext;

// conf is your SparkConf, tableName the name of the table written by the first job
JavaSparkContext sc = new JavaSparkContext(conf);
org.apache.hadoop.conf.Configuration hconf = HBaseConfiguration.create();
JavaHBaseContext jhbc = new JavaHBaseContext(sc, hconf);

Scan scan1 = new Scan();
scan1.setAttribute(Scan.SCAN_ATTRIBUTES_TABLE_NAME, tableName.getBytes());

// Create an RDD of (row key, Result) pairs from the HBase table
JavaRDD<Tuple2<ImmutableBytesWritable, Result>> rdd = jhbc.hbaseRDD(tableName, scan1,
    new Function<Tuple2<ImmutableBytesWritable, Result>, Tuple2<ImmutableBytesWritable, Result>>() {
        @Override
        public Tuple2<ImmutableBytesWritable, Result> call(
                Tuple2<ImmutableBytesWritable, Result> tuple) throws Exception {
            return tuple;
        }
    });

// Create the streaming context and feed the RDD in through a queue
JavaStreamingContext ssc = new JavaStreamingContext(sc, Durations.seconds(1));

Queue<JavaRDD<Tuple2<ImmutableBytesWritable, Result>>> queue =
    new LinkedList<JavaRDD<Tuple2<ImmutableBytesWritable, Result>>>();
queue.add(rdd);

JavaDStream<Tuple2<ImmutableBytesWritable, Result>> dStream = ssc.queueStream(queue);
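From there you register your aggregation on the stream and start the context as with any other DStream; as a minimal placeholder, just counting the rows per batch:

// Count the rows read from HBase in each batch and print the result,
// then start the streaming job
dStream.count().print();

ssc.start();
ssc.awaitTermination();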

PS: You could just use plain Spark (without streaming) for that.
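For reference, a minimal sketch of that batch variant using Spark's newAPIHadoopRDD with HBase's TableInputFormat; the table name and app name are placeholders, and the "aggregation" here is just a row count:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;

// Point TableInputFormat at the table written by the first job
// ("source_table" is a placeholder name)
Configuration hbaseConf = HBaseConfiguration.create();
hbaseConf.set(TableInputFormat.INPUT_TABLE, "source_table");

SparkConf sparkConf = new SparkConf().setAppName("hbase-batch-aggregation");
JavaSparkContext sc = new JavaSparkContext(sparkConf);

// Each element is a (row key, Result) pair; aggregate it with normal RDD
// operations and write the output to the second table
JavaPairRDD<ImmutableBytesWritable, Result> hbaseRdd =
        sc.newAPIHadoopRDD(hbaseConf, TableInputFormat.class,
                ImmutableBytesWritable.class, Result.class);

System.out.println("Rows scanned: " + hbaseRdd.count());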