I am trying to integrate SparkStreaming with HBase. I am calling following APIs to connect to HBase:
HConnection hbaseConnection = HConnectionManager.createConnection(conf);
hBaseTable = hbaseConnection.getTable(hbaseTable);
Since I cannot get the connection and broadcast the connection each API call to get data from HBase is very expensive. I tried using JavaHBaseContext (JavaHBaseContext hbaseContext = new JavaHBaseContext(jsc, conf)) by using hbase-spark library in CDH 5.5 but I cannot import the library from maven. Has anyone been able to successfully resolve this issue.
I am trying to use the latest APIs to connect HBase and SparkStreaming on Cloudera.
Some of the JIRA items mentioned here.
http://blog.cloudera.com/blog/2015/08/apache-spark-comes-to-apache-hbase-with-hbase-spark-module/
I am using JavaHBaseContext hbaseContext = new JavaHBaseContext(jssc.sparkContext(), conf);
Then called bulk Get API
hbaseContext.streamBulkGet(TableName.valueOf(tableName), 2, lines, new GetFunction2(), new ResultFunction());
But this bulk API is invoked during initialization not during each streaming message. Also used:
hbaseContext.foreachPartition(jDStream,new VoidFunction<Tuple2<Iterator<String>, Connection>>() {
public void call(Tuple2<Iterator<String>, Connection> t)throws Exception { ...}
The API exists but somehow it is not working for streaming message. Also tried hbaseContext.streamMap(jdstream, new Function<Tuple2<Iterator<String>, Connection>, Iterator<String>>() but it is not working either.
Do we have an example of how to get data using the spark streaming API.