
I want to do a full scan on HBase from Spark 2 using Scala.

I don't have a fixed catalog definition, so libraries such as SHC are not an option.

My logical choice was to use hbase-spark, which works fine in Spark 1.6.

Beyond the poor documentation for this library in earlier versions, my surprise came when I checked the latest HBase releases: in the 2.0 tag, for example, hbase-spark is gone, although it is still in master.

So my questions are:

  • Where is the hbase-spark module in the latest releases?
  • Where can I find an hbase-spark version compatible with Spark 2?

Thanks!

2 Answers


It seems the hbase-spark module was removed from the HBase project for the 2.0 release:

https://issues.apache.org/jira/browse/HBASE-18817


@bp2010 already answered part of the question.

Regarding HBase Spark, see below. It works with Spark 2.

There are some options that don't require a fixed catalog in client code:

  1. HBase Spark: the connector source code, with examples, now lives in the hbase-connectors repository: https://github.com/apache/hbase-connectors/tree/master/spark/hbase-spark (a minimal full-scan sketch follows this list).

  2. Apache Phoenix Spark connector: https://phoenix.apache.org/phoenix_spark.html (a read sketch follows the note below).
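
For option 1, here is a minimal full-scan sketch using the connector's `HBaseContext.hbaseRDD`, which needs no catalog at all. It assumes the hbase-spark connector jar from the hbase-connectors repo is on the classpath (as far as I can tell it is published under the `org.apache.hbase.connectors.spark` groupId) and that an `hbase-site.xml` is visible to the driver; `"my_table"` is a placeholder:

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.Scan
import org.apache.hadoop.hbase.spark.HBaseContext
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.sql.SparkSession

object HBaseFullScan {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("hbase-full-scan").getOrCreate()
    val sc = spark.sparkContext

    // Standard HBase client configuration; picks up hbase-site.xml from the classpath.
    val conf = HBaseConfiguration.create()

    // HBaseContext is the entry point of the hbase-spark connector.
    val hbaseContext = new HBaseContext(sc, conf)

    // A Scan with no start/stop row is a full table scan.
    val scan = new Scan()
    scan.setCaching(500) // larger scanner caching helps long scans

    // hbaseRDD returns an RDD of (row key, Result) pairs -- no catalog needed.
    // "my_table" is a placeholder table name.
    val rdd = hbaseContext.hbaseRDD(TableName.valueOf("my_table"), scan)

    val rowKeys = rdd.map { case (key, _) => Bytes.toString(key.copyBytes()) }
    println(s"rows scanned: ${rowKeys.count()}")

    spark.stop()
  }
}
```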

Regarding option 2: I'm not sure whether it helps you, since the table must be mapped to a Phoenix table. But if you have Phoenix, your problem is only writing the catalog from code, and you can standardize the types in the HBase table, then for a full scan this can be the way to go. Otherwise, go with option 1.
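
For completeness, a minimal Phoenix read following the DataSource API documented on the page linked above. It assumes the phoenix-spark jar is on the classpath; the table name and ZooKeeper URL are placeholders:

```scala
import org.apache.spark.sql.SparkSession

object PhoenixFullScan {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("phoenix-full-scan").getOrCreate()

    // Phoenix resolves the schema from its own metadata, so no catalog is
    // written in client code. "MY_TABLE" and the zkUrl are placeholders.
    val df = spark.read
      .format("org.apache.phoenix.spark")
      .options(Map("table" -> "MY_TABLE", "zkUrl" -> "zk-host:2181"))
      .load()

    // With no filters applied, this reads the whole Phoenix table.
    df.show(10)

    spark.stop()
  }
}
```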