0
votes

I am using the HBase-Hive integration to read and write HBase with Hive, following the documentation.

Basically, I create a table in Hive with the HBaseStorageHandler like this:

CREATE EXTERNAL TABLE hbase.test (
  col1 string,
  col2 map<string, double>
)
STORED BY
  'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
  'hbase.columns.mapping'=':key, cf:',
  'hbase.table.name'='test')

Reading and writing both work perfectly. But now I want to clear some bad data by value. This value appears in both the row key (i.e. col1) and the column cells (as a key of col2).

I did not find anything related to data deletion in the documentation. Hopefully someone with similar experience can answer my question here.

Thanks in advance!

1 Answer

1
votes

The closest match to your delete use case is an overwrite. You can find it in the documentation you provided.

In general, deletes are not easy to achieve in big data systems. In HBase, they are implemented with tombstones and compactions. In Hive, DELETE is only available from version 0.14, and only for tables that support ACID. ACID itself has been supported since 0.13.
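
For reference, here is a rough sketch of what a Hive delete looks like on an ACID table (Hive 0.14 or later). The table name test_acid, the bucket count, and the value 'bad_value' are made up for illustration. This applies to a native transactional ORC table, not to a table backed by HBaseStorageHandler, so treat it as an illustration of the ACID requirement rather than a fix for your HBase-backed table.

```sql
-- Session settings ACID tables typically need (assumed defaults may differ per cluster).
SET hive.support.concurrency=true;
SET hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
SET hive.enforce.bucketing=true;  -- relevant for Hive versions before 2.0

-- Hypothetical ACID table: must be bucketed, stored as ORC, and marked transactional.
CREATE TABLE test_acid (
  col1 string,
  col2 map<string, double>
)
CLUSTERED BY (col1) INTO 4 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional'='true');

-- Row-level delete by value; only accepted on transactional tables.
DELETE FROM test_acid
WHERE col1 = 'bad_value';
```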

As you can see, support for deletion in big data sets has only recently started to pick up. You need to plan for alternatives such as INSERT OVERWRITE to handle erasing bad data.

Since you are using HBase and Hive together rather than each as a standalone framework, you will not get the full feature set of either. The integration is only worthwhile if you rely heavily on a SQL view with HBase at the back end.

Then again, there may be specific requirements behind choosing HBase as the back end. I hope this helps you design a better solution.