0
votes

I have created an hbase table with Phoenix SQL create table query and also specified salt_buckets. Salting adds prefix to the the rowkey as expected.

I have created an external hive table to map to this hbase table with hbase serde The problem is when I query this table by filtering on rowkey:

where key = "value"

it doesn't work because I think salt pre-fix is also getting fetched for the key. This limits the ability to filter the data on key. The option:

"where rowkey like "%value"

works but it takes a long time as likely does the entire table scan.

My question is how can I query this table efficiently on row key values in hive (strip off salt pre-fix)?

1

1 Answers

0
votes

Yes you're correct while mentioning

it doesn't work because I think salt pre-fix is also getting fetched for the key. '

One way to mitigate is to use hashing instead of random prefix. And prefix the rowkey with the calculated hash Using this technique you can calculate hash for the rowkey you want to scan for.:

mod(hash(rowkey),n) where n is the number of regions will remove the hotspotting issue

Using random prefix brings in the problem you mentioned in your question.

The option: "where rowkey like "%value" works but it takes a long time as likely does the entire table scan.

This is exactly what random prefix salting does. HBase is forced to scan the whole table to get the required value, so it would be better if you could prefix your rowkey with its calculated Hash. But this hashing technique wont prove good in Range scans.

Now you may ask, why cant I simply replace my rowKey with its Hash and store the rowkey as separate column. It may/may not work, but I would recommend implementing it this way because HBase is already very sensitive when it comes to Column Families. But then again I am not clear on this solution.

You also might want to read this for more detailed explanation.