Filter rows in HBase based on partial row keys

Question

My HBase table uses row keys of the form siteid_timestamp.

ROW COLUMN+CELL
001_1454578003995 column=hd:abc, timestamp=1454578173766, value=2

001_1454578003996 column=hd:def, timestamp=1454578173766, value=2

002_1454578003997 column=hd:ijk, timestamp=1454578173766, value=2

002_1454578003998 column=hd:lmn, timestamp=1454578173766, value=2

The siteid varies from row to row. I need to fetch the rows whose timestamp falls within a given range, where the timestamp is the part of the row key after the siteid and underscore. I do not want to use the HBase cell timestamp.

So if I ask for the timestamp range >=1454578003995 && <=1454578003996, I should get the first two rows.

Could you please help me with this?

Shyam Shyam · Accepted Answer · 2016-02-07T15:43:54

For this case, we need to perform a scan with one or more filters [1].

Since we have to filter on the row key, we could use RowFilter together with RegexStringComparator [2]. RegexStringComparator lets us restrict the results using regular expressions, but keep in mind that the filter is still evaluated against every row the scan visits, so performance could suffer on large amounts of data. Some pseudocode for illustration:

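    ...
    // Keep rows whose key is <siteid>_<timestamp starting with 12345>.
    // ".*" (any characters) replaces "5*", which only meant "zero or more 5s";
    // "^" anchors the match at the start of the row key.
    Filter filter = new RowFilter(CompareFilter.CompareOp.EQUAL,
        new RegexStringComparator("^\\d+_12345.*"));
    scan.setFilter(filter);
    ...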
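
To make the regex idea concrete for the keys in the question, here is a minimal sketch that runs without HBase. It only uses java.util.regex to show which of the sample row keys a pattern would select. The class RowKeyRegexDemo and the helper matching are made up for this illustration, and it assumes the pattern is applied as a substring search (hence the ^ and $ anchors). The same pattern string could then be handed to RegexStringComparator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Stdlib-only illustration; no HBase classes are used here.
final class RowKeyRegexDemo {

    // Returns the keys in which the pattern is found (find-style match).
    static List<String> matching(List<String> rowKeys, String regex) {
        Pattern pattern = Pattern.compile(regex);
        List<String> hits = new ArrayList<>();
        for (String key : rowKeys) {
            if (pattern.matcher(key).find()) {
                hits.add(key);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Row keys copied from the question.
        List<String> rowKeys = Arrays.asList(
            "001_1454578003995", "001_1454578003996",
            "002_1454578003997", "002_1454578003998");

        // Any siteid, then a timestamp of exactly ...995 or ...996.
        String regex = "^\\d+_145457800399[56]$";
        System.out.println(matching(rowKeys, regex));
    }
}
```

Running it prints [001_1454578003995, 001_1454578003996]. A character class like [56] only works for narrow ranges; wider timestamp ranges get awkward to express as a regex.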

We could also combine multiple filters (see FilterList).

If you need more efficiency, look into FuzzyRowFilter [3]. (It places some constraints on the format/structure of the row key.)
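
As a rough picture of what "fuzzy" matching on a fixed-width key means, the sketch below imitates the idea in plain Java: a template key plus a mask in which 0 marks a byte that must match and 1 marks a byte that may be anything. It is not the HBase implementation. FuzzyKeyDemo, fuzzyMatches, matching and siteIdWildcardMask are hypothetical names, and it assumes a 3-character siteid and a 13-digit timestamp as in the sample data.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Stdlib-only imitation of fuzzy row matching; not the HBase implementation.
final class FuzzyKeyDemo {

    // mask[i] == 0: byte i must equal template[i]; mask[i] == 1: any byte is accepted.
    static boolean fuzzyMatches(byte[] rowKey, byte[] template, byte[] mask) {
        if (rowKey.length != template.length || template.length != mask.length) {
            return false; // simplification: this sketch only handles fixed-width keys
        }
        for (int i = 0; i < rowKey.length; i++) {
            if (mask[i] == 0 && rowKey[i] != template[i]) {
                return false;
            }
        }
        return true;
    }

    // Returns the keys that match at least one template under the shared mask.
    static List<String> matching(List<String> rowKeys, List<String> templates, byte[] mask) {
        List<String> hits = new ArrayList<>();
        for (String key : rowKeys) {
            byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
            for (String template : templates) {
                if (fuzzyMatches(keyBytes, template.getBytes(StandardCharsets.UTF_8), mask)) {
                    hits.add(key);
                    break;
                }
            }
        }
        return hits;
    }

    // Assumed layout: 3 wildcard siteid bytes, then "_" and 13 digits that are fixed.
    static byte[] siteIdWildcardMask() {
        byte[] mask = new byte[17];       // all 0 = fixed
        Arrays.fill(mask, 0, 3, (byte) 1); // siteid positions may be anything
        return mask;
    }

    public static void main(String[] args) {
        List<String> rowKeys = Arrays.asList(
            "001_1454578003995", "001_1454578003996",
            "002_1454578003997", "002_1454578003998");
        // One template per wanted timestamp; "?" is just a placeholder byte.
        List<String> templates = Arrays.asList("???_1454578003995", "???_1454578003996");
        System.out.println(matching(rowKeys, templates, siteIdWildcardMask()));
    }
}
```

With the sample keys this prints [001_1454578003995, 001_1454578003996]. Note that the sketch needs one template per timestamp, so this style of matching is mainly practical for fixed-width keys and a small set of wanted values.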

Some concrete examples [4].

Hope it helps.

[1] HBase Scan API: https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html

[2] Filters and comparators: https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/package-summary.html

[3] https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/FuzzyRowFilter.html

[4] https://github.com/larsgeorge/hbase-book/blob/master/ch04/src/main/java/filters/RowFilterExample.java
