3
votes

I want to scan an HBase table from the HBase shell and return only the rows whose row key matches some pattern.

For example, I have the following table data:

    row:r1_t1  column:cf:a, timestamp=1461911995948,value=v1
    row:r2_t2  column:cf:a, timestamp=1461911995949,value=v2
    row:s1_t1  column:cf:a, timestamp=1461911995950,value=q1
    row:s2_t2  column:cf:a, timestamp=1461911995951,value=q2

Based on the above data, I want to find the rows whose key contains 't1':

    row:r1_t1  column:cf:a, timestamp=1461911995948,value=v1
    row:s1_t1  column:cf:a, timestamp=1461911995950,value=q1

I know I can scan the table with PrefixFilter, but that only returns rows whose key starts with the specified prefix.

    scan 'test', {FILTER => "PrefixFilter('s')"}

Is there a similar way to scan the table that filters rows by a pattern occurring in the middle of the row key?

1
Did you find any alternative techniques other than what I suggested below? - Ram Ghadiyaram
No, the method with RowFilter works fine, so I've stopped looking for an alternative. - Adrian Muntean

1 Answer

5
votes
    hbase(main):003:0> scan 'test', {ENDROW => 't1'}

In general, using a PrefixFilter can be slow because it performs a table scan until it reaches the prefix.

You can also use a RowFilter with a SubstringComparator, like below:

    hbase(main):003:0> import org.apache.hadoop.hbase.filter.CompareFilter
    hbase(main):005:0> import org.apache.hadoop.hbase.filter.SubstringComparator
    hbase(main):006:0> scan 'test', {FILTER => org.apache.hadoop.hbase.filter.RowFilter.new(CompareFilter::CompareOp.valueOf('EQUAL'),SubstringComparator.new("searchkeyword"))}
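
To make the difference between the two filters concrete, here is a small plain-Ruby sketch that mimics which of the question's row keys each filter would select. It does not call HBase at all: `ROW_KEYS`, `prefix_match` and `substring_match` are hypothetical names made up for this illustration, and the matching is only a local stand-in for what PrefixFilter and RowFilter + SubstringComparator do on the server side.

```ruby
# Row keys taken from the sample data in the question.
ROW_KEYS = %w[r1_t1 r2_t2 s1_t1 s2_t2]

# Stand-in for PrefixFilter: keep keys that START with the given prefix.
def prefix_match(keys, prefix)
  keys.select { |key| key.start_with?(prefix) }
end

# Stand-in for RowFilter(EQUAL, SubstringComparator): keep keys that
# CONTAIN the given text anywhere. SubstringComparator compares
# case-insensitively, so both sides are lowercased here.
def substring_match(keys, needle)
  keys.select { |key| key.downcase.include?(needle.downcase) }
end

puts prefix_match(ROW_KEYS, 's').inspect     # keys starting with 's'
puts substring_match(ROW_KEYS, 't1').inspect # keys containing 't1'
```

With the question's data, replacing "searchkeyword" with "t1" in the scan above should therefore return the rows r1_t1 and s1_t1. The shell's filter-string syntax should also accept the same filter without any imports, as `scan 'test', {FILTER => "RowFilter(=, 'substring:t1')"}`, though I have not checked this against every HBase version.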