
My row keys in HBase look like this:

a1s1
a1s2
a1s3
a2s1
a3s1
a3s2
...

I only want to get these rows:

a1s1
a2s1
a3s1

But when I run this query:

scan 't1', {STARTROW=>'a1s1', ENDROW=>'a4s1'}

It gives me:

a1s1
a1s2
a1s3
a2s1
a3s1

But I don't want to get a1s2 and a1s3. How can I do this?


2 Answers


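You should combine STARTROW/ENDROW with a RowFilter that uses a RegexStringComparator. A start/end row range on its own cannot skip rows: HBase stores row keys sorted lexicographically as byte arrays (they are not compared as numbers), so the scan returns every row from STARTROW (inclusive) up to ENDROW (exclusive), and that includes a1s2 and a1s3. In the HBase shell you can try this: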

import org.apache.hadoop.hbase.filter.CompareFilter
import org.apache.hadoop.hbase.filter.RegexStringComparator
import org.apache.hadoop.hbase.filter.RowFilter

scan 't1', {STARTROW => 'a1s1', ENDROW => 'a4s1', FILTER => RowFilter.new(CompareFilter::CompareOp.valueOf('EQUAL'), RegexStringComparator.new('s1$'))}
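The effect of the range and the filter can be sketched outside HBase. The Ruby below is a hypothetical in-memory emulation, not the HBase client API. It assumes ASCII row keys, so Ruby's String comparison matches HBase's unsigned byte order, and it mimics RegexStringComparator's "found anywhere in the key" matching with Regexp#match?. The sample keys come from the question.

```ruby
# Hypothetical stand-in for table 't1': HBase keeps row keys sorted in
# unsigned byte order, which for ASCII keys matches Ruby's String#<=>.
ROW_KEYS = %w[a3s2 a1s3 a2s1 a1s1 a3s1 a1s2].sort

# Emulates STARTROW (inclusive) / ENDROW (exclusive): a contiguous slice of
# the sorted keys, with no knowledge of the key's internal structure.
def range_scan(keys, start_row, end_row)
  keys.select { |k| k >= start_row && k < end_row }
end

# Emulates RowFilter(EQUAL, RegexStringComparator): keep the keys in which
# the pattern is found anywhere (a search, not a full-string match).
def regex_row_filter(keys, pattern)
  keys.select { |k| pattern.match?(k) }
end

range_only = range_scan(ROW_KEYS, 'a1s1', 'a4s1')
filtered = regex_row_filter(range_only, /s1$/)

puts range_only.inspect
puts filtered.inspect
```

The range alone keeps every key between a1s1 and a4s1, including a1s2 and a1s3; only the regex step drops the keys that do not end in "s1".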

I assume you want the row keys that start with "a" and end with "s1".

A row range can bound where the scan starts and stops:

scan 't1', {ENDROW => 's1'}

Or

scan 't1', {STARTROW=>'a', ENDROW=>'s1'}
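To see what this second range actually selects, here is a small Ruby check over the question's sample keys. It is an in-memory stand-in, not an HBase call, and it assumes HBase's STARTROW-inclusive, ENDROW-exclusive semantics with byte-order comparison, which for ASCII keys matches Ruby's String comparison:

```ruby
# Sample keys from the question, sorted the way HBase would store them.
row_keys = %w[a1s1 a1s2 a1s3 a2s1 a3s1 a3s2].sort

# STARTROW is inclusive, ENDROW is exclusive; both are compared against
# the whole key, not against a prefix or suffix of it.
start_row = 'a'
end_row = 's1'
scanned = row_keys.select { |k| k >= start_row && k < end_row }

puts scanned.inspect
```

Every sample key, a1s2 and a1s3 included, falls inside the range: "a" sorts before all of them because it is a prefix, and all of them sort before "s1" because 'a' < 's'.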

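Note that neither range excludes a1s2 or a1s3. Row keys are compared as whole byte strings, so ENDROW => 's1' does not mean "ends with s1"; it only stops the scan before the first key that sorts at or after 's1', and every key beginning with "a" sorts before that. To match the suffix, use a RowFilter with a regexstring comparator. The pattern must be a valid Java regular expression, so use s1$ (a leading * has nothing to repeat and is rejected):

scan 't1', {FILTER => "RowFilter(=, 'regexstring:s1$')"}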
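Whether a pattern is usable can be checked with plain Ruby regexes before trying it in the shell. This is a sketch, assuming Ruby's regex engine agrees with Java's java.util.regex (which RegexStringComparator uses by default) on these two simple patterns. The extra key b1s1 is hypothetical, added only to show the prefix check working:

```ruby
# Sample keys from the question plus one hypothetical key (b1s1).
sample_keys = %w[a1s1 a1s2 a1s3 a2s1 a3s1 a3s2 b1s1]

# A leading '*' has nothing to repeat, so the pattern does not compile.
# Java's Pattern.compile rejects it as well ("Dangling meta character").
begin
  Regexp.new('*s1')
  star_compiles = true
rescue RegexpError
  star_compiles = false
end

# Prefix "a" and suffix "s1", matching the assumption stated in the answer.
prefix_and_suffix = Regexp.new('^a.*s1$')
matches = sample_keys.select { |k| prefix_and_suffix.match?(k) }

puts star_compiles
puts matches.inspect
```

In the HBase filter language the same anchored pattern would be written as 'regexstring:^a.*s1$'.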