
I have encountered this strange issue when running some test code on a Cloudera-based HBase deployment. Assume these are my row keys (a simplified version of my actual row key structure):

a_1
a_2
a_3
b_1
b_2
b_3
c_1
c_2
c_3

When I run a scan with start row b_2 and stop row c_2 (exclusive), I get these rows:

b_2
b_3
c_1

When I add a FuzzyRowFilter for "?_2" while keeping the same start and stop rows, it seems to ignore the range and returns these rows:

a_2
b_2
c_2

whereas I would expect:

b_2

since a_2 and c_2 are out of my scan range.
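
To make the expectation concrete, here is a minimal plain-Java sketch of the semantics I am assuming. It is not HBase client code: the class `FuzzyScanSketch` and its methods are hypothetical names made up for illustration, the row keys are modeled as ASCII strings in a sorted set, and the fuzzy match is simplified to "same length, `?` matches any single character". The point is only that the filter is applied to rows that are already inside the [start, stop) range.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

public class FuzzyScanSketch {

    // Simplified fuzzy match (an assumption, not HBase's implementation):
    // the key must have the same length as the pattern, and every
    // non-'?' position must be equal.
    public static boolean fuzzyMatches(String rowKey, String pattern) {
        if (rowKey.length() != pattern.length()) {
            return false;
        }
        for (int i = 0; i < pattern.length(); i++) {
            char p = pattern.charAt(i);
            if (p != '?' && p != rowKey.charAt(i)) {
                return false;
            }
        }
        return true;
    }

    // Models a scan over [startRow, stopRow) with an optional fuzzy pattern.
    // Passing null as the pattern means "no filter".
    public static List<String> scan(List<String> rows, String startRow,
                                    String stopRow, String pattern) {
        // A TreeSet keeps the keys sorted, like rows in a table.
        TreeSet<String> table = new TreeSet<>(rows);
        List<String> result = new ArrayList<>();
        // Start row is inclusive, stop row is exclusive.
        for (String row : table.subSet(startRow, true, stopRow, false)) {
            if (pattern == null || fuzzyMatches(row, pattern)) {
                result.add(row);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> rows = List.of(
            "a_1", "a_2", "a_3", "b_1", "b_2", "b_3", "c_1", "c_2", "c_3");
        // Range only: the rows b_2, b_3, c_1
        System.out.println(scan(rows, "b_2", "c_2", null));
        // Range plus fuzzy pattern: only b_2
        System.out.println(scan(rows, "b_2", "c_2", "?_2"));
    }
}
```

With the nine row keys above, the range-only call yields b_2, b_3, c_1 and the call with the "?_2" pattern yields only b_2, which is the behavior I expected from the real scan.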

Now this is where it gets interesting: I installed a separate pseudo-distributed HBase v 2.0.4 on my PC, and in this setup it works as expected! The only differences are the HBase version and the fact that my installation is not running on a cluster.

So I am trying to find why this is happening, and I have a few questions:

  • Am I wrong in my assumption that FuzzyRowFilter should respect the start-stop rows?
  • Could it simply be a bug in my cluster HBase version? (Cloudera)
  • Could it be that FuzzyRowFilter started out as a full table scan and later versions evolved it to use the range? Note that I searched HBase Jira for a clue but could not find an issue about this. Nor could I find any unit tests for FuzzyRowFilter that check the correctness of the range; the test cases all use a full Scan() with no range.
  • Could it be the result of some cluster-deployment intricacy that I am not aware of? (I don't think so, but...)

Thanks.

1
Please can you edit your question to include the actual queries you're running, and the HBase version that's generating this error? - Ben Watson
@BenWatson Since my original code was sort of messy, I coded a small 100-200 line test case that inserts the data and scans it (to upload here). Lo and behold, it works. Then I modified it to use my original rowkey format (which has more moving parts) and it still works. At this point I think I am making a mistake somewhere, but I am still baffled by why my original code works differently on 2 different versions. At least I now have an answer: the filter is supposed to respect the scan range. Still working on this, I hope I can update with an answer. (HBase version is 1.2.0-cdh5.13.2) - sydnal

1 Answer


Not exactly an answer, but I deployed HBase 2.0.2 to my environment and it now works. I really wish I could find out what was going on, but I couldn't. Maybe it was a mismatched server-client version issue caused by stale builds, since I was working with multiple versions at the time. At least I read enough HBase code to answer one of my original questions: filters are supposed to respect the scan start and stop rows.