Hi,

HBase allows a column family to have different qualifiers in different rows. In my case, the qualifiers of one column family follow this pattern:

abc[cnt] # where cnt is an integer that can be any positive integer

What I want to achieve is to get all the data from a different column family, but only if the value stored under one of the qualifiers described above matches a given value.

To narrow the Scan down, I add only the two families I need for the query, but that is as far as I have got.

I already achieved the same behaviour with a SingleColumnValueFilter, but in that case the qualifier was known in advance. Here the qualifier can be abc1, abc2, and so on, so there would be too many options and thus too many SingleColumnValueFilters.

Then I tried the ValueFilter, but it returns only the columns whose value matches, so I get the wrong column family.

Can you think of any way to achieve my goal: querying for a value within a dynamically created qualifier in one column family, and returning the contents of that column family and another column family (as specified when creating the Scan)? Preferably with a single query.

Thanks in advance for any input.

UPDATE: (for clarification as discussed in the comments)

Put more graphically, a row may contain the following columns:

colfam1:aaa
colfam1:aab
colfam1:aac
colfam2:abc1
colfam2:abc2

I want to get the whole family colfam1 if any column of colfam2 has, for example, the value x. Keep in mind that the qualifiers colfam2:abc[cnt] are created dynamically, with cnt being any positive integer.

Your explanation is rather confusing, and it looks like you are trying to bend HBase to do things it wasn't meant for. Can you make your question more specific? What is "the qualifier" that you refer to? Am I guessing correctly: you have two column families, "abc[1]" and "abc[2]", both with dynamic qualifiers, and if "abc[1]:q1" has value "x", you want all qualifiers from "abc[2]"? - André Staltz
I updated the question; I hope that clarifies it. - divadpoc
I'm still trying to discover what your goal is. Another guess: if some (any) qualifier in "colfam2" has value "x", then get all qualifiers from "colfam1". Is that it? - André Staltz
Yes, exactly. If "abc1" has the value "x", then I want all qualifiers from "colfam1", that is, the family "colfam1", in my result. - divadpoc
OK, we're getting there. Is it: if "abc1" has value "x"? Or is it: if "abc1" or "abc2" or "abc3" and so on has value "x"? - André Staltz

1 Answer

I see two approaches for this: client-side filtering or server-side filtering.

Client-side filtering is more straightforward. The Scan adds only the two families "colfam1" and "colfam2". Then, for each Result you get from scanner.next(), you must filter according to the qualifiers in "colfam2".

import java.util.LinkedList;
import java.util.Map.Entry;
import java.util.NavigableMap;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

byte[] queryValue = Bytes.toBytes("x");
// Fetch only the two families involved in the query.
Scan scan = new Scan();
scan.addFamily(Bytes.toBytes("colfam1"));
scan.addFamily(Bytes.toBytes("colfam2"));
ResultScanner scanner = myTable.getScanner(scan);
Result res;
while((res = scanner.next()) != null) {
   // Step 1: does any qualifier of colfam2 in this row hold the query value?
   NavigableMap<byte[],byte[]> colfam2 = res.getFamilyMap(Bytes.toBytes("colfam2"));
   boolean foundQueryValue = false;
   for(byte[] value : colfam2.values()) {
       if( Bytes.equals(value, queryValue) ) {
           foundQueryValue = true;
           break;
       }
   }
   // Step 2: if so, rebuild a Result that contains only the colfam1 cells.
   if(foundQueryValue) {
       NavigableMap<byte[],byte[]> colfam1 = res.getFamilyMap(Bytes.toBytes("colfam1"));
       LinkedList<KeyValue> listKV = new LinkedList<KeyValue>();
       for(Entry<byte[], byte[]> cell : colfam1.entrySet()) {
           listKV.add(new KeyValue(res.getRow(), Bytes.toBytes("colfam1"), cell.getKey(), cell.getValue()));
       }
       Result filteredResult = new Result(listKV);
       // Use filteredResult here; it is only in scope for matching rows.
   }
}
scanner.close();

(This code was not tested)

For each matching row, filteredResult is what you want. This approach is not elegant, and it may perform poorly if those families hold a lot of data: the "colfam1" cells of every row are transferred to the client, even for rows that end up discarded because no qualifier of "colfam2" has the value "x".

Server-side filtering. This requires you to implement your own Filter class; I believe the provided filter types cannot do this. Implementing your own Filter takes some work, and you also need to package it as a .jar and make it available to all RegionServers. In return, it avoids sending loads of "colfam1" data to the client in vain. Showing a full custom Filter implementation is too much work for this answer, so I recommend reading a good book (HBase: The Definitive Guide, for example). The Filter code will look much like the client-side filtering shown above, so that's half of the work done.
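One point that differs from the client-side loop is worth sketching. A filter is handed the cells of a row one at a time rather than the whole row, and, as far as I know, the cells arrive sorted by family, so the "colfam1" cells are seen before the "colfam2" cells that decide whether to keep them. The decision therefore has to be deferred until the end of the row. The class below is a plain-Java model of that idea only. DeferredRowGate, its Cell type and its method names are hypothetical and are not part of the HBase API; a real Filter would use byte[] values and HBase's own per-cell, per-row and reset callbacks.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical plain-Java model of a deferred row decision; not an HBase class. */
final class DeferredRowGate {

    /** One cell, reduced to the three fields the decision needs. */
    static final class Cell {
        final String family;
        final String qualifier;
        final String value;

        Cell(String family, String qualifier, String value) {
            this.family = family;
            this.qualifier = qualifier;
            this.value = value;
        }
    }

    private final String gateFamily;   // family whose values are inspected, e.g. "colfam2"
    private final String wantedValue;  // value that opens the gate, e.g. "x"
    private final List<Cell> buffered = new ArrayList<>();
    private boolean matched;

    DeferredRowGate(String gateFamily, String wantedValue) {
        this.gateFamily = gateFamily;
        this.wantedValue = wantedValue;
    }

    /** Called once per cell: buffers the cell and records whether the gate value was seen. */
    void acceptCell(Cell cell) {
        buffered.add(cell);
        if (cell.family.equals(gateFamily) && cell.value.equals(wantedValue)) {
            matched = true;
        }
    }

    /**
     * Called at the end of a row: returns the buffered cells of returnFamily
     * if the gate value was seen in this row, otherwise an empty list.
     * The state is reset afterwards so the next row starts clean.
     */
    List<Cell> finishRow(String returnFamily) {
        List<Cell> out = new ArrayList<>();
        if (matched) {
            for (Cell c : buffered) {
                if (c.family.equals(returnFamily)) {
                    out.add(c);
                }
            }
        }
        buffered.clear();
        matched = false;
        return out;
    }
}
```

For the example row from the question, feeding colfam1:aaa, colfam1:aab, colfam1:aac, colfam2:abc1 and colfam2:abc2 through acceptCell and then calling finishRow("colfam1") returns the three colfam1 cells only if one of the colfam2 cells carried the wanted value. Forgetting the reset between rows would let a match in one row leak into the next.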