I want to do an HBase scan with filters. For example, my table has column families A, B, and C, and A has a column X. Some rows have column X and some do not. How can I implement a filter that filters out all the rows with column X?
3 Answers
I guess you are looking for SingleColumnValueFilter in HBase. As mentioned in the API:
"To prevent the entire row from being emitted if the column is not found on a row, use setFilterIfMissing(boolean) on the Filter object. Otherwise, if the column is found, the entire row will be emitted only if the value passes. If the value fails, the row will be filtered out."
However, SingleColumnValueFilter needs a value to compare column X against using a CompareOp. For example: return this row if ColumnX == "X",
or
return this row if ColumnX != "a sentinel value that ColumnX can never take", combined with setFilterIfMissing(true),
so that any row in which ColumnX has some value is returned.
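The sentinel variant described above could be sketched roughly as follows. This assumes an HBase client version that still ships CompareFilter.CompareOp (as used elsewhere in this thread) and the hbase-client jar on the classpath. The class name HasColumnScan, the method build, and the sentinel string are made up for illustration.

```java
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.util.Bytes;

// Hypothetical helper, not part of HBase.
class HasColumnScan {

    // Builds a Scan that returns only rows containing family:qualifier.
    // `sentinel` is an assumed value that the column never legitimately holds.
    static Scan build(String family, String qualifier, String sentinel) {
        SingleColumnValueFilter hasColumn = new SingleColumnValueFilter(
                Bytes.toBytes(family),
                Bytes.toBytes(qualifier),
                CompareOp.NOT_EQUAL,      // any value other than the sentinel passes
                Bytes.toBytes(sentinel));
        // Without this flag, rows that lack the column are emitted as well.
        hasColumn.setFilterIfMissing(true);

        Scan scan = new Scan();
        scan.setFilter(hasColumn);
        return scan;
    }
}
```

It would be used as htable.getScanner(HasColumnScan.build("A", "X", "__never__")). If the Scan is restricted to specific columns, the tested column probably needs to be among them, since the filter otherwise treats it as missing.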
I hope this nudges you in the right direction.
You can use a SkipFilter along with a ColumnPrefixFilter. The ColumnPrefixFilter matches keys where the column exists (an HBase row only has a column if it has a value). The SkipFilter gives you the "not" of the first filter, so the row will be omitted.
Ankit Arnon user1573269
The only way I could get it to work is shown below.
I have a table with columns rule1, rule2, rule3, and so on. A row can have only the rule1 column, or rule1 and rule2, or rule1, rule2, and rule3, and so on. Say I want to extract the rows that have ONLY rule1 in them. That means I have to skip the rows that have rule2 in them.
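import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.filter.*;
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
import org.apache.hadoop.hbase.util.Bytes;

Scan getRules = new Scan();
ColumnPrefixFilter rule1Filter = new ColumnPrefixFilter(Bytes.toBytes("rule1"));
// Passes only when rule2 is absent or equals "0"; any other rule2 value drops the row.
SingleColumnValueFilter skipRule2Value = new SingleColumnValueFilter(Bytes.toBytes("rules"), Bytes.toBytes("rule2"),
        CompareOp.EQUAL, Bytes.toBytes("0"));
SkipFilter skipRule2 = new SkipFilter(skipRule2Value);
// setFilter() replaces any filter set earlier, so both filters are combined in a FilterList.
// skipRule2 is listed first so that it evaluates the rule2 cell before rule1Filter rejects it.
getRules.setFilter(new FilterList(FilterList.Operator.MUST_PASS_ALL, skipRule2, rule1Filter));
ResultScanner scanner = htable.getScanner(getRules);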
Though this worked, I am not very happy with the solution. It takes time for HBase to figure out. I would have thought there would be an easier, more straightforward method that does not have to check the value. Arnon, your method does not work because SkipFilter skips the rows that do NOT satisfy the condition. Hence, constructing it from a ColumnPrefixFilter fails the requirement.
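To make the objection to the SkipFilter plus ColumnPrefixFilter approach concrete, here is a toy model in plain Java. It is not HBase code: it only imitates the documented SkipFilter behaviour (a row is dropped as soon as any of its cells fails the wrapped filter), and the class name, method names, and sample rows are invented for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Toy stand-in for SkipFilter semantics; all names here are hypothetical.
class SkipFilterModel {

    // A row survives only if EVERY one of its column qualifiers passes `wrapped`.
    static List<String> skipFilter(Map<String, List<String>> rows,
                                   Predicate<String> wrapped) {
        List<String> kept = new ArrayList<>();
        for (Map.Entry<String, List<String>> row : rows.entrySet()) {
            if (row.getValue().stream().allMatch(wrapped)) {
                kept.add(row.getKey());
            }
        }
        return kept;
    }

    // Sample table: row key -> column qualifiers present in that row.
    static Map<String, List<String>> sampleRows() {
        Map<String, List<String>> rows = new LinkedHashMap<>();
        rows.put("row1", Arrays.asList("rule1"));
        rows.put("row2", Arrays.asList("rule1", "rule2"));
        rows.put("row3", Arrays.asList("rule2"));
        return rows;
    }

    public static void main(String[] args) {
        Predicate<String> hasRule2Prefix = q -> q.startsWith("rule2");
        // Wrapping the prefix check keeps rows made up solely of rule2 columns.
        System.out.println(skipFilter(sampleRows(), hasRule2Prefix));          // prints [row3]
        // The goal was the rows that contain no rule2 column at all.
        System.out.println(skipFilter(sampleRows(), hasRule2Prefix.negate())); // prints [row1]
    }
}
```

In this model, wrapping the prefix check returns the rows that contain nothing but rule2, which is not the same as the rows without rule2. Negating the wrapped predicate gives the wanted rows here; whether a negated qualifier comparison wrapped in a SkipFilter behaves the same way on a real cluster would need to be verified against the HBase version in use.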