HBase: Scan with column filter (get rows which do not have a particular column)

Question

I am trying to fetch rows using a scan. I need the rows in which a particular column is not present. I have tried multiple approaches, but none seems to work.

Let's say I want the rows where column "fs" is not present. I have tried the following:

SingleColumnValueFilter filter1 = new SingleColumnValueFilter(
                       "f".getBytes(),
                       Bytes.toBytes("fs"),
                       CompareOp.NOT_EQUAL,
                       Bytes.toBytes(1)
                       );

The assumption is that if "fs" is present, it will have the value 1.

This does not work. I also tried what is suggested in "How can I skip HBase rows that are missing specific columns?", but that did not work either.

The suggestion with SkipFilter was wrong: it filters out the entire row if any cell does not match the filter condition. I guess your rows with the 'fs' column have some other columns too and get filtered. - AdamSkywalker

maxteneff · Accepted Answer · 2016-12-28T19:52:05

The SkipFilter suggestion in that answer is not wrong, but it is not applicable to your case (as @AdamSkywalker pointed out).

However, you can create two separate SkipFilters, each wrapping a ColumnRangeFilter, for the ranges ["0", "fs") and ("fs", "z"]. These filters should then be combined in a FilterList using the MUST_PASS_ONE combination rule.

Example code which can be tested in HBase shell:

import org.apache.hadoop.hbase.util.Bytes
import org.apache.hadoop.hbase.filter.ColumnRangeFilter
import org.apache.hadoop.hbase.filter.SkipFilter
import org.apache.hadoop.hbase.filter.FilterList
import org.apache.hadoop.hbase.filter.FilterList.Operator
scan 'table', {FILTER => FilterList.new(
  FilterList::Operator::MUST_PASS_ONE,
  SkipFilter.new(ColumnRangeFilter.new(Bytes.toBytes("0"), true, Bytes.toBytes("fs"), false)),
  SkipFilter.new(ColumnRangeFilter.new(Bytes.toBytes("fs"), false, Bytes.toBytes("z"), true)))}

In Java API code, your filter should look like this:

// Columns in ["0", "fs"): everything in the range that sorts before "fs"
SkipFilter range1 = new SkipFilter(new ColumnRangeFilter(Bytes.toBytes("0"), true, Bytes.toBytes("fs"), false));
// Columns in ("fs", "z"]: everything in the range that sorts after "fs"
SkipFilter range2 = new SkipFilter(new ColumnRangeFilter(Bytes.toBytes("fs"), false, Bytes.toBytes("z"), true));
FilterList filter = new FilterList(FilterList.Operator.MUST_PASS_ONE, range1, range2);

Note that in this example the column name range is limited to printable characters only. If you use arbitrary byte arrays as column names, you should define a wider range.
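
To make the range limitation concrete, here is a minimal plain-Java sketch that does not use HBase at all. It only models how qualifiers are ordered (lexicographically, as unsigned bytes) and checks which names fall inside the two ranges above. The class and helper names (ColumnRangeSketch, inRange, coveredByEitherRange) are made up for illustration, and the sketch says nothing about how SkipFilter or FilterList evaluate rows.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class ColumnRangeSketch {
    // Hypothetical helper: convert a column name to bytes.
    static byte[] b(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Hypothetical helper: true if qualifier q lies between min and max,
    // comparing lexicographically on unsigned bytes (requires Java 9+).
    static boolean inRange(byte[] q, byte[] min, boolean minInclusive,
                           byte[] max, boolean maxInclusive) {
        int lo = Arrays.compareUnsigned(q, min);
        int hi = Arrays.compareUnsigned(q, max);
        boolean aboveMin = minInclusive ? lo >= 0 : lo > 0;
        boolean belowMax = maxInclusive ? hi <= 0 : hi < 0;
        return aboveMin && belowMax;
    }

    // True if q falls in ["0", "fs") or ("fs", "z"], the two ranges from the answer.
    static boolean coveredByEitherRange(byte[] q) {
        return inRange(q, b("0"), true, b("fs"), false)
            || inRange(q, b("fs"), false, b("z"), true);
    }

    public static void main(String[] args) {
        System.out.println(coveredByEitherRange(b("abc")));   // true
        System.out.println(coveredByEitherRange(b("fs")));    // false: excluded from both ranges
        System.out.println(coveredByEitherRange(b("zone")));  // false: "zone" sorts after "z"
        System.out.println(coveredByEitherRange(new byte[] {0x01, (byte) 0xFF})); // false: below "0"
    }
}
```

Names such as "zone", or binary qualifiers that start with a byte below "0", fall outside both ranges, so the bounds need widening for such tables. The ColumnRangeFilter documentation describes a null bound as meaning no limit on that side, which may be a simpler way to widen the range; verify this against your HBase version.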
