3
votes

I have a table with multiple columns in HBase. The structure of the table is something like this:

row1 column=cf:c1, timestamp=xxxxxx, value=v1
row1 column=cf:c2, timestamp=xxxxxx, value=v2
row1 column=cf:c3, timestamp=xxxxxx, value=v3
...

I want to write a custom filter that filters rows by the value in a particular column. For example, if column c3 holds the value v3, I want to include the whole row; otherwise I want to drop it. As far as I understand, HBase filters operate on individual cells, so they include or skip just one column at a time. Is there a filter in HBase that can do row-level filtering like this, and how should I implement it?

Thanks.


2 Answers

3
votes

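You could use SingleColumnValueFilter for this problem. It tests the value of one column and, based on that test, includes or drops the entire row. Using your example, you could do this:

SingleColumnValueFilter filter = new SingleColumnValueFilter(Bytes.toBytes("cf"), Bytes.toBytes("c3"), CompareFilter.CompareOp.EQUAL, Bytes.toBytes("v3"));
// By default, rows that have no cf:c3 column at all are also included; set this to drop them.
filter.setFilterIfMissing(true);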

Then, you can add the filter to your scan this way:

Scan scan = new Scan();
scan.setFilter(filter);
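
To make the row-level behaviour concrete, here is a minimal plain-Java sketch of the decision SingleColumnValueFilter makes for each row. It is a model for illustration only, not HBase code: the class name, the keepRow helper, and the representation of a row as a map from "family:qualifier" to value are all assumptions. It shows why a row without the tested column still passes unless filterIfMissing is set.

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java model of the row-level decision; this is NOT HBase code.
final class SingleColumnValueModel {

    // Hypothetical helper: a row is a map from "family:qualifier" to value.
    // Returns true when the whole row would be emitted by the scan.
    static boolean keepRow(Map<String, String> row, String column,
                           String expected, boolean filterIfMissing) {
        String actual = row.get(column);
        if (actual == null) {
            // Column absent: the row is kept unless filterIfMissing is set.
            return !filterIfMissing;
        }
        // Column present: keep the row only on an exact match (EQUAL).
        return actual.equals(expected);
    }

    public static void main(String[] args) {
        Map<String, String> match = new HashMap<>();
        match.put("cf:c1", "v1");
        match.put("cf:c3", "v3");

        Map<String, String> noC3 = new HashMap<>();
        noC3.put("cf:c1", "v1");

        System.out.println(keepRow(match, "cf:c3", "v3", false)); // true
        System.out.println(keepRow(noC3, "cf:c3", "v3", false));  // true
        System.out.println(keepRow(noC3, "cf:c3", "v3", true));   // false
    }
}
```

With the default setting the row lacking cf:c3 is still returned, which is usually not what "include the row only if c3 equals v3" means.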

If you need several conditions, you can combine multiple filters: add them to a FilterList and pass the list to your scan with the setFilter method.

SingleColumnValueFilter f1 = new SingleColumnValueFilter(Bytes.toBytes("cf"), Bytes.toBytes("c3"), CompareFilter.CompareOp.EQUAL, Bytes.toBytes("v3"));
SingleColumnValueFilter f2 = new SingleColumnValueFilter(Bytes.toBytes("cf"), Bytes.toBytes("c2"), CompareFilter.CompareOp.EQUAL, Bytes.toBytes("v2"));

FilterList filterList = new FilterList(FilterList.Operator.MUST_PASS_ONE); // MUST_PASS_ONE = OR; use FilterList.Operator.MUST_PASS_ALL for AND
filterList.addFilter(f1);
filterList.addFilter(f2);

Scan scan = new Scan();
scan.setFilter(filterList);
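
The difference between the two operators can be sketched in plain Java as an OR versus an AND over the per-row conditions. This is an illustrative model, not HBase code: the class and method names are hypothetical, a row is again modelled as a map from "family:qualifier" to value, and each condition is assumed to require the column to be present (as with setFilterIfMissing(true)).

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Plain-Java model of combining row conditions; this is NOT HBase code.
final class FilterListModel {

    // Models MUST_PASS_ALL: every condition must accept the row (AND).
    static boolean mustPassAll(List<Predicate<Map<String, String>>> filters,
                               Map<String, String> row) {
        return filters.stream().allMatch(f -> f.test(row));
    }

    // Models MUST_PASS_ONE: at least one condition must accept the row (OR).
    static boolean mustPassOne(List<Predicate<Map<String, String>>> filters,
                               Map<String, String> row) {
        return filters.stream().anyMatch(f -> f.test(row));
    }

    public static void main(String[] args) {
        // Hypothetical conditions mirroring f1 (c3 == v3) and f2 (c2 == v2).
        Predicate<Map<String, String>> f1 = r -> "v3".equals(r.get("cf:c3"));
        Predicate<Map<String, String>> f2 = r -> "v2".equals(r.get("cf:c2"));
        List<Predicate<Map<String, String>>> filters = Arrays.asList(f1, f2);

        // A row that satisfies f1 but not f2.
        Map<String, String> row = new HashMap<>();
        row.put("cf:c2", "other");
        row.put("cf:c3", "v3");

        System.out.println(mustPassOne(filters, row)); // true
        System.out.println(mustPassAll(filters, row)); // false
    }
}
```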
1
votes

You could use SingleColumnValueFilter for both single and multiple conditions. For your case, if you need to match the qualifier (field) value exactly, you can run this in the HBase shell:

scan '<table_name>',{FILTER=>"SingleColumnValueFilter('cf','c3',=,'binary:v3')",COLUMNS=>['cf']}

For conditions on multiple columns, the syntax is:

scan '<table_name>',{FILTER=>"SingleColumnValueFilter('<column_family>','<column_qualifier>',<comp_operator>,'binary:<qualifier_value>') AND SingleColumnValueFilter('<column_family>','<column_qualifier>',<comp_operator>,'binary:<qualifier_value>')",COLUMNS=>['<column_family>']}