How can I skip HBase rows that are missing specific columns?

Question

I'm writing a mapreduce job over HBase using table mapper. I want to skip rows that don't have specific columns. For example, if the mapper reads from the "meta" family, "source" qualifier column, the mapper should expect something to be in that column. I know I can add columns to the scan object, but I expect this merely limits which rows can be seen by the scan, not which columns need to be there.

What filter can I use to skip rows without the columns I need?

Also, the filter concept itself is a little strange. Does the filter operate on a row-by-row basis or a keyvalue-by-keyvalue basis? Does "filter a row" mean skip the row or include it, or simply put it through a filter?

Is there somewhere where this is explained more clearly than the hbase javadocs?

Ram Ghadiyaram Ram Ghadiyaram · Accepted Answer · 2015-07-11T17:50:16

//to skip columns with Column Prefix
Filter columnFilter = new ColumnPrefixFilter(Bytes.toBytes("col-1"));
 //To skip the values
Filter valueFilter= new ValueFilter(CompareFilter.CompareOp.NOT_EQUAL,
      new BinaryComparator(Bytes.toBytes("yourvalue")));

 To Avoid the multiple column names you can pass multiple column filter with must pass all option(which is default)
Below is sample with single column filter.

Filter avoidColumnNamesFilter = new SkipFilter(columnFilter);
scan.setFilter(avoidColumnNamesFilter)
Similarly to avoid certain value pass valuefilter to skip filter

How can I skip HBase rows that are missing specific columns?

2 Answers