When I run an HBase scan with a SingleColumnValueFilter and no other parameters, it returns 40000 rows.

Example: table.scan(filter="SingleColumnValueFilter('info','collection',=,'substring:tweets_brazilFire')")

Example: table.scan(filter="SingleColumnValueFilter('info','collection',=,'substring:tweets_brazilFire')", columns=['field:body_s'])

When I add the columns argument to the scan, it returns 1967178 rows.

I am confused. The column is present in other rows, but those rows do not have the column value I am specifying. Shouldn't the scan apply both conditions, returning only the requested columns and only the rows that pass the filter?

I am using Python with HappyBase for this.

Please let me know your suggestions.

Thanks

1 Answer

First, from the API point of view, the Apache Thrift filter language uses a different syntax for SingleColumnValueFilter:

Syntax: SingleColumnValueFilter(<compare operator>, '<comparator>', '<family>', '<qualifier>', <filterIfColumnMissing_boolean>, <latest_version_boolean>)

Syntax: SingleColumnValueFilter(<compare operator>, '<comparator>', '<family>', '<qualifier>')

Example: "SingleColumnValueFilter (<=, 'abc', 'FamilyA', 'Column1', true, false)"

Example: "SingleColumnValueFilter (<=, 'abc', 'FamilyA', 'Column1')"

The first syntax seems appropriate for you. Set filterIfColumnMissing_boolean to true so that only rows in which the column exists are returned. The versions flag is up to you. Hope this helps.
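A rough sketch of how that could look with HappyBase follows. It rests on a few assumptions, so treat it as a starting point rather than a verified fix. The argument order in the filter string follows the one already used in the question (family, qualifier, operator, comparator), with the two booleans appended as filterIfMissing and latestVersionOnly; adjust it if your HBase version expects the order quoted above. The helper names build_filter and scan_matching are made up for illustration. The sketch also adds the tested column to the columns list, because the HBase documentation for SingleColumnValueFilter notes that when a scan restricts its columns, the tested column must be among them, otherwise the filter treats it as missing.

```python
def build_filter(family, qualifier, substring,
                 filter_if_missing=True, latest_version_only=True):
    """Build a SingleColumnValueFilter string (hypothetical helper).

    Assumption: the argument order matches the filter string in the
    question, followed by the two optional boolean flags.
    """
    flags = "{}, {}".format(str(filter_if_missing).lower(),
                            str(latest_version_only).lower())
    return "SingleColumnValueFilter('{}', '{}', =, 'substring:{}', {})".format(
        family, qualifier, substring, flags)


def scan_matching(table, family, qualifier, substring, wanted_columns):
    """Scan `table` (expected to be a happybase.Table) with the filter.

    The tested column is appended to the requested columns so the
    filter can see its value on every row.
    """
    tested = "{}:{}".format(family, qualifier)
    columns = list(wanted_columns)
    if tested not in columns:
        columns.append(tested)
    return table.scan(filter=build_filter(family, qualifier, substring),
                      columns=columns)
```

With a real connection, something like scan_matching(table, 'info', 'collection', 'tweets_brazilFire', ['field:body_s']) would yield (row key, data) pairs; the extra info:collection entry can be dropped from each result if it is not needed.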

thanks