38
votes

I want to scan rows in an HBase table from the HBase shell where a column (e.g., user_id in the tweet column family) has a particular value.

Specifically, I want to find all rows where tweet:user_id has the value test1, i.e. rows containing a cell like this:

column=tweet:user_id, timestamp=1339581201187, value=test1

I can scan the table for a particular column using

scan 'tweetsTable',{COLUMNS => 'tweet:user_id'}

but I did not find any way to scan for rows by value.

Is it possible to do this via HBase Shell?

I checked this question as well.

7 Answers

48
votes

It is possible without Hive:

scan 'filemetadata', 
     { COLUMNS => 'colFam:colQualifier', 
       LIMIT => 10, 
       FILTER => "ValueFilter( =, 'binaryprefix:<someValue.e.g. test1 AsDefinedInQuestion>' )" 
     }

Note: to find all rows that contain test1 as the value, as specified in the question, use binaryprefix:test1 in the filter (see this answer for more examples).
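To see why a prefix comparison can match more than the exact value, here is a plain-Ruby sketch that imitates three comparator types accepted in the filter string (binary, binaryprefix, substring). It does not talk to HBase: `matches?` and the sample values are hypothetical, and the case-insensitive substring behaviour is my understanding of SubstringComparator rather than something stated above.

```ruby
# Plain-Ruby imitation of three HBase comparator types (no HBase required).
# matches? is a hypothetical helper, not part of the HBase shell.
def matches?(comparator, expected, value)
  case comparator
  when 'binary'       then value == expected            # whole value must be equal
  when 'binaryprefix' then value.start_with?(expected)  # value only has to start with it
  when 'substring'    then value.downcase.include?(expected.downcase) # assumed case-insensitive
  else raise ArgumentError, "unknown comparator: #{comparator}"
  end
end

sample_values = %w[test1 test10 mytest1]

%w[binary binaryprefix substring].each do |comparator|
  hits = sample_values.select { |v| matches?(comparator, 'test1', v) }
  puts "#{comparator}:test1 -> #{hits.join(', ')}"
end
```

In this sketch the prefix comparison also selects test10, so if only the exact value should match, binary:test1 is the stricter choice.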

27
votes

Nishu, here is a solution I use periodically. It is actually much more powerful than you need right now, but I think you will use its power some day. Yes, it is for the HBase shell.

import org.apache.hadoop.hbase.filter.CompareFilter
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter
import org.apache.hadoop.hbase.filter.SubstringComparator
import org.apache.hadoop.hbase.util.Bytes

scan 'yourTable', {LIMIT => 10, FILTER => SingleColumnValueFilter.new(Bytes.toBytes('family'), Bytes.toBytes('field'), CompareFilter::CompareOp.valueOf('EQUAL'), Bytes.toBytes('AAA')), COLUMNS => 'family:field' }

Only the family:field column is returned, with the filter applied. The filter can be adapted to perform more complicated comparisons.

Here are also hints for you that I consider most useful:

15
votes

As there were multiple requests to explain this answer, this additional answer has been posted.

Example 1

If

scan '<table>', { COLUMNS => '<column>', LIMIT => 3 }

would return:

ROW     COLUMN+CELL
ROW1    column=<column>, timestamp=<timestamp>, value=hello_value
ROW2    column=<column>, timestamp=<timestamp>, value=hello_value2
ROW3    column=<column>, timestamp=<timestamp>, value=hello_value3

then this filter:

scan '<table>', { COLUMNS => '<column>', LIMIT => 3, FILTER => "ValueFilter( =, 'binaryprefix:hello_value2') OR ValueFilter( =, 'binaryprefix:hello_value3')" }

would return:

ROW     COLUMN+CELL
ROW2    column=<column>, timestamp=<timestamp>, value=hello_value2
ROW3    column=<column>, timestamp=<timestamp>, value=hello_value3

Example 2

Not-equal is supported as well:

scan '<table>', { COLUMNS => '<column>', LIMIT => 3, FILTER => "ValueFilter( !=, 'binaryprefix:hello_value2' )" }

would return:

ROW     COLUMN+CELL
ROW1    column=<column>, timestamp=<timestamp>, value=hello_value
ROW3    column=<column>, timestamp=<timestamp>, value=hello_value3
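
As I understand the filter language, AND keeps a cell only if it passes both conditions, while OR keeps it if it passes either one. A plain-Ruby sketch of what each combination selects from the three sample values (no HBase involved; `prefix?` and `combine` are hypothetical helpers standing in for two binaryprefix value filters):

```ruby
# Plain-Ruby imitation of combining two prefix comparisons (no HBase required).

# prefix? stands in for ValueFilter(=, 'binaryprefix:...') on a single value.
def prefix?(value, prefix)
  value.start_with?(prefix)
end

# combine applies both prefix tests to every value and joins them with AND or OR.
def combine(values, mode)
  values.select do |v|
    a = prefix?(v, 'hello_value2')
    b = prefix?(v, 'hello_value3')
    mode == :and ? (a && b) : (a || b)
  end
end

sample_values = %w[hello_value hello_value2 hello_value3]
puts "AND -> #{combine(sample_values, :and).inspect}"
puts "OR  -> #{combine(sample_values, :or).inspect}"
```

No single value can start with both hello_value2 and hello_value3, so the AND combination selects nothing here, while OR selects the second and third values.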
8
votes

An example of a text search for the value BIGBLUE in table t1, in column d:a_content (column family d, qualifier a_content). A scan of the table will show all the available values:

scan 't1'
...
column=d:a_content, timestamp=1404399246216, value=BIGBLUE
...

To search just for the value BIGBLUE with a limit of 1, try the command below:

scan 't1',{ COLUMNS => 'd:a_content', LIMIT => 1, FILTER => "ValueFilter( =, 'regexstring:BIGBLUE' )" }

COLUMN+CELL
column=d:a_content, timestamp=1404399246216, value=BIGBLUE

Removing the limit will show all occurrences in that column.
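
A regular expression without anchors matches anywhere inside a value, so a pattern like BIGBLUE can also select longer values. The plain-Ruby sketch below imitates that behaviour without touching HBase. It assumes the regexstring comparator searches for the pattern anywhere in the value, and `hits` and the sample values are hypothetical.

```ruby
# Plain-Ruby imitation of unanchored vs anchored regular expressions (no HBase required).

# hits returns the values in which the pattern is found somewhere.
def hits(values, pattern)
  values.select { |v| v =~ pattern }
end

sample_values = %w[BIGBLUE BIGBLUE2 MYBIGBLUE]

unanchored = /BIGBLUE/      # plays the role of 'regexstring:BIGBLUE'
anchored   = /\ABIGBLUE\z/  # plays the role of 'regexstring:^BIGBLUE$' for single-line values

puts "unanchored -> #{hits(sample_values, unanchored).inspect}"
puts "anchored   -> #{hits(sample_values, anchored).inspect}"
```

If only the exact value is wanted, anchoring the pattern or using a binary comparison is likely the safer choice.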

1
votes

To scan a table in HBase based on the value of a column, SingleColumnValueFilter can be used as follows:

scan 'tablename' ,
   { 
     FILTER => "SingleColumnValueFilter('column_family','col_name',>, 'binary:1')" 
   } 
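
One thing to keep in mind with > and a binary comparison is that values are compared as bytes, not as numbers. The plain-Ruby sketch below shows the difference on string-encoded numbers. It does not use HBase: `bytewise_greater` and `numeric_greater` are hypothetical helpers, the threshold 5 is chosen only to make the effect visible, and it assumes the stored values are plain strings.

```ruby
# Plain-Ruby imitation of a byte-wise "greater than" test (no HBase required).

# String#<=> compares byte by byte, which is the ordering assumed for 'binary:' here.
def bytewise_greater(values, threshold)
  values.select { |v| (v <=> threshold) > 0 }
end

# Numeric comparison, shown only for contrast.
def numeric_greater(values, threshold)
  values.select { |v| v.to_i > threshold.to_i }
end

sample_values = %w[1 7 10 42 100]
puts "byte-wise > 5: #{bytewise_greater(sample_values, '5').inspect}"
puts "numeric   > 5: #{numeric_greater(sample_values, '5').inspect}"
```

Byte-wise, only 7 sorts after 5 in this sample, because 10, 42 and 100 all start with a character lower than 5. Storing numbers zero-padded or in a fixed-width binary encoding is one common way around this.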
0
votes

I think this is not possible from the HBase shell, because it amounts to a query for specific data, and HBase is a NoSQL store. For a case like yours, I think you should use Hive or Pig; Hive is quite a good approach, because with Pig you have to deal with scripts.
You can get good guidance about Hive from here: HIVE integration with HBase, and from here.
If your only purpose is to view data rather than fetch it from client code, you can use HBase Explorer, or a new and very good product that is still in beta, "HBase manager". You can get it from HBase Manager.
It is simple and, more importantly, it helps to insert and delete data and to apply filters on column qualifiers from a UI, like other DB clients. Give it a try.
I hope this helps :)

0
votes

A slightly different question, but if you want to query a specific column which is not present in all rows, DependentColumnFilter is your best friend:

import org.apache.hadoop.hbase.filter.DependentColumnFilter
scan 'orgtable2', {FILTER => "DependentColumnFilter('cf1','lan',false,=,'binary:fre')"}

The scan above returns all columns for the rows in which the lan column is present and its value is equal to fre. The third argument is dropDependentColumn; setting it to true prevents the lan column itself from being displayed in the results.