I am new to HBase. I understand that HBase is not equivalent to an RDBMS. However, I would like to run a query in HBase that is very simple in an RDBMS. I tried using Scan with a Filter, but I don't know how to look up a column by its value.

Consider this simple MySQL query: "SELECT username FROM members WHERE email = [email protected]"

In HBase I have the same data: a table named members with two columns, username and email.

Now I want to extract the username where the email is equal to [email protected].

I found many examples that extract a value when you specify the column family and qualifier, but my case is different. In an RDBMS this is super easy, but I don't know how to think about it in HBase. (I know there are SQL wrappers that let you run SQL over HBase, but I want to do this without one.)

Thanks in advance

1 Answer

You have 2 methods of retrieving data:

  • By accessing it by its row key: GET (fast)
  • By scanning some or all the rows: SCAN (slow)

Each table cell is identified by:

  • Row key + column family + column qualifier + timestamp (these coordinates map to the cell's value)

The row key can be anything that can be serialized into a byte[] (an integer, a string...)

In an RDBMS, you'll probably have an index on the email field. That way, the engine can find the position of the row without scanning the whole table, which makes the retrieval fast.

HBase has no built-in secondary indexes; the only thing you can look up directly is the row key. A common technique is to denormalize the data (write it multiple times under different row keys) or to build manual indexes. For example, store the md5 of the email as the row key of an index table, with a column pointing to the row key of the real data. That way, you can retrieve data by email with 2 GET operations (1. ask your email-index table for the row key stored under that email, 2. ask the user table for that row key).
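The two-GET lookup can be sketched in plain Java. This is only an in-memory model, not HBase client code: the two maps stand in for the tables, `Map.get` stands in for a GET, and the names `EmailIndexSketch`, `putMember`, `usernameByEmail`, the `member_row` column and the sample address are all made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

class EmailIndexSketch {
    // Stand-ins for two HBase tables: row key -> (column -> value).
    static final Map<String, Map<String, String>> members = new TreeMap<>();
    static final Map<String, Map<String, String>> emailIndex = new TreeMap<>();

    // Hex-encoded md5 of the email, used as the row key of the index table.
    static String md5Hex(String s) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(s.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Write the member row, then write the index row that points back to it.
    static void putMember(String rowKey, String username, String email) {
        Map<String, String> row = new HashMap<>();
        row.put("username", username);
        row.put("email", email);
        members.put(rowKey, row);

        Map<String, String> indexRow = new HashMap<>();
        indexRow.put("member_row", rowKey); // hypothetical column name
        emailIndex.put(md5Hex(email), indexRow);
    }

    // Two key lookups instead of a scan over every member row.
    static String usernameByEmail(String email) {
        Map<String, String> indexRow = emailIndex.get(md5Hex(email)); // GET 1
        if (indexRow == null) {
            return null; // no member registered under this email
        }
        Map<String, String> row = members.get(indexRow.get("member_row")); // GET 2
        return row == null ? null : row.get("username");
    }

    public static void main(String[] args) {
        putMember("user-001", "alice", "alice@example.org");
        System.out.println(usernameByEmail("alice@example.org"));
    }
}
```

The price of this design is that every write to the member table needs a matching write to the index table, and the two have to be kept in sync by your own code.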


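Now, regarding your question, I think you're looking for the ValueFilter, which is like the SingleColumnValueFilter except that you don't need to provide the column: it matches any column holding the value. Keep in mind that ValueFilter returns only the cells that match, so if you want the username from the row whose email matches, SingleColumnValueFilter on the email column is the better fit. Anyway, I don't think ValueFilter is widely used; after a couple of years working with HBase I've always used SingleColumnValueFilter instead.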
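To make the difference between the two filters concrete, here is a small plain-Java model of what each one keeps during a scan. It is a sketch of the semantics as I understand them, not the HBase API: `FilterSemanticsSketch`, `scanByAnyValue` and `scanByColumnValue` are hypothetical names, the table is just a map of row key to (column to value), and the second method assumes rows missing the column are dropped (in HBase that corresponds to calling `setFilterIfMissing(true)` on the filter).

```java
import java.util.LinkedHashMap;
import java.util.Map;

class FilterSemanticsSketch {
    // Models a scan with a ValueFilter (EQUAL): only the cells whose value
    // matches are kept, whatever column they are in; other cells are skipped.
    static Map<String, Map<String, String>> scanByAnyValue(
            Map<String, Map<String, String>> table, String value) {
        Map<String, Map<String, String>> result = new LinkedHashMap<>();
        for (Map.Entry<String, Map<String, String>> row : table.entrySet()) {
            Map<String, String> kept = new LinkedHashMap<>();
            for (Map.Entry<String, String> cell : row.getValue().entrySet()) {
                if (cell.getValue().equals(value)) {
                    kept.put(cell.getKey(), cell.getValue());
                }
            }
            if (!kept.isEmpty()) {
                result.put(row.getKey(), kept);
            }
        }
        return result;
    }

    // Models a scan with a SingleColumnValueFilter (EQUAL): the named column
    // decides whether the row passes, and a passing row keeps all its cells.
    static Map<String, Map<String, String>> scanByColumnValue(
            Map<String, Map<String, String>> table, String column, String value) {
        Map<String, Map<String, String>> result = new LinkedHashMap<>();
        for (Map.Entry<String, Map<String, String>> row : table.entrySet()) {
            String cellValue = row.getValue().get(column);
            if (value.equals(cellValue)) { // rows without the column are dropped
                result.put(row.getKey(), row.getValue());
            }
        }
        return result;
    }
}
```

With the column-based variant the matching row still carries its username cell, which is what the SQL query in the question asks for. Either way, both are scans: every row is read and tested on the server.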

Here you can find a good list of filters: http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/admin_hbase_filtering.html