1
votes

I am new to Apache HBase and am using hbase-0.98.13. I created a table named sample with a column family sample_family and loaded the output of a Pig script into it. When I scan the table filtering on one of the columns in the column family, it takes more than 2 minutes.

Here is the query

scan 'sample', {FILTER=>"SingleColumnValueFilter('sample_family','id',=,'binary:1000')"}

Can anyone tell me how to bring this down to one or two seconds?

Are there any configuration changes I should make for this?

1
Querying by cell values is not the most performant part of the HBase engine. As @Matik said, you should design your row keys properly to get the best HBase performance. - maxteneff
@maxteneff Does a row key value have to be unique? - wazza

1 Answer

2
votes

There's no silver bullet for making a search in HBase fast. The scan in your example has to iterate over all the rows in the table, which is why it takes significant time on large tables. HBase also has no built-in secondary indexes that would speed up a search on specific columns.

The most effective way to improve scan performance is to design row keys properly. HBase keeps rows sorted by row key internally, and you can specify start and stop rows for a scan, so it's crucial to design row keys around your most frequent search criteria. In your question you search on the column id for the value 1000. You could put this id into the row key (but make sure you avoid region hotspotting).
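
To make the difference concrete, here is a rough toy model in Python. It is not the HBase API, only a sorted in-memory list standing in for a table. The zero-padded key format, the function names, and the md5-based salt are all assumptions made for illustration. It compares a value filter that reads every row with a range scan bounded by a start and stop row, and shows one possible way to prefix keys so that sequential ids do not all land in the same region.

```python
import bisect
import hashlib


def build_table(ids):
    """Return (keys, values) with row keys sorted lexicographically, as HBase stores them."""
    # Zero-padding is an assumed key format so that string order matches numeric order.
    rows = sorted((f"{i:08d}", str(i)) for i in ids)
    keys = [key for key, _ in rows]
    values = [value for _, value in rows]
    return keys, values


def filter_scan(keys, values, wanted):
    """Value filter: every row is read and compared, so cost grows with table size."""
    examined = 0
    hits = []
    for key, value in zip(keys, values):
        examined += 1
        if value == wanted:
            hits.append(key)
    return hits, examined


def range_scan(keys, start_row, stop_row):
    """Range scan: start row inclusive, stop row exclusive, located by binary search."""
    lo = bisect.bisect_left(keys, start_row)
    hi = bisect.bisect_left(keys, stop_row)
    return keys[lo:hi], hi - lo


def salted_key(row_id, buckets=4):
    """Hypothetical salting scheme: a hash-derived prefix spreads sequential ids."""
    bucket = int(hashlib.md5(row_id.encode()).hexdigest(), 16) % buckets
    return f"{bucket:02d}-{row_id}"


keys, values = build_table(range(100_000))

hits, examined = filter_scan(keys, values, "1000")
print(hits, examined)    # the filter reads all 100000 rows to find one match

found, touched = range_scan(keys, "00001000", "00001001")
print(found, touched)    # the range scan reads only the single matching row

print(salted_key("1000"))  # same id always maps to the same salted key
```

The salt is derived from the id itself, so a lookup by id can recompute the full key and still go straight to the row. The trade-off is that a range scan over consecutive ids then has to be issued once per bucket.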