HBase performance with large number of dynamically generated column qualifiers (within a column family)

Question

I have a table with one column family called 'A'. At runtime, I insert (Key, Value) pairs into the table. Leaving the row key aside, in my design the column qualifier is MD5(Key), so column qualifiers are created dynamically, and each cell contains the corresponding Value.

E.g.: Each car has a license plate, and I want to insert them all into one HBase table. Car A has row key R1 and column qualifier C1, and its value is the license plate of A. Car B has row key R2 and column qualifier C2, and its value is the license plate of B, and so on. With this schema, when I execute a Scan for row key R1, is the cell for column qualifier C2 returned (in this case it would definitely be null)?

I want to ask some questions about performance:

With this schema design, does Scan performance decrease? (I want to scan all values in the table.) For each row, are all columns returned?
With the above requirements, can anyone point me to the right way to design this table?

Thank you very much!

ashubhargave ashubhargave · Accepted Answer · 2014-01-09T10:11:18

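No, scan performance will not decrease. That is the beauty of HBase: it stores only the cells that are actually written, so row R1 contains just C1, and no empty C2 cell is stored or returned for it.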

I have dealt with a similar structure and a huge data set, and retrieval was amazingly quick.
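To make the "is C2 returned for R1?" part concrete, here is a minimal sketch of how sparse rows behave. It is a toy in-memory model built on `TreeMap`, not the HBase client API; the class name `SparseTableSketch`, its methods, and the plate values are all made up for illustration.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy in-memory model of a sparse table; this is NOT the HBase client API.
final class SparseTableSketch {
    // rowKey -> (qualifier -> value); both levels are kept sorted by key.
    private final TreeMap<String, TreeMap<String, String>> rows = new TreeMap<>();

    // Store one cell. Only cells that are written ever exist in the model.
    void put(String rowKey, String qualifier, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>()).put(qualifier, value);
    }

    // Return the cells of one row: only the qualifiers written for that row.
    Map<String, String> getRow(String rowKey) {
        return rows.getOrDefault(rowKey, new TreeMap<>());
    }

    public static void main(String[] args) {
        SparseTableSketch table = new SparseTableSketch();
        table.put("R1", "C1", "plate-of-A"); // hypothetical values
        table.put("R2", "C2", "plate-of-B");
        System.out.println(table.getRow("R1")); // prints {C1=plate-of-A}
    }
}
```

In this model, reading R1 never touches C2 at all, because R1 simply has no C2 entry. The number of distinct qualifiers across the whole table does not change what a single row holds.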

I think the various filters in HBase would help a lot in such a scenario.

You can also read about HBase filters in HBase: The Definitive Guide. One of the useful filters in HBase is the PrefixFilter. If you are working in Java, it would look something like this:

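// PrefixFilter keeps only the rows whose row key starts with the given bytes.
Scan s = new Scan();
Filter filter = new PrefixFilter(Bytes.toBytes("car_" + i)); // i: the car's number, defined elsewhere
s.setFilter(filter);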

Here the row keys for the different cars could be "car_[license number OR car number]", so that even if you want to extract only one row out of lakhs of rows, it can be done in a few seconds.
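A prefix lookup like this can be cheap because HBase keeps rows sorted by row key, so all keys sharing a prefix sit next to each other. The sketch below illustrates that idea with a plain `TreeMap`; it is a toy model rather than HBase code, it uses Java string ordering where HBase compares raw bytes, and the class name `PrefixScanSketch`, the row keys, and the plate values are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model of a prefix scan over sorted row keys; NOT the HBase client API.
final class PrefixScanSketch {
    // Collect the values of all rows whose key starts with the given prefix.
    static List<String> scanByPrefix(TreeMap<String, String> table, String prefix) {
        List<String> values = new ArrayList<>();
        // Jump to the first key >= prefix, then stop at the first key
        // that no longer starts with the prefix.
        for (Map.Entry<String, String> e : table.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) {
                break;
            }
            values.add(e.getValue());
        }
        return values;
    }

    public static void main(String[] args) {
        TreeMap<String, String> table = new TreeMap<>();
        table.put("car_1001", "KA-01-A"); // hypothetical row keys and plates
        table.put("car_1002", "KA-02-B");
        table.put("car_2001", "KA-03-C");
        table.put("truck_1", "KA-04-D");
        System.out.println(scanByPrefix(table, "car_1002")); // prints [KA-02-B]
    }
}
```

The loop visits only the matching rows plus one extra key, however many other rows the map holds.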

4 Answers