I have an HBase table with about 50 million rows, and each row has several columns. My goal is to retrieve the rows that have a given value in a given column, e.g. rows whose column 'col_1' has the value 'val_1'.
I see two options:
- scan the table from beginning to end and check each row to see whether it should be retrieved;
- build an index for the table (e.g., an index on the values in column 'col_1'); then, for a given column value 'val_1', get all the row keys stored under the index entry 'val_1' and fetch the corresponding rows by key. As I understand it, this involves random access to the original HBase table.
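
To make the two access patterns concrete, here is a minimal sketch in plain Java. It is a toy in-memory model, not the HBase client API: a `TreeMap` stands in for the table (sorted by row key), and the class and method names (`ScanVsIndexSketch`, `fullScan`, `buildIndex`, `fetchByIndex`) are made up for illustration. It only shows the shape of each option and says nothing about real-world performance.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy model only: a TreeMap (row key -> column name -> value) stands in for the table.
class ScanVsIndexSketch {

    // Option 1: visit every row in key order and keep the keys of matching rows.
    static List<String> fullScan(NavigableMap<String, Map<String, String>> table,
                                 String column, String value) {
        List<String> matches = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> row : table.entrySet()) {
            if (value.equals(row.getValue().get(column))) {
                matches.add(row.getKey());
            }
        }
        return matches;
    }

    // Option 2, step 1: build an index mapping each column value to its row keys.
    static Map<String, List<String>> buildIndex(NavigableMap<String, Map<String, String>> table,
                                                String column) {
        Map<String, List<String>> index = new HashMap<>();
        for (Map.Entry<String, Map<String, String>> row : table.entrySet()) {
            String v = row.getValue().get(column);
            if (v != null) {
                index.computeIfAbsent(v, k -> new ArrayList<>()).add(row.getKey());
            }
        }
        return index;
    }

    // Option 2, step 2: look up the row keys for a value, then fetch each row by key
    // (one point lookup per key, i.e. random access to the table).
    static List<Map<String, String>> fetchByIndex(NavigableMap<String, Map<String, String>> table,
                                                  Map<String, List<String>> index,
                                                  String value) {
        List<Map<String, String>> rows = new ArrayList<>();
        for (String rowKey : index.getOrDefault(value, List.of())) {
            rows.add(table.get(rowKey));
        }
        return rows;
    }

    public static void main(String[] args) {
        NavigableMap<String, Map<String, String>> table = new TreeMap<>();
        table.put("row1", Map.of("col_1", "val_1", "col_2", "a"));
        table.put("row2", Map.of("col_1", "val_2", "col_2", "b"));
        table.put("row3", Map.of("col_1", "val_1", "col_2", "c"));

        System.out.println(fullScan(table, "col_1", "val_1"));   // prints [row1, row3]

        Map<String, List<String>> index = buildIndex(table, "col_1");
        System.out.println(index.get("val_1"));                  // prints [row1, row3]
        System.out.println(fetchByIndex(table, index, "val_1").size()); // prints 2
    }
}
```

In this model, option 1 touches every row for each query, while option 2 touches only the matching rows but needs the index to be built and kept up to date.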
Can anyone suggest which option runs faster, or is there a better option?
Thanks a lot!