How to Scan HBase Rows efficiently

Question

I need to write a MapReduce job that gets all rows in a given date range (say, the last month). It would have been a cakewalk had my row key started with the date, but my most frequent HBase queries are on the leading values of the key.

My row key has exactly the form A|B|C|20120121|D, where the combination of A/B/C along with the date (in YearMonthDay format) makes a unique row ID.

My HBase tables could have up to a few million rows. Should my Mapper read the whole table and check whether each row falls in the given date range, or can a Scan/Filter help handle this situation?

Could someone suggest a way (or a snippet of code) to handle this situation effectively?

Thanks -Panks

Why don't you copy the contents of the table to a new one with the key rearranged and scrap the old one? — Mario
@Mario what if the table has a trillion keys? And he needs to do this often? — markgiaconia

Chris Shain · Accepted Answer · 2012-01-23T04:57:58

You can use a RowFilter with a RegexStringComparator. You'd need to come up with a RegEx that filters your dates appropriately. This page has an example that includes setting a Filter for a MapReduce scanner.
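
As a rough sketch of what such a regex could look like: the helper below (the class and method names are made up for illustration) lists every day in the range and builds one alternation for the date field. It assumes the A, B and C parts never contain a `|` character and that the date is always the fourth field. The HBase wiring is shown only in a comment, using the older `CompareFilter.CompareOp` form; newer client versions use `CompareOperator` instead, so check it against the version you run.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.StringJoiner;
import java.util.regex.Pattern;

// Hypothetical helper, not part of HBase: builds a row-key regex for keys
// shaped like A|B|C|yyyyMMdd|D (assumed layout, taken from the question).
class RowKeyDateRegex {

    // BASIC_ISO_DATE formats a LocalDate as yyyyMMdd, e.g. 20120121.
    private static final DateTimeFormatter YMD = DateTimeFormatter.BASIC_ISO_DATE;

    // Returns a regex matching row keys whose date field lies in [from, to].
    static String regexFor(LocalDate from, LocalDate to) {
        if (to.isBefore(from)) {
            throw new IllegalArgumentException("'to' must not be before 'from'");
        }
        // One alternative per day: (?:20111230|20111231|20120101|...)
        StringJoiner days = new StringJoiner("|", "(?:", ")");
        for (LocalDate d = from; !d.isAfter(to); d = d.plusDays(1)) {
            days.add(d.format(YMD));
        }
        // Three pipe-delimited fields (A, B, C), the date, then the rest.
        // The ^ anchor matters if the comparator does a "find" rather than
        // a full match.
        return "^[^|]+\\|[^|]+\\|[^|]+\\|" + days + "\\|.*$";
    }

    public static void main(String[] args) {
        String regex = regexFor(LocalDate.of(2011, 12, 22), LocalDate.of(2012, 1, 21));
        System.out.println(regex);
        System.out.println(Pattern.compile(regex).matcher("A|B|C|20120121|D").find());

        // With the HBase client on the classpath, the regex would be handed
        // to the scan roughly like this (older API form, verify for your version):
        //   Scan scan = new Scan();
        //   scan.setFilter(new RowFilter(CompareFilter.CompareOp.EQUAL,
        //                                new RegexStringComparator(regex)));
    }
}
```

Keep in mind that a RowFilter like this is still evaluated against every row on the region servers, so the whole table is read there; what you save is shipping the non-matching rows to your mappers.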
