
This is needed for tables whose rows have thousands of cells.

Let's say we have a table deviceEvents with deviceID as the row key, where each event is stored as a new column with a name like "event_XPGSGR" or "event_whatever".

The requirement is to retrieve the "latest event", i.e. the cell with the most recent timestamp (or possibly a cell selected by filtering on its contents).

Using ColumnRangeFilter, we can retrieve just the columns whose names start with "event", and the client could then pick the event with max(timestamp). But that would mean copying all the events to the client on every call, which is not acceptable.
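A minimal sketch of that client-side approach, modeling one row as a plain dict that maps each column qualifier to a (timestamp, value) pair. This is not HBase client code: the `Row` type and `latest_event_client_side` are hypothetical, and the prefix check merely stands in for the server-side column filter. It shows that every matching cell has to reach the client before the maximum can be chosen.

```python
from typing import Dict, Optional, Tuple

# Hypothetical model of one HBase row: qualifier -> (timestamp, value).
Row = Dict[str, Tuple[int, str]]


def latest_event_client_side(row: Row, prefix: str = "event") -> Optional[Tuple[str, int, str]]:
    """Return (qualifier, timestamp, value) of the newest cell whose qualifier starts with prefix."""
    # Stand-in for the server-side column filter: only matching columns are "shipped".
    # Every one of these cells would cross the network to the client.
    shipped = [(q, ts, v) for q, (ts, v) in row.items() if q.startswith(prefix)]
    if not shipped:
        return None
    # The client has to examine every shipped cell to find the maximum timestamp.
    return max(shipped, key=lambda cell: cell[1])


row = {
    "event_XPGSGR": (1700000000, "boot"),
    "event_whatever": (1700000500, "shutdown"),
    "meta_owner": (1600000000, "alice"),
}
print(latest_event_client_side(row))  # ('event_whatever', 1700000500, 'shutdown')
```

The cost of a read grows with the number of event columns in the row, which is exactly what becomes a problem with thousands of cells.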

Isn't there a way to do this column filtering in HBase itself?

Thanks!


1 Answer


No.

At first I was going to suggest writing your own implementation of the Filter interface that does this. However, if you look at HBase's Filter interface for filterCell, you will see that you cannot. The reason is that filterCell has to decide whether to keep a given cell at the moment it looks at that cell, while your query depends on scanning all of the cells to know which one is the latest.
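To make that argument concrete, here is a toy model of a per-cell filter. It is not HBase's actual Filter API; `per_cell_filter` and the `(qualifier, timestamp)` cell tuples are hypothetical. The filter sees the row's cells one at a time in qualifier order and must keep or skip each one before seeing the next. Even with state, the best local rule is "keep a cell if it is newer than everything seen so far", and that can emit several cells instead of only the latest.

```python
from typing import Iterable, List, Optional, Tuple

Cell = Tuple[str, int]  # hypothetical simplified cell: (qualifier, timestamp)


def per_cell_filter(cells: Iterable[Cell]) -> List[Cell]:
    """Keep or skip each cell immediately, like a per-cell filter with no lookahead."""
    kept: List[Cell] = []
    newest_seen: Optional[int] = None
    for qualifier, ts in sorted(cells):  # cells arrive sorted by qualifier
        # Best possible local decision: keep the cell if it is newer than all earlier ones.
        if newest_seen is None or ts > newest_seen:
            kept.append((qualifier, ts))
            newest_seen = ts
        # A cell that was already kept cannot be dropped later, even if a newer one shows up.
    return kept


cells = [("event_a", 10), ("event_b", 30), ("event_c", 20), ("event_d", 40)]
print(per_cell_filter(cells))  # [('event_a', 10), ('event_b', 30), ('event_d', 40)]
```

The result contains only the latest cell when that cell happens to come first in qualifier order; in general the decision needs the whole row.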

To accomplish what you want probably requires a special schema design. For example, whenever you write a column, you could write it twice: once to its own column name and once to 'latest' (if it is the latest). This allows constant-time lookups of the 'latest' event. The trade-off is that you have to compute the latest on write, so instead of scanning the whole row on every read, you pay that cost on every write (assuming you have to compare against all existing values).
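A minimal in-memory sketch of the write-twice idea, again modeling a row as a dict of qualifier -> (timestamp, value). The reserved qualifier `"latest"` and the helpers `write_event` and `read_latest` are hypothetical. As a simplification, each write compares only against the current 'latest' cell rather than all existing events. That is enough to keep the maximum correct when writes are applied one at a time. A real HBase client would also have to cope with concurrent writers (for example with a conditional check-and-put), and this sketch ignores that.

```python
from typing import Dict, Optional, Tuple

Row = Dict[str, Tuple[int, str]]  # hypothetical model: qualifier -> (timestamp, value)
LATEST = "latest"  # hypothetical reserved qualifier


def write_event(row: Row, qualifier: str, ts: int, value: str) -> None:
    """Write the event to its own column, and also to LATEST if it is the newest so far."""
    row[qualifier] = (ts, value)
    current = row.get(LATEST)
    # Compare only against the current LATEST cell (assumes writes are applied one at a time).
    if current is None or ts > current[0]:
        row[LATEST] = (ts, value)


def read_latest(row: Row) -> Optional[Tuple[int, str]]:
    """Constant-time lookup: a single column read, no scan over the event columns."""
    return row.get(LATEST)


row: Row = {}
write_event(row, "event_XPGSGR", 1700000500, "shutdown")
write_event(row, "event_whatever", 1700000000, "boot")  # older event: LATEST is unchanged
print(read_latest(row))  # (1700000500, 'shutdown')
```

The read path touches one column no matter how many events the row holds. The extra work moves to the write path, where each write also reads and possibly rewrites the 'latest' column.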