I'm working on implementing a rollback operation in HBase. My component is fed with all the information needed to do a put (actually there are hundreds of such puts): table, timestamp (might be null), family, qualifier, value. It buffers them, then calls HTable.put() in a batch. Since the data is not pre-verified, any put might fail.
I'm trying to find a way to roll back whatever was already written before the failing put().
As I see it, there are 3 ways to roll back a put:
- Delete the new item (if no such item existed before)
- Do nothing (if exactly the same item, including its timestamp, existed before)
- Execute another Put (if the new Put changed some data in the old row. NB: I know that in HBase there is no way to change data in place. By 'changed' I mean that new data was written to the same row/timestamp/family/qualifier and the old data was discarded, because in my setup HBase is instructed to keep only one version of each item).
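To make the three cases concrete, here is a minimal, HBase-independent sketch of the decision for a single put. It assumes the cell at the same row/family/qualifier has already been read before the batch was sent (that read is exactly the part I'm asking about). `RollbackClassifier`, `CellState` and `RollbackAction` are hypothetical names, not HBase API. It follows the three cases literally and does not model how HBase resolves versions, so whether `RESTORE_OLD` also requires deleting the newer cell first is left open.

```java
import java.util.Arrays;

public class RollbackClassifier {

    /** The three ways to undo a single put, as listed above. */
    enum RollbackAction { DELETE_NEW, DO_NOTHING, RESTORE_OLD }

    /** Hypothetical, HBase-independent view of one cell's timestamp and value. */
    static final class CellState {
        final Long timestamp; // null = timestamp left to the server (new puts only)
        final byte[] value;

        CellState(Long timestamp, byte[] value) {
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    /**
     * Decides how to undo one put.
     *
     * @param before the cell at the same row/family/qualifier as read
     *               BEFORE the batch was sent, or null if none existed
     * @param put    the cell the put wrote
     */
    static RollbackAction classify(CellState before, CellState put) {
        if (before == null) {
            // Nothing was there before: remove what the put created.
            return RollbackAction.DELETE_NEW;
        }
        // A null timestamp on the put means the server picks one,
        // so it cannot be "exactly the same item" as the old cell.
        boolean sameTimestamp = put.timestamp != null
                && put.timestamp.equals(before.timestamp);
        if (sameTimestamp && Arrays.equals(before.value, put.value)) {
            // The put rewrote an identical cell: nothing to undo.
            return RollbackAction.DO_NOTHING;
        }
        // The put replaced the old cell: write the old value back.
        return RollbackAction.RESTORE_OLD;
    }

    public static void main(String[] args) {
        byte[] a = "a".getBytes();
        byte[] b = "b".getBytes();
        System.out.println(classify(null, new CellState(10L, a)));                   // DELETE_NEW
        System.out.println(classify(new CellState(10L, a), new CellState(10L, a)));   // DO_NOTHING
        System.out.println(classify(new CellState(10L, a), new CellState(10L, b)));   // RESTORE_OLD
        System.out.println(classify(new CellState(10L, a), new CellState(null, a)));  // RESTORE_OLD
    }
}
```

The classification itself is cheap; the expensive part is obtaining `before` for a few hundred cells, which is why a batched read matters here.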
So the question is: how do I distinguish between these 3 cases? Of course it is a matter of querying HBase for each particular item, but doing a plain get/scan for each of a few hundred items seems inefficient to me.
So I'm looking for some way to do a batch get/scan in HBase.