We are working on a project that uses HBase as an operational data store, and all data arrives in HBase in real time. Every 2 hours, the data in HBase needs to be synced to Hive so that analytical queries can run against the latest data.
For the HBase-to-Hive sync, inserts and updates are straightforward: I can use the timestamp HBase stores with every cell to find the records inserted or updated since the last run. For deletes, however, I am struggling to find the right approach.
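The timestamp-based approach for inserts and updates can be sketched as below. This is a minimal, HBase-free model, not HBase client code: `Cell` and `changed_rows` are hypothetical names, and the half-open window `[last_sync_ms, now_ms)` assumes the same semantics as HBase's `Scan.setTimeRange(minStamp, maxStamp)`, where the lower bound is inclusive and the upper bound exclusive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    """Hypothetical stand-in for one cell returned by a normal (non-raw) HBase scan."""
    row: str
    column: str
    timestamp: int  # milliseconds since epoch; by default set by the RegionServer on write
    value: str


def changed_rows(cells, last_sync_ms, now_ms):
    """Row keys with at least one cell written in the half-open window [last_sync_ms, now_ms)."""
    return sorted({c.row for c in cells if last_sync_ms <= c.timestamp < now_ms})


cells = [
    Cell("r1", "cf:a", 1000, "x"),
    Cell("r2", "cf:a", 5000, "y"),
    Cell("r3", "cf:b", 7000, "z"),
]
print(changed_rows(cells, 4000, 8000))  # ['r2', 'r3']
```

The model also shows why deletes are the hard part: once a row is deleted, a normal scan returns no cells for it, so it never appears in the window at all.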
Does the HBase Scan API provide any option for detecting deleted records?
Or should I use an SQL layer such as Apache Phoenix for this instead?
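One Scan API option worth checking is a raw scan: `Scan.setRaw(true)` makes HBase return delete markers (tombstones) alongside regular cells, and combined with `Scan.setTimeRange(...)` it can surface deletes made since the last sync. The sketch below models only the classification step in plain Python; `RawCell` and `classify` are hypothetical names, and it assumes a single column family, whole-row deletes (which write `DeleteFamily` markers), and that column-level markers can be ignored. Delete markers are purged during major compaction unless the column family has `KEEP_DELETED_CELLS` enabled, so the 2-hour sync would need to run before that happens.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RawCell:
    """Hypothetical stand-in for a cell returned by a raw scan (Scan.setRaw(true))."""
    row: str
    column: str
    timestamp: int  # milliseconds since epoch
    type: str       # "Put" or a delete-marker type such as "DeleteFamily"


def classify(raw_cells, last_sync_ms, now_ms):
    """Split rows touched in [last_sync_ms, now_ms) into (upserts, deletes).

    Assumes a single column family, so a DeleteFamily marker means the whole
    row was deleted. Column-level markers (Delete, DeleteColumn) are ignored.
    """
    latest_put = {}
    latest_row_delete = {}
    for c in raw_cells:
        if not (last_sync_ms <= c.timestamp < now_ms):
            continue  # outside the sync window
        if c.type == "Put":
            latest_put[c.row] = max(latest_put.get(c.row, -1), c.timestamp)
        elif c.type == "DeleteFamily":
            latest_row_delete[c.row] = max(latest_row_delete.get(c.row, -1), c.timestamp)

    upserts, deletes = [], []
    for row in sorted(latest_put.keys() | latest_row_delete.keys()):
        put_ts = latest_put.get(row, -1)
        delete_ts = latest_row_delete.get(row, -1)
        # A delete marker masks puts with a timestamp <= its own, so the row
        # is gone unless it was written again after the delete.
        if delete_ts >= put_ts:
            deletes.append(row)
        else:
            upserts.append(row)
    return upserts, deletes


raw = [
    RawCell("r1", "cf:a", 1000, "Put"),
    RawCell("r1", "cf", 5000, "DeleteFamily"),
    RawCell("r2", "cf:a", 6000, "Put"),
    RawCell("r3", "cf", 5500, "DeleteFamily"),
    RawCell("r3", "cf:a", 7000, "Put"),
]
print(classify(raw, 4000, 8000))  # (['r2', 'r3'], ['r1'])
```

Comparing the latest put and delete timestamps per row, rather than reacting to the delete marker alone, handles a row that was deleted and then re-inserted within the same window (`r3` above).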