HBase Snapshot in Hive table

Question

I have created an HBase table:

create 'user_data_table','personal_data','professional_data';

Then I inserted a few records into the table:

put 'user_data_table','user1','personal_data:Location','IL'
put 'user_data_table','user1','personal_data:FName','Deb'
put 'user_data_table','user1','personal_data:LName','D'
put 'user_data_table','user1','professional_data:dept','IT'
put 'user_data_table','user1','professional_data:salary','2000'

put 'user_data_table','user2','personal_data:FName','CH'
put 'user_data_table','user2','personal_data:LName','AK'
put 'user_data_table','user2','professional_data:dept','IT'
put 'user_data_table','user2','professional_data:salary','80000'

Then I created a snapshot:

snapshot 'user_data_table', 'snapshot-day-1'

After taking the snapshot, I updated the record for user1 as below:

put 'user_data_table','user1','personal_data:Location','VA'
put 'user_data_table','user1','professional_data:salary','3000'

When I refer to the snapshot in my Hive table, I do not get the old data; I get the latest data every time. Any idea why it behaves like this? The command I used to create the Hive table with the HBase snapshot reference is below.

CREATE EXTERNAL TABLE if not exists hbase_user_data_snapshot1_table(key string, Location string,FName string,LName string, dept string,salary string) 
    STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,personal_data:Location,personal_data:FName,personal_data:LName,professional_data:dept,professional_data:salary",
    "hive.hbase.snapshot.name"="snapshot-day-1")
    TBLPROPERTIES ("hbase.table.name" = "user_data_table");

Samson Scharfrichter · Accepted Answer · 2015-08-13T22:52:32

A snapshot implies that (1) no information will be deleted from the existing HFiles and (2) the content of these HFiles as of snapshot creation can be rebuilt on demand (hiding whatever has been appended since).

But HIVE-6584 states that...

Bypassing the online region server API provides a nice performance boost for the full scan

...so maybe they chose to "bypass" the point-in-time-recovery part and just used the snapshot as a backdoor for direct access to the HFiles, including whatever has been appended since snapshot creation. Maybe.
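
One way to check what the snapshot itself contains, independently of Hive, is to clone it into a separate table from the HBase shell and read the updated cells back. This is only a sketch: the table name user_data_table_snap_check is a hypothetical name chosen for illustration, and it assumes snapshots are enabled on the cluster and that you are allowed to create tables.

```
# Materialize the snapshot as a new table; the live table is not modified.
# 'user_data_table_snap_check' is a hypothetical name, pick any unused one.
clone_snapshot 'snapshot-day-1', 'user_data_table_snap_check'

# Read back the two cells that were changed after the snapshot was taken.
get 'user_data_table_snap_check', 'user1', 'personal_data:Location'
get 'user_data_table_snap_check', 'user1', 'professional_data:salary'

# Compare with the live table.
get 'user_data_table', 'user1', 'personal_data:Location'
get 'user_data_table', 'user1', 'professional_data:salary'

# Drop the clone once the check is done.
disable 'user_data_table_snap_check'
drop 'user_data_table_snap_check'
```

If the clone returns the pre-update values (IL and 2000) while the live table returns VA and 3000, the snapshot is intact, and the stale-versus-latest difference comes from how Hive is reading it rather than from the snapshot.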
