
I created an HBase table:

create 'user_data_table','personal_data','professional_data';

Then I inserted a few records into the table:

put 'user_data_table','user1','personal_data:Location','IL'
put 'user_data_table','user1','personal_data:FName','Deb'
put 'user_data_table','user1','personal_data:LName','D'
put 'user_data_table','user1','professional_data:dept','IT'
put 'user_data_table','user1','professional_data:salary','2000'

put 'user_data_table','user2','personal_data:FName','CH'
put 'user_data_table','user2','personal_data:LName','AK'
put 'user_data_table','user2','professional_data:dept','IT'
put 'user_data_table','user2','professional_data:salary','80000'

Next I created a snapshot:

snapshot 'user_data_table', 'snapshot-day-1'

Then I updated the user1 record as follows:

put 'user_data_table','user1','personal_data:Location','VA'
put 'user_data_table','user1','professional_data:salary','3000'

When I reference the snapshot in my Hive table, I do not get the old data; I get the latest data every time. Any idea why it behaves like this? The command I used to create the Hive table with the HBase snapshot reference is below.

CREATE EXTERNAL TABLE if not exists hbase_user_data_snapshot1_table(key string, Location string,FName string,LName string, dept string,salary string) 
    STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,personal_data:Location,personal_data:FName,personal_data:LName,professional_data:dept,professional_data:salary",
    "hive.hbase.snapshot.name"="snapshot-day-1")
    TBLPROPERTIES ("hbase.table.name" = "user_data_table");

3 Answers

0
votes

A snapshot implies that (1) no information will be deleted from the existing HFiles and (2) the content of these HFiles as of snapshot creation can be rebuilt on demand (hiding whatever has been appended since).

But HIVE-6584 states that...

Bypassing the online region server API provides a nice performance boost for the full scan

...so maybe they chose to "bypass" the point-in-time-recovery part and just used the snapshot as a backdoor for direct access to the HFiles, including whatever has been appended since snapshot creation. Maybe.

0
votes

The DDL was wrong. The correct way to do so is as follows.

CREATE EXTERNAL TABLE if not exists hbase_user_data_snapshot2_table(key string, Location string,FName string,LName string, dept string,salary string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,personal_data:Location,personal_data:FName,personal_data:LName,professional_data:dept,professional_data:salary")
TBLPROPERTIES ("hive.hbase.snapshot.name"="snapshot-day-2");

Note the TBLPROPERTIES: we refer to the snapshot name instead of the table name.
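
A quick way to check whether the table is really served from the snapshot is to query a row that changed after the snapshot was taken. This is only a sketch, and it assumes the snapshot predates the updates to user1 shown in the question:

```sql
-- Assumption: the snapshot was taken before user1's salary was updated.
-- If the snapshot is being used, salary should show the value stored at
-- snapshot time rather than the later update.
SELECT key, salary
FROM hbase_user_data_snapshot2_table
WHERE key = 'user1';
```

If the query still returns the updated value, the snapshot setting is probably not being picked up.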

0
votes

You need to set the Hive variables before the SELECT, like this:

CREATE EXTERNAL TABLE if not exists hbase_user_data_snapshot2_table(key string, Location string,FName string,LName string, dept string,salary string) 
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,personal_data:Location,personal_data:FName,personal_data:LName,professional_data:dept,professional_data:salary")
TBLPROPERTIES ("hbase.table.name"="xxx");

-- A table may have many snapshots, so we set the snapshot name before the SELECT,
-- which makes sense.
-- If your snapshot files are stored in a special path, use
-- SET hive.hbase.snapshot.restoredir=xxxx; to configure it.
SET hive.hbase.snapshot.name=snapshot-day-2;
select * from hbase_user_data_snapshot2_table;
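
Because the snapshot name is a session setting here, the same Hive table can in principle be pointed at different snapshots in turn. A minimal sketch, assuming both snapshot-day-1 (from the question) and snapshot-day-2 exist for the underlying HBase table:

```sql
-- Assumption: both snapshots exist for the HBase table behind this Hive table.

-- Read the state as of snapshot-day-1.
SET hive.hbase.snapshot.name=snapshot-day-1;
SELECT key, salary FROM hbase_user_data_snapshot2_table;

-- Switch to snapshot-day-2 and read again.
SET hive.hbase.snapshot.name=snapshot-day-2;
SELECT key, salary FROM hbase_user_data_snapshot2_table;
```

Comparing the two result sets should show whether rows changed between the two snapshots.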