
I am new to Hadoop and big data and am exploring whether I can move my data store to HBase. I have run into a problem that some of you might be able to help me with.

I have an HBase table "hbase_testTable" with the column family "ColFam1". I have set VERSIONS for "ColFam1" to 10, because I need to keep a history of up to 10 updates to this column family, and that works fine. Adding new rows through the HBase shell with an explicit timestamp value also works. Basically I want to use the timestamp as my version number, so I specify it like this:

put 'hbase_testTable', '1001', 'ColFam1:q1', '1000$', 3

where 3 is my version. Everything works fine.

Now I am trying to integrate this with a Hive external table, with the mappings set to match the HBase table as below:

create external table testtable (id string, q1 string, q2 string, q3 string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH
SERDEPROPERTIES ("hbase.columns.mapping" = ":key,colfam1:q1,colfam1:q2,colfam1:q3")
TBLPROPERTIES("hbase.table.name" = "testtable", "transactional" = "true");

This works fine with normal inserts: an insert in Hive updates the HBase table, and vice versa.

Even though the external table is marked "transactional", I am not able to update the data from Hive. It gives me this error:

FAILED: SemanticException [Error 10294]: Attempt to do update or delete
        using transaction manager that does not support these operations

That said, any updates made to the HBase table are reflected immediately in the Hive table.

I can still update the HBase table through the Hive external table by inserting a row with an existing row id and new data for the column.

Is it possible to control the timestamp that is written to the referenced HBase table (for example 4, 5, 6, 7, etc.)? Please help.


1 Answer


The timestamp is one of the important elements of HBase versioning. You are trying to supply your own timestamps, which works fine at the HBase level. One point: you have to be very careful to keep them unique and non-negative. You can look at the custom versioning section of the book HBase: The Definitive Guide.

Now you have Hive on top of HBase. As per the documentation,

there is currently no way to access the HBase timestamp attribute, and queries always access data with the latest timestamp.

That covers the reading part. For putting data, you can look here. It still says that you have to give a valid timestamp and not any other value.
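A minimal sketch of the write side, assuming your Hive release supports the hbase.put.timestamp SerDe property (I believe it was added in Hive 0.9.0, so please check the HBase integration docs for your version). The Hive table name testtable_ts4 is hypothetical; the column mapping and HBase table name are copied from the question. The idea is that every row inserted through this table should be written to HBase with timestamp 4 instead of the current time:

```sql
-- Hypothetical second external table over the same HBase table.
-- Assumption: the storage handler honors hbase.put.timestamp, which must be
-- a valid (non-negative) timestamp, or -1 to fall back to the default.
CREATE EXTERNAL TABLE testtable_ts4 (id string, q1 string, q2 string, q3 string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
  "hbase.columns.mapping" = ":key,colfam1:q1,colfam1:q2,colfam1:q3",
  "hbase.put.timestamp" = "4"
)
TBLPROPERTIES ("hbase.table.name" = "testtable");
```

The value is fixed per table definition, so this does not give you a per-row or auto-incrementing version. For that, writing through the HBase client or shell is probably still the more practical route.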

Future versions are expected to expose the timestamp attribute. I hope this gives you a better idea of how to deal with custom timestamps in the Hive-HBase integration.