We are trying to create a HIVE table which will include a sequence that will act as a unique identifier. This table will have data appended to it daily and we would like to keep the sequence running.
For the initial load we used 'org.apache.hadoop.hive.contrib.udf.UDFRowSequence' this will obviously not be enough for future keys as it always starts from 0.
We thought about storing the max key from the table before every run and then add it to the sequence on every load operation but can't seem to make it work in HIVE. The problem we are facing is we cannot save the result of a query into a variable
Ideally it would do the following (pseudo code):
create temporary function row_sequence as 'org.apache.hadoop.hive.contrib.udf.UDFRowSequence';
store the current table max sequence into a variable
currentMaxSequence = "Select max(sequenceKey) as max from justAnyTable"
Append the max sequence+ current sequence to the new rows
INSERT INTO TABLE anotherTable SELECT currentMaxSequence + row_sequence() as sk, name from (SELECT name from test limit 10) n;
any ideas how this can be done, specifically how can we store a result set from a query (one value) and use it in another one, Hive doesn't seem to like this nested queries.
Thanks, Roy