I am a newbie to hive transactions thats supported since hive 0.14.
I created a transactional table in hive as follows
create table test_transactional(id int,name string) clustered by (id) into 2 buckets stored as orc TBLPROPERTIES('transactional'='true');
I also created a table with some sample data that has id, string columns.
create table stage(id int, name string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
Then i inserted the data from the stage table to the test_transactional table as follows,
insert into table test_transactional select * from stage;
I also ran an update statement to see if its working. So far so good.
Questions
When i do an ls on the hdfs folder for test_transactional table, i dont see base folder but only the delta folder. As per the documentation, my understanding is that the base folder should be present. Or base folder would be created only when major compaction is run?
The table gets updated when the update/insert/delete statements are run explicitly, but in a typical ETL, when i have data in the staging table, how do i determine if its an update, insert or delete? Do i need to set some properties so that hive can do this automatically? If not how can this be handled?