I have a history table that is more than 1 TB in size, and I will receive delta (updated) records on a daily basis, a few GB per day, which are stored in a delta table. I want to compare the delta records with the history records and update the history table with the latest data from the delta table.
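To make the requirement concrete, here is a minimal in-memory sketch of the "keep the latest row per key" logic. It assumes each row has a primary key `id` and a last-modified column `modified_date`; both names, the `Record` type, and the `reconcile` function are hypothetical and only illustrate the idea. It is not a Hive implementation.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Record:
    id: int               # assumed primary key shared by history and delta
    value: str            # stands in for the payload columns (hypothetical)
    modified_date: date   # assumed last-modified column used to pick the latest row


def reconcile(history, delta):
    """Return one row per id: the row with the latest modified_date across both inputs."""
    latest = {}
    # Concatenating both inputs plays the role of UNION ALL in SQL.
    for rec in list(history) + list(delta):
        current = latest.get(rec.id)
        # ">=" means a delta row wins a tie, because delta rows are visited last.
        if current is None or rec.modified_date >= current.modified_date:
            latest[rec.id] = rec
    return sorted(latest.values(), key=lambda r: r.id)


history = [Record(1, "a", date(2020, 1, 1)), Record(2, "b", date(2020, 1, 1))]
delta = [Record(2, "b2", date(2020, 1, 2)), Record(3, "c", date(2020, 1, 2))]
print(reconcile(history, delta))
```

At TB scale the same logic would have to run inside Hive, for example as a `UNION ALL` of both tables followed by `ROW_NUMBER() OVER (PARTITION BY id ORDER BY modified_date DESC)` and keeping row 1; whether that performs well enough is exactly the open question here.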
What is the best approach to do this in Hive, given that I will be dealing with millions of rows? I searched the web and found the approach below:
http://hortonworks.com/blog/four-step-strategy-incremental-updates-hive
But I don't think it would be the best approach in terms of performance.