0
votes

Is there a way in Hive SQL to get all partitions updated after a specific time ?

I have hourly partitions based on transaction time. It is very much likely that a transaction record arrives few hours late, and it reaches an older partition.

Table1 : Hourly partitions
00
01
..
23

Table 2 : Daily partitions
After midnight above 24 partitions are aggregated into daily partition in Table2.

After this daily partition was created, a set of transactions arrive late and they are added into Table1 hourly parition. Say they are added in hour=20 partition.

I need to identify what all partitions received late updates in this manner. So if there is a need I will recalculate my daily partition data in Table2.

1

1 Answers

1
votes

I would introduce intermediate table with two columns: day and last modified time. Every time late records arrive, update this table with new rows of last update time.

Table2 can use this table to check if last modified time > job last execution time. If so, do calculation for that day.