I might have a gap in understanding hive partitioning. I have an external table that is partitioned by date. I'm generating the parquet files via a query on a managed hive table. I currently run a bash script to process incrementally by date (which gets supplied to the query). Is there a better way to process multiple dates?
Table is created this way:
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
CREATE EXTERNAL TABLE my_table (id int, product string)
PARTITIONED BY (dt string)
LOCATION '/path/to/location'
My bash script essentially iterates dates and inserts data via:
INSERT OVERWRITE TABLE my_table (dt='2020-07-09')
SELECT id, product FROM managed_table WHERE dt = '2020-07-09';
So this works, but it's only doing it a day at a time which doesn't seem very efficient. Is there a better way?