I have an original table that is partitioned on YEAR, MONTH and DATE. For example:
col_1 col_2 col_3 YEAR MONTH DATE
a b c 2017 03 25
I want to create a new table which is a subset of this table, but still maintains the partitioning of the original table.
Something as simple as
CREATE new_table AS
SELECT *
FROM original_table
WHERE (conditions);
However, because the original table is so big, I have to iterate through this query via the partitions.
My current solution is to write a shell script that iterates through all partitions and runs a separate query for each partition.
Example:
for year in '2016' '2017'
do
for month in '01' '02' ...
do
for day in '01' '02' ...
do
hive -e "INSERT INTO new_table SELECT * FROM original_table WHERE YEAR=$year AND MONTH=$month etc."
done
done
done
But this appears very round-about and inefficient. Is there a way to do this as one line directly in hive?
WHERE year BETWEEN '2016' AND '2017', for example within the query? - OneCricketeer