Hive copy from partitioned table

Question

I have an original table that is partitioned on YEAR, MONTH and DATE. For example:

col_1    col_2    col_3    YEAR    MONTH    DATE 
a        b        c        2017    03       25

I want to create a new table which is a subset of this table, but still maintains the partitioning of the original table.

Something as simple as

CREATE new_table AS 
SELECT * 
FROM original_table 
WHERE (conditions);

However, because the original table is so big, I have to iterate through this query via the partitions.

My current solution is to write a shell script that iterates through all partitions and runs a separate query for each partition.

Example:

for year in '2016' '2017'
    do
    for month in '01' '02' ...
        do
        for day in '01' '02' ...
            do
            hive -e "INSERT INTO new_table SELECT * FROM original_table WHERE YEAR=$year AND MONTH=$month etc."
        done
    done
done

But this appears very round-about and inefficient. Is there a way to do this as one line directly in hive?

You're not inserting into any partitions... Did you try WHERE year BETWEEN '2016' AND '2017', for example within the query? — OneCricketeer

OneCricketeer OneCricketeer · Accepted Answer · 2018-01-21T15:28:38

I recently had to do something like this for a subset of days, but it requires you to copy the schema of the original table, or at least use CREATE TABLE LIKE

But, most importantly, you're insert statement needs to specify the partitions

CREATE TABLE new_table (
    fields... 
)
PARTITIONED BY (year STRING, month STRING, day  STRING);

INSERT OVERWRITE TABLE new_table PARTITION(year, month, day) 
SELECT fields... , year, month, day -- partitions must be last
FROM original_table
WHERE 
year BETWEEN '2016' AND '2017';  -- add more, as necessary

You might also be able to use a CTAS function, but doing those with partitioned tables isn't straightforward

Hive copy from partitioned table

1 Answers