
I have a program that generates all the data for an Impala table partition. The program writes this data to a text file in HDFS.

How can I (physically) remove all the data that previously belonged to the partition and replace it with the data from the new text file, converted to Parquet format?

If I physically remove the old Parquet files that make up the partition using the raw HDFS API, will that disturb Impala?

Is your Impala table an external table? - K S Nidhin
It could be external or internal. I have the choice. - Comencau

1 Answer


Create an external table over the HDFS directory that holds your text file(s):

create external table stg_table (...) location '<HDFS directory containing your text file>';
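For example, here is a minimal sketch of the staging table. It assumes (hypothetically) that the generated file has three comma-separated columns and sits alone in `/user/etl/stg_partition`. Adjust the column list, delimiter, and path to match what your program actually writes. If the program uses Impala's default text delimiter (Ctrl-A, `\001`), you can drop the `row format` clause.

```sql
-- Hypothetical schema, delimiter and path: replace with your program's actual output.
-- LOCATION is a directory; Impala reads every data file inside it.
create external table stg_table (
  id     bigint,
  name   string,
  amount double
)
row format delimited fields terminated by ','
stored as textfile
location '/user/etl/stg_partition';
```

Because the table is external, dropping it later removes only the metadata and leaves the text files in HDFS.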

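Whenever the files under that location change (for example, after your program writes a new text file), refresh the table so Impala picks up the new files: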

refresh stg_table;

Then insert into your target table. If the target table is stored as Parquet, this step converts the text data to Parquet:

insert overwrite table target_table select * from stg_table;

If your target table is partitioned, specify the partition to overwrite:

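insert overwrite table target_table partition (<partition spec>) select * from stg_table;

With a static partition spec, the select list must contain only the non-partition columns. As a result, `select *` works only if stg_table has exactly those columns, in the same order.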

The OVERWRITE keyword does the trick. It replaces the existing data in the table or partition with the query result instead of appending to it.
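The following sketch puts the pieces together for the partitioned case. It assumes a hypothetical Parquet target table partitioned by a string column `dt`, and a staging table whose columns (`id`, `name`, `amount`) are exactly the target's non-partition columns, in the same order. The table name, columns, and partition value are illustrative only. Because `target_table` is `stored as parquet`, the insert writes Parquet files for the partition.

```sql
-- Hypothetical target table: Parquet data, one HDFS directory per dt value.
create table target_table (
  id     bigint,
  name   string,
  amount double
)
partitioned by (dt string)
stored as parquet;

-- Pick up the newly written text file(s) in the staging table's location.
refresh stg_table;

-- Replace only the (hypothetical) dt = '2015-06-01' partition; other partitions are untouched.
-- With a static partition spec, the select list must not include dt.
insert overwrite table target_table partition (dt = '2015-06-01')
select id, name, amount from stg_table;
```

Letting Impala rewrite the partition through INSERT OVERWRITE keeps its metadata in step with the files. Deleting the Parquet files yourself would not do that.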