I have Hive table partitioned based on date yyyy-mm-dd. I want to run a script everyday that can delete all the partitions older than 30 days but in this case I don't want to pass the date. I want it to take system date and delete the partitions older than 30 days.
1
votes
Is your hive table managed or external table?
– Ramesh
What do you use to run your hive scripts? UNIX, ADF etc?
– Ramesh
It's managed orc table and I have written scripts in hql. I can use linux or unix scripts that is not a problem. Can you help me out
– Priyanka
Please share what attempts you've done already
– OneCricketeer
Tried one script but it deletes all the partition older than current date. I tried to give pass the value 30 but it throws error -> hive --hivevar today=$(date +"%Y-%m-%d") -e \ > 'alter table local_db.mytable drop partition (day<date '\''${hivevar:today}'\'')'
– Priyanka
1 Answers
5
votes
You need to use Linux/ Unix to set the variable for the DROP PARTITION date and use it in the ALTER TABLE statement.
Here is an example
CREATE TABLE ramesh.test
(col1 STRING,
col2 STRING)
PARTITIONED BY (partition_date date);
INSERT INTO TABLE ramesh.test PARTITION (partition_date='2017-10-01') VALUES ('key1', 'val1');
INSERT INTO TABLE ramesh.test PARTITION (partition_date='2017-10-02') VALUES ('key2', 'val2');
INSERT INTO TABLE ramesh.test PARTITION (partition_date='2017-11-01') VALUES ('key3', 'val3');
hive --hivevar var_drop_date="$(date -d "30 days ago" +"%Y-%m-%d")" -e 'ALTER TABLE ramesh.test DROP IF EXISTS PARTITION (partition_date <= date "${hivevar:var_drop_date}")'
This drops the 2 partitions from 30 days ago and leaves the one within 30 days.
Let me know whether it works for you. I used CDH for testing this.