0
votes

At work we receive a new file each day that we transfer to hive. the table is partitioned by the column "day" containing the time the data was transferred to hive in the format of "yyyy-MM-dd". Whenever a new file is added, we want to drop the previous partitions so that the table only contains the latest file's data. Is there a way to write a HiveQL query to dynamically drop any partition before the current date? I tried :

alter table table_name drop if exists partition (day < current_date);

but I got this error:

FAILED: ClassCastException org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc cannot be cast to org.apache.hadoop.hive.ql.plan.ExprNodeConstantDesc."

Is there another way to do this?

1
If you don't want the previous days' data, why is the table partitioned like that?Andrew
The idea was to make it easier to delete the older data by simply dropping the partition. If there is another way to delete the older data, then that will work too.tclaw46
Have a script or so that send yesterday's value to the the above alter table drop partition command and use = as < or > doesnt work in hiveK S Nidhin
@KSNidhin, you are wrong about the relational operators.David דודו Markovitz
@DuduMarkovitz , Sorry my bad . Thanks for pointing out on the relational operators.K S Nidhin

1 Answers

1
votes

demo

hive

hive> create table mytable (i int) partitioned by (day date);
OK
hive> alter table mytable add partition (day=date '2017-08-29') partition (day=date '2017-08-30') partition (day=date '2017-08-31') partition (day=date '2017-09-01');
OK
hive> show partitions mytable;
OK
partition
day=2017-08-29
day=2017-08-30
day=2017-08-31
day=2017-09-01

bash

date +"%Y-%m-%d"
2017-09-01

hive --hivevar today=$(date +"%Y-%m-%d") -e \
> 'alter table local_db.mytable drop partition (day<date '\''${hivevar:today}'\'')'
Dropped the partition day=2017-08-29
Dropped the partition day=2017-08-30
Dropped the partition day=2017-08-31
OK

hive

hive> show partitions mytable;
OK
partition
day=2017-09-01