
Let's say I have a partitioned Hive table:

>show partitions db.my_table;
+----------------------------------+
|             partition            |
+----------------------------------+
|        in_date=20-09-2020        |
|        in_date=21-09-2020        |
|        in_date=22-09-2020        |
+----------------------------------+

Suppose I manually delete a partition directory from HDFS with

$ hadoop fs -rm -r 'path/to/table/in_date=20-09-2020'

but do not drop the partition from the Hive table.

Will this cause any real problems (apart from leaving orphaned partition entries in the table metadata)?

WHAT I HAVE VERIFIED

Plain queries on the table work fine (for both external and internal/managed tables):

>select * from db.my_table;      -- works fine
>show partitions db.my_table;    -- still lists the orphaned partition, not a real problem

EDIT: Queries that use aggregate functions such as COUNT() and MAX() fail with the error

    Input path does not exist: path/to/table/in_date=20-09-2020

Does anyone know whether this could cause other problems or break other applications?


1 Answer


On Tez this causes a FileNotFoundException, because the partition metadata still exists while the folder is absent. Drop the partition as well:

ALTER TABLE db.my_table DROP PARTITION (in_date='20-09-2020');
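If several partition directories have already been removed, a small script can list which metastore entries are orphaned. The sketch below is a hypothetical helper, not part of Hive. It assumes you have the `SHOW PARTITIONS` output as lines of text and the set of partition directory names that still exist under the table location (for example, collected by hand from `hadoop fs -ls`). It prints one `ALTER TABLE ... DROP IF EXISTS PARTITION` statement per partition whose directory is gone. It also assumes partition values contain no quotes or Hive-escaped characters.

```python
from typing import Iterable, List, Set


def parse_partitions(show_partitions_output: Iterable[str]) -> List[str]:
    """Extract specs such as 'in_date=20-09-2020' from SHOW PARTITIONS output,
    skipping table borders and the header row."""
    specs = []
    for line in show_partitions_output:
        cell = line.strip().strip("|").strip()
        # Border rows start with '+'; the header row is the literal 'partition'.
        if not cell or cell.startswith("+") or cell == "partition":
            continue
        specs.append(cell)
    return specs


def spec_to_sql(spec: str) -> str:
    """Turn 'k1=v1/k2=v2' into "k1='v1', k2='v2'" for a PARTITION clause.
    Assumes values need no unescaping or quote escaping."""
    parts = []
    for kv in spec.split("/"):
        key, value = kv.split("=", 1)
        parts.append(f"{key}='{value}'")
    return ", ".join(parts)


def drop_statements(table: str, specs: List[str], existing_dirs: Set[str]) -> List[str]:
    """Return a DROP statement for every partition whose directory is missing."""
    return [
        f"ALTER TABLE {table} DROP IF EXISTS PARTITION ({spec_to_sql(spec)});"
        for spec in specs
        if spec not in existing_dirs
    ]


if __name__ == "__main__":
    output = """+----------------------------------+
|             partition            |
+----------------------------------+
|        in_date=20-09-2020        |
|        in_date=21-09-2020        |
|        in_date=22-09-2020        |
+----------------------------------+""".splitlines()
    # Hypothetical: partition directories that are still present in HDFS.
    existing = {"in_date=21-09-2020", "in_date=22-09-2020"}
    for stmt in drop_statements("db.my_table", parse_partitions(output), existing):
        print(stmt)
```

With the example data it prints a single statement for `in_date='20-09-2020'`. Review the generated statements before running them. Newer Hive releases (3.0 and later) also document `MSCK REPAIR TABLE ... DROP PARTITIONS` / `SYNC PARTITIONS` for removing metastore entries whose directories are missing; check the documentation for your version before relying on it.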