0
votes

I created 2 external tables Hive. In first table specified data location with create statement. In second table loaded data after creating it. I can see data file created for second table in /hive/warehouse/ directory. Then I set "external.table.purge"="true" for both tables. And DROP both tables. But data files of both tables remains as is.

  1. What is the behaviour of 'external.table.purge'='true'. Shouldn't it delete data files as well on issuing Drop command?
  2. If Hive does not take any ownership over data files of external table, why is there even an option as 'external.table.purge'='true'.

I read in one of the threads, where someone mentioned it is possible to delete data as well for external tables by ALTER TABLE ... SET TBLPROPERTIES('external.table.purge'='true'), but unable to find that post again.

Syntax used to Create External Table.

Code used to set purge option.

2

2 Answers

1
votes

You can not drop the data in external table but you can do it for internal(managed) tables. So convert the table to internal and then drop it.
First change eternal property to false.
hive> ALTER TABLE nyse_external SET TBLPROPERTIES('EXTERNAL'='False');
and then you can easily drop it.
hive> drop table nyse_external;

TBLPROPERTIES ("external.table.purge"="true") should work for hive version 4.x+.

1
votes

Answer to point 1: Table property "external.table.purge", which if true (and if the table is an external table), will let Hive know to delete the table data when the table is dropped. This feature is introduced in this apache jira. https://issues.apache.org/jira/browse/HIVE-19981 .

For reference on how to set the property take a look at this example, https://docs.cloudera.com/runtime/7.2.7/using-hiveql/topics/hive_drop_external_table_data.html