
I'm trying to vacuum my Delta tables in Databricks, but somehow it isn't working and I don't understand why. As a result, our storage usage keeps growing.

I have set the following table properties:

%sql
ALTER TABLE <table_name> SET TBLPROPERTIES 
("delta.deletedFileRetentionDuration" = "interval 2 hours");

%sql
ALTER TABLE <table_name> SET TBLPROPERTIES 
("delta.logRetentionDuration" = "interval 2 hours");
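One thing worth noting: by default, Databricks rejects a VACUUM retention below the 7-day safety threshold. If a 2-hour retention is really intended, the safety check has to be disabled explicitly first (at your own risk, since jobs still reading older snapshots can break). A sketch of that setting, assuming it is acceptable in your environment:

```sql
-- Assumption: you accept the risk of removing files that concurrent
-- readers of older table versions may still need.
SET spark.databricks.delta.retentionDurationCheck.enabled = false;
```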

Then I run the following vacuum command in a Databricks notebook:

%sql
VACUUM db_name.table_name retain 2 hours

or like this:

%sql
VACUUM db_name.table_name
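A quick way to check whether VACUUM even considers any files eligible is a dry run, which only lists the candidate files and deletes nothing:

```sql
-- Lists up to 1000 files that would be deleted; no files are removed.
VACUUM db_name.table_name RETAIN 2 HOURS DRY RUN
```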

The files that show up in DBFS as candidates for removal are still there after running these commands.

Example of the data in the delta_log json:

{"remove":{"path":"year=2021/month=05/day=06/part-00001-52dd3cf7-9afc-46b0-9a03-7be3d1ee533e.c000.snappy.parquet","deletionTimestamp":1622798688231,"dataChange":true}}
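To sanity-check whether a removed file is old enough to be vacuumed under the configured retention, the `deletionTimestamp` (epoch milliseconds, per the Delta transaction-log protocol) can be compared against the current time. A minimal sketch using the log entry above:

```python
import json
from datetime import datetime, timezone

# One "remove" action line from a _delta_log JSON file (the entry shown above).
log_line = (
    '{"remove":{"path":"year=2021/month=05/day=06/'
    'part-00001-52dd3cf7-9afc-46b0-9a03-7be3d1ee533e.c000.snappy.parquet",'
    '"deletionTimestamp":1622798688231,"dataChange":true}}'
)

action = json.loads(log_line)["remove"]

# deletionTimestamp is epoch milliseconds; convert to an aware UTC datetime.
deleted_at = datetime.fromtimestamp(action["deletionTimestamp"] / 1000,
                                    tz=timezone.utc)
age_hours = (datetime.now(tz=timezone.utc) - deleted_at).total_seconds() / 3600

# A file is a VACUUM candidate once its age exceeds the retention window.
print(f"{action['path']} deleted at {deleted_at:%Y-%m-%d %H:%M:%S} UTC "
      f"({age_hours:.1f} hours ago)")
```

If the computed age exceeds the retention interval but the file survives VACUUM, the problem is likely the retention safety check or the retention settings, not the file's age.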

For testing purposes I also added and deleted some data, because I read that the table needs to be modified before a vacuum can remove anything.

What am I missing here?


1 Answer


Try setting delta.checkpointRetentionDuration as well:

%sql
ALTER TABLE table_name SET TBLPROPERTIES
('delta.checkpointRetentionDuration' = '7 days')
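After setting the property, it is worth confirming that all three retention properties actually took effect on the table before vacuuming again:

```sql
-- Shows the table's properties, including the retention durations set above.
SHOW TBLPROPERTIES db_name.table_name;
```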