
I'm a little confused about the Delta Lake transaction log. The documentation says the default retention period is 30 days and that it can be changed with the table property delta.logRetentionDuration=interval-string. What I don't understand is when the log files are actually deleted from the _delta_log folder. Is it when we run some operation, or maybe the VACUUM operation? The documentation says VACUUM only deletes data files, not log files. But will it delete logs older than the specified log retention duration?

Reference: https://docs.databricks.com/delta/delta-batch.html#data-retention

To add to this: how can we keep the Delta transaction log forever? - Anish Sarangi

Answer


The idea behind data retention is to define policies so that data which should no longer be kept is removed automatically. By default, Delta Lake records every modification to a table in its transaction log. Two table properties control retention: delta.logRetentionDuration (default interval 30 days), which sets how long the transaction log history is kept, and delta.deletedFileRetentionDuration (default interval 1 week), which sets how long data files that are no longer referenced by the table are kept before VACUUM may delete them.
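To make the interval strings concrete, here is a small Python sketch that converts values such as 'interval 30 days' or 'interval 1 week' into a timedelta. The function name parse_interval is hypothetical and it only handles the simple "interval <N> <unit>" form; it is an illustration, not how Delta Lake itself parses these properties.

```python
import re
from datetime import timedelta

# Seconds per unit for the subset of units this sketch understands
# (assumption: only "interval <N> <unit>" with one unit is supported).
_UNIT_SECONDS = {
    "second": 1,
    "minute": 60,
    "hour": 3600,
    "day": 86400,
    "week": 604800,
}


def parse_interval(text: str) -> timedelta:
    """Convert a string like 'interval 30 days' into a timedelta."""
    # Unit may be singular or plural ("hour" / "hours").
    match = re.fullmatch(r"\s*interval\s+(\d+)\s+([a-z]+?)s?\s*", text.lower())
    if match is None:
        raise ValueError(f"unsupported interval string: {text!r}")
    amount, unit = int(match.group(1)), match.group(2)
    if unit not in _UNIT_SECONDS:
        raise ValueError(f"unsupported unit: {unit!r}")
    return timedelta(seconds=amount * _UNIT_SECONDS[unit])


# The two defaults mentioned above.
print(parse_interval("interval 30 days"))  # 30 days, 0:00:00
print(parse_interval("interval 1 week"))   # 7 days, 0:00:00
```

Written this way, 'interval 240 hours' works out to exactly 10 days, which makes it easier to compare a custom setting against the defaults.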

The _delta_log directory is where Delta Lake stores a table's transaction log, and by default it keeps the commit history of table transactions for 30 days. If you ingest data into Delta tables frequently, you may see many small JSON and CRC files accumulate under _delta_log in your storage account, which can increase storage costs if you don't need 30 days of log history. VACUUM does not remove these files; Delta Lake deletes log entries older than delta.logRetentionDuration automatically when it writes a checkpoint (by default every 10 commits).
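The cleanup rule can be pictured with a simplified Python model. It assumes that expired log files are removed when a checkpoint is written, and that a commit file is only eligible if it is older than delta.logRetentionDuration and older than a checkpoint from which the table state can still be rebuilt. LogEntry and expired_log_versions are hypothetical names; Delta Lake's real cleanup code may differ in detail.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class LogEntry:
    version: int         # commit version, e.g. 00000000000000000010.json -> 10
    timestamp: datetime  # when the file was written
    is_checkpoint: bool  # True for a checkpoint file, False for a JSON commit


def expired_log_versions(entries, retention, now):
    """Return commit versions whose JSON files this simplified model would delete.

    Assumption: a commit is eligible only if it is older than the retention
    cutoff AND older than the newest checkpoint written before that cutoff,
    so the table state can still be rebuilt from that checkpoint.
    """
    cutoff = now - retention
    old_checkpoints = [e.version for e in entries
                       if e.is_checkpoint and e.timestamp < cutoff]
    if not old_checkpoints:
        return []  # no checkpoint to rebuild from, so nothing is removed
    keep_from = max(old_checkpoints)
    return sorted(e.version for e in entries
                  if not e.is_checkpoint
                  and e.version < keep_from
                  and e.timestamp < cutoff)


# Example: one commit per day, checkpoints at versions 10 and 20.
base = datetime(2024, 1, 1)
log = [LogEntry(v, base + timedelta(days=v), False) for v in range(26)]
log += [LogEntry(v, base + timedelta(days=v), True) for v in (10, 20)]
print(expired_log_versions(log, timedelta(days=30), base + timedelta(days=45)))
# → [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

In this model, commits 10 to 14 are also older than the cutoff but survive because they are needed on top of checkpoint 10, which is why the log can briefly hold more than the configured retention.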

If you administer your Databricks environment, you may want to shorten this log retention. The standard way to do this in Delta Lake is to run an ALTER TABLE ... SET TBLPROPERTIES statement for each table, which can be cumbersome to manage across many tables.

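%sql
-- Keep 240 hours (10 days) of log history and 1 hour of deleted data files.
-- Retention shorter than the defaults limits how far back time travel can reach.
ALTER TABLE table_name SET TBLPROPERTIES ('delta.logRetentionDuration' = 'interval 240 hours', 'delta.deletedFileRetentionDuration' = 'interval 1 hours');
SHOW TBLPROPERTIES table_name;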
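Since running the statement table by table is tedious, one option is to generate the statements from a list of table names. The helper retention_statements below is hypothetical and the table names are placeholders; on Databricks each string could be passed to spark.sql(). It does no quoting or validation of table names.

```python
def retention_statements(tables, log_retention="interval 30 days",
                         deleted_file_retention="interval 1 week"):
    """Build one ALTER TABLE ... SET TBLPROPERTIES statement per table."""
    statements = []
    for table in tables:
        statements.append(
            f"ALTER TABLE {table} SET TBLPROPERTIES ("
            f"'delta.logRetentionDuration' = '{log_retention}', "
            f"'delta.deletedFileRetentionDuration' = '{deleted_file_retention}')"
        )
    return statements


# Hypothetical table names; print the statements instead of executing them.
for stmt in retention_statements(["sales.orders", "sales.customers"],
                                 log_retention="interval 240 hours"):
    print(stmt)
```

Generating the SQL rather than executing it directly makes it easy to review the statements before applying them.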