
We have an Azure Data Lake that stores data as Parquet files in Delta Lake format. After every run, once the new data is merged, we call VACUUM with a 0-hour retention to remove the old files and then run the OPTIMIZE command.

But for some reason the old files are not being deleted. There are no errors in the Databricks notebook; it reports that 2 files were removed, yet I can still see them. Am I missing something obvious? Thanks!

sqlContext.sql(f"VACUUM  '{adls_location}' RETAIN 0 HOURS")
time.sleep(60)
sqlContext.sql(f"VACUUM  '{adls_location}' RETAIN 0 HOURS")
time.sleep(60)
sqlContext.sql(f"OPTIMIZE '{adls_location}'")

1 Answer


You cannot run VACUUM directly against a cloud storage path. To vacuum the storage, you must mount it to DBFS and run VACUUM on the mounted directory.
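
A minimal sketch of that approach, assuming an ADLS Gen2 account accessed through a service principal; the storage account, container, secret scope, mount point, and table path below are all placeholders:

configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id": "<application-id>",
    "fs.azure.account.oauth2.client.secret": dbutils.secrets.get(scope="<scope>", key="<key>"),
    "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
}

# Mount the ADLS Gen2 container to DBFS (only needs to run once per workspace).
dbutils.fs.mount(
    source="abfss://<container>@<storage-account>.dfs.core.windows.net/",
    mount_point="/mnt/datalake",
    extra_configs=configs,
)

# A 0-hour retention is below Delta's 7-day default, so the safety check must be disabled first.
spark.conf.set("spark.databricks.delta.retentionDurationCheck.enabled", "false")

# Point VACUUM (and OPTIMIZE) at the mounted path instead of the abfss:// URL.
spark.sql("VACUUM '/mnt/datalake/<table-path>' RETAIN 0 HOURS")
spark.sql("OPTIMIZE '/mnt/datalake/<table-path>'")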

Reference: Azure Databricks - Vacuum