We have an Azure Data Lake that stores data as Parquet files in Delta Lake format. After every run, in which new data is merged into the table, we call VACUUM with a 0-hour retention to remove the old files and then run the OPTIMIZE command.
But for some reason, the old files are not being deleted. There are no errors in the Databricks notebook; the VACUUM output says 2 files were removed, yet I can still see them in storage. Am I missing something obvious? Thanks! The cleanup code is below, followed by a rough sketch of the full run for context.
sqlContext.sql(f"VACUUM '{adls_location}' RETAIN 0 HOURS")
time.sleep(60)
sqlContext.sql(f"VACUUM '{adls_location}' RETAIN 0 HOURS")
time.sleep(60)
sqlContext.sql(f"OPTIMIZE '{adls_location}'")