We are running a Delta lake on ADLS Gen2 with plenty of tables and Spark jobs. The Spark jobs are running in Databricks and we mounted the ADLS containers into DBFS (abfss://delta@<our-adls-account>.dfs.core.windows.net/silver). There's one container for each "tier", so bronze, silver, gold.
This setup has been stable for some months now, but last week we saw a sudden increase in transactions within our storage account, particularly in ListFilesystemDir operations.
We added some smaller jobs that read and write data in that time frame, but turning them off did not bring the number of transactions back down to the old level.
Two questions regarding this:
- Is there any documentation that explains which operations on a Delta table cause which kinds of ADLS transactions?
- Is it possible to find out which container/directory/Spark job/... causes these transactions, without turning off the Spark jobs one by one?
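
For the second question, a rough sketch of the kind of breakdown I have in mind, assuming diagnostic (resource) logging is enabled on the storage account and the log records can be exported as dictionaries. The field names `OperationName` and `Uri` are assumptions modeled on the storage resource log schema, and the account name `example` and the directory paths are made up for illustration. It relies on list requests carrying the listed directory in the `directory` query parameter of the request URL.

```python
from collections import Counter
from urllib.parse import urlsplit, parse_qs


def count_list_calls(records, operation="ListFilesystemDir", depth=2):
    """Count matching operations per (container, directory prefix).

    `records` is an iterable of dicts with (assumed) keys
    "OperationName" and "Uri"; `depth` is how many leading
    directory levels to keep when grouping.
    """
    counts = Counter()
    for record in records:
        if record.get("OperationName") != operation:
            continue
        parts = urlsplit(record.get("Uri", ""))
        # The container (filesystem) is the first path segment of the URL.
        container = parts.path.strip("/").split("/")[0]
        # The listed directory is passed as the "directory" query parameter.
        directory = parse_qs(parts.query).get("directory", [""])[0]
        prefix = "/".join(directory.strip("/").split("/")[:depth])
        counts[(container, prefix)] += 1
    return counts


# Hypothetical sample records, only to show the expected input shape.
sample_records = [
    {"OperationName": "ListFilesystemDir",
     "Uri": "https://example.dfs.core.windows.net/silver?directory=sales/orders/_delta_log&resource=filesystem"},
    {"OperationName": "ListFilesystemDir",
     "Uri": "https://example.dfs.core.windows.net/silver?directory=sales/orders/_delta_log&resource=filesystem"},
    {"OperationName": "ListFilesystemDir",
     "Uri": "https://example.dfs.core.windows.net/bronze?directory=raw/events&resource=filesystem"},
    {"OperationName": "GetPathStatus",
     "Uri": "https://example.dfs.core.windows.net/gold/report"},
]

counts = count_list_calls(sample_records)
for (container, prefix), n in counts.most_common():
    print(container, prefix, n)
```

Grouping by container and the first two directory levels would at least narrow things down to a table; mapping a table back to a specific Spark job would still need something extra, such as the user agent or caller identity from the same logs, if those are available.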