3
votes

We are running a Delta Lake on ADLS Gen2 with many tables and Spark jobs. The Spark jobs run in Databricks, and we mounted the ADLS containers into DBFS (abfss://delta@<our-adls-account>.dfs.core.windows.net/silver). There is one container for each "tier": bronze, silver, and gold.

This setup has been stable for some months, but last week we saw a sudden increase in transactions within our storage account, particularly in the ListFilesystemDir operations.

We added some smaller jobs that read and write data in that time frame, but turning them off did not bring the number of transactions back down to the old level.

Two questions regarding this:

  1. Is there documentation that explains which operations on a Delta table cause which kinds of ADLS transactions?
  2. Is it possible to find out which container/directory/Spark job/... is causing these transactions without turning off the Spark jobs one by one?
Are you using Structured Streaming for your jobs? - Alex Ott
Yes, Structured Streaming mostly, but there are also some batch jobs. - hbrgnr
What triggers are you using on the streaming jobs? - Alex Ott
You mean ".trigger(Trigger.ProcessingTime("1 minute"))", for example? None, mostly, but that's because the batch duration is generally quite large (>5 minutes per batch). - hbrgnr

1 Answer

1
vote

If you have Log Analytics enabled for your data lake, the storage logs show the exact timestamp, caller, and target of the requests in the spike. Take those timestamps to your Databricks cluster and open the Spark UI, which lists the jobs along with their timestamps. Matching the two should tell you which notebook is causing it.
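
Once you have the logs, a small script can help narrow down where the listing calls go. The following is only a sketch: it assumes the logs were exported as a list of dicts with the `OperationName`, `Uri` and `UserAgentHeader` fields (as in the `StorageBlobLogs` table), and that the listed directory appears in the `directory` query parameter of the URI. The function name `listing_hotspots`, the account name and the sample records are made up for illustration.

```python
from collections import Counter
from urllib.parse import urlparse, parse_qs


def listing_hotspots(records, operation="ListFilesystemDir", depth=2, top=5):
    """Count log records of one operation, grouped by target path prefix and user agent.

    Assumption: each record is a dict with "OperationName", "Uri" and
    "UserAgentHeader" keys; adjust the names to match your export.
    """
    counts = Counter()
    for rec in records:
        if rec.get("OperationName") != operation:
            continue
        url = urlparse(rec.get("Uri", ""))
        # Container (and any path) from the URL path itself.
        parts = [p for p in url.path.split("/") if p]
        # Assumed: list calls carry the directory as a query parameter.
        directory = parse_qs(url.query).get("directory", [""])[0]
        parts += [p for p in directory.split("/") if p]
        target = "/".join(parts[:depth])
        counts[(target, rec.get("UserAgentHeader", "unknown"))] += 1
    return counts.most_common(top)


# Made-up sample records, only to show the expected input shape.
sample = [
    {"OperationName": "ListFilesystemDir",
     "Uri": "https://example.dfs.core.windows.net/delta?directory=silver/orders/_delta_log&resource=filesystem",
     "UserAgentHeader": "agent-a"},
    {"OperationName": "ListFilesystemDir",
     "Uri": "https://example.dfs.core.windows.net/delta?directory=silver/orders/_delta_log&resource=filesystem",
     "UserAgentHeader": "agent-a"},
    {"OperationName": "ListFilesystemDir",
     "Uri": "https://example.dfs.core.windows.net/delta?directory=silver/customers&resource=filesystem",
     "UserAgentHeader": "agent-b"},
    {"OperationName": "GetPathStatus",
     "Uri": "https://example.dfs.core.windows.net/delta/silver/orders",
     "UserAgentHeader": "agent-a"},
]

for (target, agent), hits in listing_hotspots(sample, depth=3):
    print(f"{hits:>6}  {target}  ({agent})")
```

With `depth=3` the grouping is container/tier/table, which is usually enough to see which table is being listed most often; the user agent column may help tell clusters or clients apart, depending on what your jobs send.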