
We are using an Azure Storage Account (Blob, StorageV2) with a single container. We also use Azure Data Factory to trigger data copy pipelines when blobs (.tar.gz) are created in that container. The trigger works fine when the blobs are created from an Azure App Service or uploaded manually via Azure Storage Explorer. But when a blob is created from a notebook on Azure Databricks, we get two (2) events for every blob created (with the same parameters for both events). The code for creating the blob from the notebook resembles:

# Copy the assembled package from the staging folder into the folder
# watched by the Data Factory event trigger.
dbutils.fs.cp(
  "/mnt/data/tmp/file.tar.gz",
  "/mnt/data/out/file.tar.gz"
)

The tmp folder is only used to assemble the package; the event trigger is attached to the out folder. We also tried dbutils.fs.mv, but with the same result. The trigger rules in Azure Data Factory are:

Blob path begins with: out/

Blob path ends with: .tar.gz

The container name is data.

We did find some similar posts relating to zero-length files, but we can't see any such files anywhere (in case they are some kind of by-product of dbutils).
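For reference, this is roughly how such a check could be done with the azure-storage-blob library (a sketch only; the connection string is a placeholder, and the container and prefix names are taken from the question):

from azure.storage.blob import BlobServiceClient

# Placeholder connection string; in practice read it from a secret scope.
conn_str = "DefaultEndpointsProtocol=https;AccountName=...;AccountKey=...;EndpointSuffix=core.windows.net"

container = BlobServiceClient.from_connection_string(conn_str).get_container_client("data")

# List everything under out/ and print each blob's size, so any
# zero-length by-product would show up here.
for blob in container.list_blobs(name_starts_with="out/"):
    print(blob.name, blob.size)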

As mentioned, just manually uploading file.tar.gz works fine - a single event is triggered.

Having the same issue. I've submitted a ticket to Microsoft regarding it and am starting to go down the rabbit hole of using Azure's Java libraries, which seem to be a mess right now. – David Nguyen

1 Answer


We had to revert to uploading the files from Databricks to Blob Storage using the azure-storage-blob library. Kind of a bummer, but it now works as expected. Posting this in case anyone else runs into the same issue.
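A minimal sketch of that workaround, assuming the container and paths from the question and a connection string available to the notebook (the connection string and how it is stored are placeholders):

from azure.storage.blob import BlobServiceClient

# Placeholder connection string; in practice read it from a Databricks secret scope.
conn_str = "DefaultEndpointsProtocol=https;AccountName=...;AccountKey=...;EndpointSuffix=core.windows.net"

service = BlobServiceClient.from_connection_string(conn_str)
blob_client = service.get_blob_client(container="data", blob="out/file.tar.gz")

# Read the assembled package through the local /dbfs path of the mount and
# upload it in a single call, so only one BlobCreated event should fire.
with open("/dbfs/mnt/data/tmp/file.tar.gz", "rb") as f:
    blob_client.upload_blob(f, overwrite=True)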

More information:

https://docs.microsoft.com/en-gb/azure/storage/blobs/storage-quickstart-blobs-python