I'm using the Self-Hosted Integration Runtime in Azure Data Factory to copy data from an on-premises source (a plain file system) to an Azure Blob Storage destination. After the transfer, I want to process the files automatically by invoking a notebook on a Databricks cluster from the same pipeline. The pipeline works fine, but my question concerns the output of the Copy Activity.
Is there a way to get information about the transferred files and folders for each run? I would like to pass this information as parameters to the notebook.
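For context, on the notebook side I imagine consuming such parameters roughly like this. This is only a sketch of what I have in mind: the parameter name `transferred_files`, its JSON-list format, and the `/mnt/landing` mount point are placeholders I made up, not anything ADF produces by itself.

```python
import json

# "dbutils" and "spark" are provided by the Databricks notebook environment.
# The Databricks Notebook activity in ADF would pass the file list as a
# base parameter named "transferred_files" (placeholder name), e.g. a
# JSON-encoded list like ["folder1/a.csv", "folder1/b.csv"].
raw = dbutils.widgets.get("transferred_files")
file_paths = json.loads(raw)

for path in file_paths:
    # Process each transferred file; counting rows is just an example.
    # Assumes the Blob container is mounted at /mnt/landing (placeholder).
    df = spark.read.csv(f"/mnt/landing/{path}", header=True)
    print(f"{path}: {df.count()} rows")
```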
Looking at the documentation, it seems only aggregated information is available:
https://docs.microsoft.com/en-us/azure/data-factory/copy-activity-overview
Which kind of makes sense if you transfer huge numbers of files. If it's not possible, I guess one alternative would be to leave the copy process as it is and create a separate pipeline triggered by storage account events? Or maybe store the new file and folder information for each run in a fixed text file, transfer it along with the data, and read it in the notebook?
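For that last idea, the notebook side might look something like the sketch below. Again, the manifest location `/mnt/landing/_manifests/latest_run.txt`, its one-path-per-line format, and the mount point are all assumptions of mine, not an existing convention.

```python
# Read a plain-text manifest (one relative file path per line) that the
# pipeline would write and copy alongside the data.
# "dbutils" and "spark" are provided by the Databricks notebook environment.
manifest_path = "/mnt/landing/_manifests/latest_run.txt"  # placeholder path

# dbutils.fs.head returns up to maxBytes of the file as a string.
manifest_text = dbutils.fs.head(manifest_path, 1024 * 1024)
file_paths = [line.strip() for line in manifest_text.splitlines() if line.strip()]

for path in file_paths:
    # Process only the files listed for this run.
    df = spark.read.csv(f"/mnt/landing/{path}", header=True)
    print(f"{path}: {df.count()} rows")
```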