
I have a Python script in Azure Databricks that does ETL on raw text files (".txt" format, no schema) stored in Azure Data Lake Storage Gen2. I migrated these text files from an on-premises virtual machine using Azure Data Factory. My requirement is to run the Python script only on new (delta) data migrated into the Data Lake. How can I achieve this?


1 Answer


You can try an Azure Function with a Blob trigger, or an Event Grid trigger filtered on the "Blob Created" event. Either way, the function fires whenever a new file lands in the Data Lake container, so you can put the ETL logic (or a call that kicks off your Databricks job) in the body of the function.

This is the official doc:

https://docs.microsoft.com/en-us/azure/azure-functions/functions-bindings-storage-blob-trigger?tabs=csharp
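As a minimal sketch of this pattern: the ETL step is kept as a plain Python function so it can be tested outside Azure, and the Blob-trigger wiring is shown in comments. The function name `process_text`, the pipe-delimited record format, and the container path are illustrative assumptions, not part of the question.

```python
# Sketch of the ETL core for a Blob-triggered Azure Function (Python worker).
# The parsing logic here is a placeholder; substitute your real transformation.

def process_text(raw: str) -> list[dict]:
    """Toy ETL step: parse pipe-delimited lines into records."""
    records = []
    for line in raw.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("|")
        records.append({"id": fields[0], "payload": fields[1:]})
    return records

# In the deployed function app, function.json binds an entry point to new
# blobs in a container, e.g. (path is an assumption):
#
#   { "type": "blobTrigger", "direction": "in",
#     "name": "myblob", "path": "raw/{name}",
#     "connection": "AzureWebJobsStorage" }
#
# and the handler would look roughly like:
#
#   import azure.functions as func
#
#   def main(myblob: func.InputStream):
#       records = process_text(myblob.read().decode("utf-8"))
#       ...  # load records into the sink, or trigger the Databricks job
```

Because the trigger fires per new blob, only the delta data is ever processed; already-migrated files are not re-read.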