I am implementing a testing solution as follows:
I have created an Azure Databricks notebook in Python. This notebook performs the following tasks (for testing):
- Read a blob file from the Storage account into a PySpark DataFrame.
- Perform some transformations and analysis on it.
- Create a CSV with the transformed data and store it in a different container.
- Move the original CSV to a separate archive container (so that it is not picked up on the next execution).
*The above steps could also be split across separate notebooks. A rough sketch of the flow is shown below.
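Here is a minimal sketch of those notebook steps. The storage account name, container names ("input", "output", "archive"), file names, and the transformation are placeholders for illustration only; `spark` and `dbutils` come predefined in a Databricks notebook.

```python
# Minimal sketch, assuming ABFS access to the storage account is already
# configured. All account/container/file names below are hypothetical.
from pyspark.sql import functions as F

input_path = "abfss://input@mystorageacct.dfs.core.windows.net/incoming/data.csv"      # hypothetical
output_path = "abfss://output@mystorageacct.dfs.core.windows.net/transformed/"         # hypothetical
archive_path = "abfss://archive@mystorageacct.dfs.core.windows.net/processed/data.csv" # hypothetical

# 1. Read the blob file into a PySpark DataFrame.
df = spark.read.option("header", "true").csv(input_path)

# 2. Example transformation/analysis step: stamp each row with a processing time.
transformed = df.withColumn("processed_at", F.current_timestamp())

# 3. Write the transformed data as CSV into a different container.
transformed.write.mode("overwrite").option("header", "true").csv(output_path)

# 4. Move the original CSV to the archive container so the next run skips it.
dbutils.fs.mv(input_path, archive_path)
```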
Now, I need this notebook to be triggered for each new blob in a container. I plan to implement the following orchestration:
New blob in container -> event to Event Grid topic -> trigger Data Factory pipeline -> execute Databricks notebook.
We can pass the filename as a parameter from the ADF pipeline to the Databricks notebook (see the sketch below).
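Inside the notebook, a parameter passed from ADF arrives as a notebook widget. A sketch, assuming the ADF Notebook activity defines a base parameter named "fileName" (the name is an assumption and must match the activity configuration; the value would typically come from the storage event trigger, e.g. @triggerBody().fileName):

```python
# Declare the widget with an empty default, then read the value supplied
# by the ADF pipeline run. "fileName" must match the ADF base parameter name.
dbutils.widgets.text("fileName", "")
file_name = dbutils.widgets.get("fileName")

# Build the input path from the parameter (account/container names are hypothetical).
input_path = f"abfss://input@mystorageacct.dfs.core.windows.net/incoming/{file_name}"
df = spark.read.option("header", "true").csv(input_path)
```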
I am looking for other ways to implement this orchestration flow. If the above seems correct and suitable, please mark this as answered.