
I am implementing a test solution as follows:

I have created an Azure Databricks notebook in Python. This notebook performs the following tasks (for testing):

  1. Read a blob file from the storage account into a PySpark DataFrame.
  2. Perform some transformation and analysis on it.
  3. Write the transformed data as a CSV to a different container.
  4. Move the original CSV to an archive container (so that it is not picked up in the next execution).

*The above steps can also be split across separate notebooks. A minimal sketch of the notebook logic is shown below.
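The sketch assumes it runs inside a Databricks notebook (where `spark` and `dbutils` are predefined); the storage account, container, file, and column names are all hypothetical placeholders:

```python
from pyspark.sql import functions as F

# Hypothetical storage account and container names; adjust to your environment.
base = "abfss://{}@mystorageacct.dfs.core.windows.net"
src_path = f"{base.format('input')}/sales.csv"  # hypothetical input file

# 1. Read the blob file into a PySpark DataFrame.
df = spark.read.option("header", "true").csv(src_path)

# 2. Example transformation: keep rows with a positive "amount"
#    (a hypothetical column).
transformed = df.filter(F.col("amount") > 0)

# 3. Write the transformed data as CSV to a different container.
(transformed.write
    .mode("overwrite")
    .option("header", "true")
    .csv(f"{base.format('output')}/sales_transformed"))

# 4. Move the original file to an archive container so the next run skips it.
dbutils.fs.mv(src_path, f"{base.format('archive')}/sales.csv")
```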

Now I need this notebook to be triggered for each new blob in a container. I plan to implement the following orchestration:

New blob in container -> event to Event Grid topic -> trigger Data Factory pipeline -> execute Databricks notebook.

We can pass the filename as a parameter from the ADF pipeline to the Databricks notebook; a sketch of reading it in the notebook follows.
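On the notebook side, the parameter can be read with a widget. A sketch, assuming a hypothetical base parameter named `fileName` configured on the ADF notebook activity:

```python
# "fileName" must match the base parameter name set in the ADF
# Databricks notebook activity (hypothetical name here).
dbutils.widgets.text("fileName", "")
file_name = dbutils.widgets.get("fileName")

# Build the source path from the passed-in name
# (hypothetical account and container).
src_path = f"abfss://input@mystorageacct.dfs.core.windows.net/{file_name}"
df = spark.read.option("header", "true").csv(src_path)
```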

I am looking for other ways to do the orchestration flow. If the above seems correct and suitable, please mark this as answered.

This is a common pattern for this; you should be fine. - Daniel

2 Answers


> New blob in container -> event to Event Grid topic -> trigger Data Factory pipeline -> execute Databricks notebook.
>
> We can pass the filename as a parameter from the ADF pipeline to the Databricks notebook.
>
> I am looking for other ways to do the orchestration flow. If the above seems correct and suitable, please mark this as answered.

You can use this method. Of course, you can also follow this path:

New blob in container -> built-in event trigger triggers the Data Factory pipeline -> execute Databricks notebook.

I don't think you need to introduce Event Grid explicitly, because Data Factory comes with built-in storage event triggers that fire on blob creation.
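For reference, a sketch of defining such a storage event trigger with the `azure-mgmt-datafactory` Python SDK instead of the portal UI; all resource names, IDs, and the `fileName` pipeline parameter are hypothetical:

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    BlobEventsTrigger,
    PipelineReference,
    TriggerPipelineReference,
    TriggerResource,
)

# Hypothetical subscription, resource group, factory, and storage account.
client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

trigger = BlobEventsTrigger(
    scope=("/subscriptions/<subscription-id>/resourceGroups/<rg>"
           "/providers/Microsoft.Storage/storageAccounts/<account>"),
    events=["Microsoft.Storage.BlobCreated"],
    blob_path_begins_with="/input/blobs/",  # fire only for the input container
    pipelines=[
        TriggerPipelineReference(
            pipeline_reference=PipelineReference(reference_name="ProcessBlobPipeline"),
            # Map the created blob's name into the pipeline parameter.
            parameters={"fileName": "@triggerBody().fileName"},
        )
    ],
)

client.triggers.create_or_update(
    "<rg>", "<factory-name>", "BlobCreatedTrigger",
    TriggerResource(properties=trigger),
)
```

Note that the trigger still has to be started after creation (in the SDK, `client.triggers.begin_start(...)`), and the created blob's name reaches the pipeline through `@triggerBody().fileName`.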


I got two supporting comments for the orchestration I am following: // New blob in container -> event to Event Grid topic -> trigger Data Factory pipeline -> execute Databricks notebook. //