I'm transforming data within different Databricks notebooks (reading, transforming, and writing to/from ADLS). I connected these notebooks within a Data Factory pipeline:
Notebook 1 --> Notebook 2 --> Notebook 3 --> Notebook
I've then created a connection to my Databricks workspace from Data Factory and added it to my notebook activities. I would like to start a Databricks cluster whenever the pipeline is triggered. Overall, all of this is working fine. But Databricks starts a job cluster for each notebook activity, which takes too long and seems unnecessary to me.
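For reference, my Databricks linked service is configured to spin up a new job cluster, roughly like this (workspace URL, token, runtime version, and node sizing are placeholders, not my real values):

```json
{
    "name": "AzureDatabricksLinkedService",
    "properties": {
        "type": "AzureDatabricks",
        "typeProperties": {
            "domain": "https://adb-<workspace-id>.azuredatabricks.net",
            "accessToken": {
                "type": "SecureString",
                "value": "<access-token>"
            },
            "newClusterVersion": "10.4.x-scala2.12",
            "newClusterNumOfWorker": "2",
            "newClusterNodeType": "Standard_DS3_v2"
        }
    }
}
```

Because the linked service only defines new-cluster settings, every notebook activity that references it provisions its own job cluster.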
Is it possible to start a cluster at the beginning of the pipeline and shut it down after all notebooks have completed? Or are there any arguments for having a separate job cluster for each activity?