2
votes

I am working with Azure Data Factory and its new Data Flows feature. This is a GUI tool that is supposed to use Databricks to perform data transformations without writing any code.

All good so far. I have some examples working. My input data (from Azure Blob) is correctly transformed and joined to create the output (in Azure SQL).

The problem is that I have no Databricks resource. I deleted it. I also removed the Data Factory to Databricks connector. But I am still getting the right answers!

I suspect that my input sets are too small, or my transformations are too simple, so Data Factory is just handling them internally and knows it does not need the power of Databricks. But what do I have to do to force Data Factory to utilize Databricks? I want to test some things about that operation.

Another possibility is that Data Factory is using Databricks, but is doing so with its own Databricks resource rather than the user's?


1 Answer

3
votes

Azure Data Factory Data Flows always run on Databricks behind the scenes. There is no way to force (or disable) the use of Databricks.

In the early private preview, you had to configure and bring your own Databricks cluster. This later changed, and as of May 2019, Azure Data Factory manages the cluster for you.

(I have heard that they are planning to re-implement the bring-your-own-cluster feature at some point, but I haven't seen that confirmed publicly.)

If you turn on Data Flow Debug Mode or execute a pipeline with a Data Flow activity, you will be billed for the cluster usage per vCore-hour. You can find all the details in the Data Pipeline Pricing and FAQ.
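Since billing is per vCore-hour, you can estimate what a debug session or pipeline run will cost before kicking it off. Here is a minimal sketch; the rate constant is a made-up placeholder, not a real Azure price, so substitute the current rate from the Data Pipeline Pricing page:

```python
# Rough estimate of Data Flow cluster cost from vCore-hours.
# ASSUMED_RATE_PER_VCORE_HOUR is illustrative only -- check the Azure
# Data Pipeline Pricing page for the actual per-region, per-compute-type rate.

ASSUMED_RATE_PER_VCORE_HOUR = 0.27  # hypothetical $/vCore-hour

def dataflow_cost(vcores: int, runtime_minutes: float,
                  rate: float = ASSUMED_RATE_PER_VCORE_HOUR) -> float:
    """Cost = vCores * hours * rate, rounded to cents."""
    hours = runtime_minutes / 60
    return round(vcores * hours * rate, 2)

# Example: an 8-vCore cluster that ran for 90 minutes.
print(dataflow_cost(8, 90))  # 8 * 1.5 * 0.27 = 3.24
```

Note that debug sessions keep the cluster alive for their whole time-to-live, not just while a transformation runs, so use the session duration, not the transformation runtime, when estimating debug costs.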