
I have an Azure Data Factory with a pipeline that picks up data from an on-premises database and copies it to Cosmos DB in the cloud. I'm using a data flow step at the end to delete documents from the sink that don't exist in the source.

I have three integration runtimes set up:

  • AutoResolveIntegrationRuntime (the default created by Azure)
  • Self-hosted integration runtime (I set this up to connect to the on-premises database, so it's used by the source dataset)
  • Data flow integration runtime (I set this up with a TTL setting, to be used by the data flow step)

The issue I'm seeing is that when I trigger the pipeline, the AutoResolveIntegrationRuntime is the one being used, so I'm not getting the optimisation I need from the data flow integration runtime with the TTL. Any thoughts on what might be going wrong here?
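In case it helps, the data flow integration runtime is defined roughly like this (a sketch; the name, core count, and TTL value are placeholders rather than my exact settings):

```json
{
  "name": "DataFlowIR",
  "properties": {
    "type": "Managed",
    "typeProperties": {
      "computeProperties": {
        "location": "AutoResolve",
        "dataFlowProperties": {
          "computeType": "General",
          "coreCount": 8,
          "timeToLive": 10
        }
      }
    }
  }
}
```

The `timeToLive` value is what should keep the Spark cluster warm between data flow runs so subsequent executions skip the cluster start-up time.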


1 Answer


In my experience, only the AutoResolveIntegrationRuntime (the default created by Azure) supports the optimization settings.

When we choose to run the data flow on a non-default integration runtime, the optimization settings aren't available.

And once the integration runtime has been created, we can't change its settings.

The Data Factory documentation doesn't say much about this. When I ran the pipeline, I found that the data flow runtime wasn't used.

That means that no matter which integration runtime you use to connect to the dataset, the data flow will always run on the Azure default integration runtime.
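For illustration, this is where the integration runtime reference sits in the Execute Data Flow activity's JSON (all names here are placeholders, not taken from your pipeline). In my tests this is the reference that ends up being ignored in favour of the default runtime:

```json
{
  "name": "CleanupDataFlow",
  "type": "ExecuteDataFlow",
  "typeProperties": {
    "dataFlow": {
      "referenceName": "DeleteMissingDocs",
      "type": "DataFlowReference"
    },
    "integrationRuntime": {
      "referenceName": "DataFlowIR",
      "type": "IntegrationRuntimeReference"
    }
  }
}
```

You can check the activity's JSON in the pipeline authoring view to confirm which `referenceName` is actually set, and compare it against the runtime shown in the activity's run details under monitoring.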