
I am building pipelines in Azure Data Factory using the Mapping Data Flow activity (Azure SQL DB to Synapse). The pipelines complete in debug mode when I enable data sampling for the sources. When I disable sampling and run the debug, the pipeline makes no progress, i.e. none of the transformations complete (they stay at a yellow dot).

To improve this, should I increase the batch size on the source/sink (and how do I determine a good batch size), or increase the number of partitions (and how do I determine a good number of partitions)?


1 Answer


What is the size of the Spark compute cluster you have set in the Azure Integration Runtime under the data flow properties? Start there: create an Azure IR with enough cores to provide sufficient RAM for your process. Then you can tune the partitions and batch sizes. Much of the guidance in this area is collected in the ADF data flow performance guide.
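As a rough sketch, an Azure IR definition with explicit data flow compute settings looks something like the JSON below. The name, core count, compute type, and time-to-live here are placeholder values for illustration, not recommendations for your workload:

```json
{
  "name": "DataFlowIR",
  "properties": {
    "type": "Managed",
    "typeProperties": {
      "computeProperties": {
        "location": "AutoResolve",
        "dataFlowProperties": {
          "computeType": "MemoryOptimized",
          "coreCount": 16,
          "timeToLive": 10
        }
      }
    }
  }
}
```

"MemoryOptimized" gives more RAM per core than "General", which tends to help when a data flow stalls on larger-than-sample volumes; `timeToLive` (in minutes) keeps the cluster warm between debug runs so you are not paying cluster startup on every iteration.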