0 votes

I'm doing a data load into an Azure SQL database using Azure Data Factory v2. I started the data load with the database set to the Standard pricing tier with 800 DTUs. It was slow, so I increased the DTUs to 1600. (My pipeline has been running for 7 hours now.)

I then decided to change the pricing tier to Premium, with the DTUs set to 1000. (I didn't make any additional changes.)

The pipeline failed because it lost the connection, so I reran it.

Now, when I monitor the pipeline, it is working fine. But when I monitor the database, the average DTU usage does not go above 56%.

I am dealing with a tremendous amount of data. How can I speed up the process?

I expected the DTUs to max out, but the average utilization stays around 56%.


2 Answers

1 vote

Please follow this document: Copy activity performance and scalability guide.

This tutorial walks through the performance tuning steps.

One way is to increase the Azure SQL Database tier to get more DTUs. You have already scaled the database up to 1000 DTUs, but the average utilization is only around 56%, so I don't think you need an even higher pricing tier.

You need to think about other ways to improve the performance, such as setting more Data Integration Units (DIUs).

A Data Integration Unit is a measure that represents the power (a combination of CPU, memory, and network resource allocation) of a single unit in Azure Data Factory. DIUs apply only to the Azure integration runtime, not to a self-hosted integration runtime.
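To make that concrete, here is a minimal sketch of where the setting lives in a copy activity's JSON definition. The dataset names and the source/sink types are placeholders (not from the question) and would need to match your actual connectors; see the copy activity performance guide mentioned above for the allowed DIU values and when they take effect.

```json
{
    "name": "CopyToAzureSql",
    "type": "Copy",
    "inputs": [ { "referenceName": "SourceDataset", "type": "DatasetReference" } ],
    "outputs": [ { "referenceName": "AzureSqlSinkDataset", "type": "DatasetReference" } ],
    "typeProperties": {
        "source": { "type": "SqlSource" },
        "sink": { "type": "AzureSqlSink" },
        "dataIntegrationUnits": 16
    }
}
```

If you leave dataIntegrationUnits out, the service decides the value itself.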

Hope this helps.

1 vote

The standard answer from Microsoft seems to be that you need to tune the target database or scale up to a higher tier. This suggests that Azure Data Factory is not the limiting factor in copy performance.

However, we've done some testing with a single table, a single copy activity, and ~15 GB of data. The table did not contain varchar(max) columns or high-precision types, just plain and simple data.

Conclusion: the tier you choose barely matters (as long as it isn't too low, of course). Roughly above S7 / 800 DTU / 8 vCores, the throughput of the copy activity sits at ~10 MB/s and does not go up, while the load on the target database stays at 50%-75%.

Our assumption is that, since we could keep throwing higher database tiers at this problem without seeing any improvement in copy activity performance, the limit is on the Azure Data Factory side.

Since we are loading a lot of separate tables, our solution is to scale out instead of up: a ForEach loop over the tables with the batch count set to at least 4.
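As a rough sketch of that pattern (the activity names, the tableList parameter, and the parameterized datasets are invented for illustration), a ForEach activity with isSequential set to false and batchCount set to 4 runs up to four copy activities in parallel, one per table:

```json
{
    "name": "CopyTablesInParallel",
    "type": "ForEach",
    "typeProperties": {
        "items": { "value": "@pipeline().parameters.tableList", "type": "Expression" },
        "isSequential": false,
        "batchCount": 4,
        "activities": [
            {
                "name": "CopyOneTable",
                "type": "Copy",
                "inputs": [ {
                    "referenceName": "SourceTableDataset",
                    "type": "DatasetReference",
                    "parameters": { "tableName": "@item()" }
                } ],
                "outputs": [ {
                    "referenceName": "AzureSqlTableDataset",
                    "type": "DatasetReference",
                    "parameters": { "tableName": "@item()" }
                } ],
                "typeProperties": {
                    "source": { "type": "SqlSource" },
                    "sink": { "type": "AzureSqlSink" }
                }
            }
        ]
    }
}
```

Each iteration runs its own copy activity, so total throughput scales with batchCount until the source, the sink database, or the integration runtime becomes the bottleneck.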

The approach of increasing the DIUs is only applicable in some cases: https://docs.microsoft.com/en-us/azure/data-factory/copy-activity-performance#data-integration-units

Setting of DIUs larger than four currently applies only when you copy multiple files from Azure Storage, Azure Data Lake Storage, Amazon S3, Google Cloud Storage, cloud FTP, or cloud SFTP to any other cloud data stores.

In our case we are copying data from relational databases.