0
votes

I'm using Azure Data Factory (ADF) to read data from a Data Lake and store the filtered data in a single Cosmos DB (SQL API) container. In ADF I'm using a Memory Optimized integration runtime with 4 (+4 driver) cores, and Cosmos DB is set to autoscale at 20,000 RU/s. The ADF pipeline takes ~20 min to complete a triggered run, and I plan to run it once a day. Checking the monthly cost, I see that the combined cost of ADF and Cosmos DB is ~$150 (ADF: ~$70, Cosmos DB: ~$70). Is there an alternative sink I could use instead of Cosmos DB to reduce the cost? Note: the sink must be one I can query.

Here is the time spent in the sink transformation: (two screenshots, not reproduced here)

1
Can you open the ADF data flow monitoring view to see where most of that 30 mins is being spent? Click on the Sink transformation in the monitoring graph and look at the right-hand panel. It will show you the number of mins/secs spent writing to Cosmos DB. To lower that number, provision more RUs. But if most of the time is spent inside your transformation stages, then increase the number of cores from 8 to 16 or 32. Lastly, you can reduce your spend in ADF by using Compute Optimized instead of Memory Optimized, but that will increase the time spent inside the transformation stages. HTH! - Mark Kromer MSFT
Thank you! Time spent in the Sink transformation is 14 min (please see the image in the question above). - user989988
Your data flow is only spending about 2 mins in Spark on the transformation stages. The bulk of the time is Cosmos DB I/O. If you are OK with your pipeline taking a little longer to complete, you can switch to a Compute Optimized Azure IR, which should lower your ADF costs. - Mark Kromer MSFT

1 Answer

0
votes

Why use autoscale? ADF understands 429s (throttling responses), so it can apply back pressure and copy the data at a minimum of 400 RU/s.
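To illustrate the back-pressure idea, here is a rough Python simulation. It is only a sketch under stated assumptions, not how ADF or Cosmos DB is actually implemented: `FakeSink`, `ThrottledError`, and `write_all` are hypothetical names, the cost of 10 RU per document is made up, and time is simulated rather than slept. It shows a writer that backs off whenever the sink answers with a 429-style error, so a low provisioned throughput makes the copy slower rather than making it fail.

```python
from dataclasses import dataclass


class ThrottledError(Exception):
    """Stands in for an HTTP 429 response from the sink (hypothetical)."""

    def __init__(self, retry_after_s: float):
        super().__init__("429: request rate too large")
        self.retry_after_s = retry_after_s


@dataclass
class FakeSink:
    """Hypothetical sink with a fixed RU budget per simulated second.

    This is NOT a Cosmos DB client; it only mimics throttling.
    """
    ru_per_second: int
    now_s: float = 0.0           # simulated clock
    window_start_s: float = 0.0  # start of the current one-second budget window
    used_in_window: int = 0      # RUs consumed in the current window

    def write(self, cost_ru: int) -> None:
        # Start a fresh budget once a full simulated second has passed.
        if self.now_s - self.window_start_s >= 1.0:
            self.window_start_s = self.now_s
            self.used_in_window = 0
        if self.used_in_window + cost_ru > self.ru_per_second:
            # Budget exhausted: report how long until the window resets.
            raise ThrottledError(self.window_start_s + 1.0 - self.now_s)
        self.used_in_window += cost_ru


def write_all(sink: FakeSink, n_docs: int, cost_ru: int) -> tuple[float, int]:
    """Write n_docs, waiting (in simulated time) whenever the sink throttles."""
    if cost_ru > sink.ru_per_second:
        raise ValueError("a single write exceeds the per-second RU budget")
    throttled = 0
    for _ in range(n_docs):
        while True:
            try:
                sink.write(cost_ru)
                break
            except ThrottledError as err:
                throttled += 1
                sink.now_s += err.retry_after_s  # simulated sleep, then retry
    return sink.now_s, throttled


if __name__ == "__main__":
    for rus in (400, 4000):
        elapsed, hits = write_all(FakeSink(ru_per_second=rus), n_docs=1000, cost_ru=10)
        print(f"{rus} RU/s: finished after {elapsed:.0f} simulated s, throttled {hits} times")
```

In this toy model the lower throughput setting still completes the copy; it simply takes proportionally longer, which is the trade-off to weigh against the 14 min sink time mentioned in the comments.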