
Working within Azure Data Factory, using the built-in Oracle Connector...

Given a very simple query, such as:

SELECT Col001, Col002, Col003 FROM APPS.WHATEVER_TABLE;

This type of query, with around 30 columns, can stream 1,000,000 rows to Toad on a tiny VM in less than 60 seconds. From the exact same Oracle server, from within Azure Data Factory's Self-Hosted Integration Runtime, this query takes over 8 minutes, with frequent pauses/hangs.

The CPU in the IR box runs at around 30% during this time. The free memory on the IR box stays at or above 5GB during this time. This performs the same, regardless of the DTU level of the Azure SQL Database Sink. Today I tried this between 800 DTU and 3,000 DTU and got the exact same performance, with Log I/O on the Azure SQL Database staying at or under 10%.

The documentation for the ADF Oracle Connector is no help here, as it gives no guidance on tweaking connection string parameters, or whether you can even do so.

Thoughts?

I'm having similar performance issues. Did you resolve this? - Andy Bradbrook
Hi! Yes, I should have updated this question, but somehow forgot. Check out my answer to this one: stackoverflow.com/questions/53000355/… and I will update this one as well. - Pittsburgh DBA

1 Answer


Resolution:

We began to suspect that something was amiss with data types, because the problem disappeared if we cast all of our high-precision Oracle NUMBER columns to lower precision, or to something like integer.

It got so bad that we opened a case with Microsoft about it, and our worst fears were confirmed.

The Azure Data Factory runtime decimal type has a maximum precision of 28. If a decimal/numeric value from the source has a higher precision, ADF will first cast it to a string. The performance of the string casting code is abysmal.

Check to see if your source has any high-precision numeric data, or, if you have not explicitly defined a schema, check whether columns are being implicitly mapped to string.
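As a sketch of the workaround (the column name here is a placeholder, not from the original table), casting any unbounded or high-precision NUMBER column down to a precision of 28 or less in the source query should keep ADF on its native decimal path instead of the slow string fallback:

```sql
-- Hypothetical example: HIGH_PREC_COL is assumed to be declared as plain
-- NUMBER (unbounded precision, which ADF treats as exceeding 28).
-- Casting to NUMBER(28, 10) -- or any precision <= 28 and scale that
-- fits your data -- avoids ADF's decimal-to-string conversion.
SELECT Col001,
       Col002,
       CAST(HIGH_PREC_COL AS NUMBER(28, 10)) AS HIGH_PREC_COL
FROM APPS.WHATEVER_TABLE;
```

Note that any digits beyond the chosen scale are rounded away by the cast, so pick a precision/scale that your data can actually tolerate.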