
I have an Azure Data Lake Analytics job that uses U-SQL user-defined operators to process around 3.8 million records stored in Azure Data Lake Store.

On the first run I set parallelism to 10, and on the second run I set it to 1. Surprisingly, both runs took the same time (around 1.5 hours), so it looks like my job is not running in parallel at all. Is that because I used user-defined operators? How can I determine when parallelism will be triggered and when it will not?

1 Answer

Did you use user-defined functions or a custom UDO?

User-defined functions should not impede parallelism. A custom UDO may, depending on its internals.
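To illustrate the difference in language-neutral terms, here is a toy Python sketch. It is not U-SQL and not ADLA's actual execution model, and the function names are hypothetical. A function that looks only at the current row gives the same result whether the rows are processed in one piece or split across several workers. An operator that carries state from row to row does not, so the engine can only run it safely if all the rows it depends on stay together.

```python
from itertools import chain


def split(rows, n):
    """Split rows into at most n contiguous partitions (think: separate vertices)."""
    size = -(-len(rows) // n)  # ceiling division
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def per_row_function(row):
    # Hypothetical UDF-like function: depends only on the current row,
    # so it can be evaluated on any partition independently.
    return row * 2


def running_total_operator(rows):
    # Hypothetical stateful operator: each output depends on every
    # earlier row it has seen, so partitioning changes the result.
    total, out = 0, []
    for r in rows:
        total += r
        out.append(total)
    return out


rows = list(range(1, 11))

sequential_fn = [per_row_function(r) for r in rows]
parallel_fn = list(chain.from_iterable(
    [per_row_function(r) for r in part] for part in split(rows, 4)))

sequential_op = running_total_operator(rows)
parallel_op = list(chain.from_iterable(
    running_total_operator(part) for part in split(rows, 4)))

print(sequential_fn == parallel_fn)  # True
print(sequential_op == parallel_op)  # False
```

This is why the internals of a custom UDO matter: if it needs to see all of its input (or all rows for a key) at once, the compiler may have to run that step on fewer vertices, regardless of how much parallelism you allow.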

What do the job graph vertices say?

You can analyze the parallelization by looking at the job graph. If you download the job profile, you can also inspect the vertex graph and use the Diagnostic tab to drill down further. Does the job playback actually show vertices executing in parallel?

In general, the system should automatically parallelize your job based on the parallelism limit you specified, the size of the data, the complexity of the query operations, and the statistics gathered and estimated by the query processor.
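A rough way to reason about why the parallelism limit may make no difference is a toy scheduling model. This is a deliberately simplified sketch, not how the ADLA scheduler actually works, and the stage and vertex durations are made-up numbers: stages run one after another, the vertices within a stage can overlap, and the parallelism setting caps how many vertices run at once.

```python
import heapq


def stage_time(vertex_durations, parallelism):
    """Greedy list scheduling: each vertex starts on whichever slot frees up first."""
    if not vertex_durations:
        return 0.0
    # One finish-time entry per usable slot; a stage can never use
    # more slots than it has vertices.
    finish = [0.0] * min(parallelism, len(vertex_durations))
    heapq.heapify(finish)
    for d in vertex_durations:
        start = heapq.heappop(finish)
        heapq.heappush(finish, start + d)
    return max(finish)


def job_time(stages, parallelism):
    # Toy assumption: stages run sequentially.
    return sum(stage_time(s, parallelism) for s in stages)


# Hypothetical jobs with the same total work (in minutes):
splittable = [[9.0] * 10]  # one stage split into 10 vertices
single_vertex = [[90.0]]   # the same work in a single vertex

for p in (1, 10):
    print(p, job_time(splittable, p), job_time(single_vertex, p))
# 1 90.0 90.0
# 10 9.0 90.0
```

Under this model, raising the parallelism only helps when a stage actually has more than one vertex. If the job graph shows the expensive UDO step as a single vertex, parallelism 1 and 10 finish in about the same time, which matches what you describe.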