
I am running a U-SQL Activity as part of a Pipeline in Azure Data Factory for a defined time slice. The U-SQL Activity runs a sequence of U-SQL scripts that read in and process data stored in Azure Data Lake. While the data processes successfully in my local run, it throws a System.OutOfMemoryException when running in the Azure Data Factory cloud environment.

The input data is approximately 200 MB, which should not be a problem to process, as larger data sets have been processed successfully before.

Since memory management is assumed to scale as needed, it is surprising to see an Out of Memory exception in an Azure cloud environment. Below are exception snapshots from two runs on the same input data; the only difference is the time at which they occurred.

Exception Snapshot - 1

Exception Snapshot - 2

Any assistance is highly appreciated, thanks.

Further update: On further investigation, we found that skipping the header row with the extractor parameter skipFirstNRows:1 resolved the issue. Our U-SQL code-behind snippet has a loop conditioned on a date comparison; it is possible the loop wasn't terminating because of an invalid DateTime cast of a header-row column, since the snippet processes a DateTime-typed column as input. Ideally that should raise an invalid DateTime format exception, but we see an Out of Memory exception instead.
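For reference, a minimal sketch of what the corrected extractor call looks like; the path, column names, and schema here are placeholders rather than our actual job, and skipFirstNRows is the documented parameter name on the built-in U-SQL extractors:

    // Hypothetical input schema; skipFirstNRows is the relevant part.
    @input =
        EXTRACT EventDate DateTime,
                Value     string
        FROM "/data/input/events.csv"
        USING Extractors.Csv(skipFirstNRows: 1); // Skip the header so the literal text "EventDate" is never cast to DateTime.

With the header skipped, the DateTime column only ever receives real date values, so the loop's comparison behaves as expected.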

Comment: Ouch, so your loop does not have infinite loop protection? E.g. pseudo-code: if loopCount > 99 break – wBob
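As the comment suggests, a guard in the code-behind would have turned the runaway loop into a fast, explicit failure. A minimal C# sketch of that pattern, assuming a method shaped roughly like ours that accumulates results per row; the names, iteration cap, and loop body are hypothetical stand-ins:

    using System;
    using System.Collections.Generic;

    public static class DateLoopHelper
    {
        public static string ExpandDates(string rawStart, DateTime cutoff)
        {
            // Guard 1: validate the cast instead of assuming the column parses.
            DateTime current;
            if (!DateTime.TryParse(rawStart, out current))
                throw new FormatException("Invalid DateTime value: '" + rawStart + "'");

            var results = new List<string>();
            int iterations = 0;
            while (current < cutoff)
            {
                results.Add(current.ToString("yyyy-MM-dd"));
                current = current.AddDays(1);

                // Guard 2: hard cap so a bad comparison can never accumulate
                // results until the vertex runs out of memory.
                if (++iterations > 100000)
                    throw new InvalidOperationException("Loop guard tripped; check input dates.");
            }
            return string.Join(";", results);
        }
    }

An unterminated loop that keeps appending to a list exhausts memory long before any cast error surfaces, which would explain seeing System.OutOfMemoryException rather than a format exception.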

1 Answer


It looks like something in the user code is causing the exception. You can try the failed-vertex debug feature in Visual Studio: open the failed job in VS, and an error bar in the job overview should let you kick off that process. It will download the failed portion to your desktop and let you step through it.