I am running a U-SQL Activity as part of a pipeline in Azure Data Factory for a defined time slice. The U-SQL Activity runs a sequence of U-SQL scripts that read in and process data stored in Azure Data Lake. While the data processes successfully in a local run, it throws a System.OutOfMemoryException when run in the Azure Data Factory cloud environment.
The input data is approximately 200 MB, which should not be a problem to process; larger data sets have been processed successfully before.
Memory management is assumed to scale as needed, so it is surprising to see an out-of-memory exception in an Azure cloud environment. Below are exception snapshots from two runs on the same input data; the only difference between them is the time at which they occurred.
Any assistance is highly appreciated. Thanks.
Further update: On further investigation, we found that skipping the header row (skipFirstNRows: 1 on the extractor) resolved the issue. Our U-SQL code-behind snippet has a loop conditioned on a date comparison, and it is possible the loop never terminates because the header-row column fails the DateTime cast, given that the snippet processes a DateTime-typed column as input. Ideally that should raise an invalid DateTime format exception, but we see an out-of-memory exception instead.
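For reference, a minimal sketch of the extractor change that resolved it (the file path, column names, and schema below are illustrative, not from our actual script):

@input =
    EXTRACT EventDate DateTime,
            Value string
    FROM "/input/sample.csv"
    USING Extractors.Csv(skipFirstNRows: 1); // skip the header row so the DateTime column never receives header text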
if (loopCount > 99) break;
– wBob
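Building on wBob's suggestion, here is a minimal C# code-behind sketch of a defensive version of such a loop (the method and variable names are hypothetical; the point is the DateTime.TryParse check plus an iteration cap, so a bad cast can never loop indefinitely):

using System;

public static class Helpers
{
    public static void ProcessUpTo(string endDateText)
    {
        // A header row arrives here as literal text such as "EndDate".
        // TryParse surfaces that as an explicit error instead of silently
        // feeding a bad value into the loop condition.
        if (!DateTime.TryParse(endDateText, out DateTime endDate))
        {
            throw new FormatException($"Not a valid DateTime: '{endDateText}'");
        }

        var current = new DateTime(2017, 1, 1); // hypothetical start date
        int loopCount = 0;
        while (current < endDate)
        {
            // ... per-row work ...
            current = current.AddDays(1);

            // wBob's guard: cap iterations so a runaway date comparison
            // cannot keep allocating until memory is exhausted.
            if (loopCount++ > 99) break;
        }
    }
}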