I would like to know if there are any tips and tricks for finding errors in Data Lake Analytics jobs. Most of the time the error messages are not very detailed.
When trying to extract from a CSV file, I often get errors like this:
Vertex failure triggered quick job abort. Vertex failed: SV1_Extract[0] with error: Vertex user code error.
Vertex failed with a fail-fast error
It seems that these errors occur when the extractor tries to convert the columns to the specified types.
The technique I found is to extract all columns as string and then do a SELECT that tries to convert the columns to the expected types. Doing that column by column can help find the specific column in error.
@data =
    EXTRACT ClientID string,
            SendID string,
            FromName string
    FROM "wasb://..."
    USING Extractors.Csv();
// Convert some columns to int; the WHERE condition skips the header row
@clean =
    SELECT Int32.Parse(ClientID) AS ClientID,
           Int32.Parse(SendID) AS SendID,
           FromName
    FROM @data
    WHERE !ClientID.StartsWith("ClientID");
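To narrow things down to the offending rows rather than just the column, something like the following might work on the same all-string rowset. This is only a sketch: the @bad rowset name and the output path are made up for illustration, and the digit-only pattern is an approximation of what Int32.Parse accepts (it ignores overflow and surrounding whitespace, for example).

```usql
// Sketch only: list rows whose ClientID or SendID does not look like an integer.
// @bad and the output path below are hypothetical names, not part of my real job.
@bad =
    SELECT ClientID,
           SendID,
           FromName
    FROM @data
    WHERE !ClientID.StartsWith("ClientID")   // skip the header row, as above
          AND (!System.Text.RegularExpressions.Regex.IsMatch(ClientID, "^-?[0-9]+$")
               OR !System.Text.RegularExpressions.Regex.IsMatch(SendID, "^-?[0-9]+$"));

// Write the suspicious rows out so they can be inspected by hand
OUTPUT @bad
TO "/output/bad_rows.csv"
USING Outputters.Csv();
```

Writing the suspicious rows to a file lets the job finish, so the bad values can be inspected directly instead of guessing from the vertex error.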
Is it also possible to use something like a TryParse to return null or default values in case of a parsing error, instead of the whole job failing?
Thanks