1
votes

I would like to know if there are any tips and tricks to find error in data lake analytics jobs. The error message seems most of the time to be not very detailed.

When trying to extract from CSV file I often get error like this

Vertex failure triggered quick job abort. Vertex failed: SV1_Extract[0] with >error: Vertex user code error.

Vertex failed with a fail-fast error

It seems that these error occur when trying to convert the columns to specified types.

The technique I found is to extract all columns to string and then do a SELECT that will try to convert the columns to the expected type. Doing that columns by columns can help find the specific column in error.

@data =
    EXTRACT ClientID string,
            SendID string,
            FromName string,           
    FROM "wasb://..."
    USING Extractors.Csv();

//convert some columns to INT, condition to skip header
@clean =
    SELECT Int32.Parse(ClientID) AS ClientID,
           Int32.Parse(SendID) AS SendID,
           FromName,           
    FROM @data
    WHERE !ClientID.StartsWith("ClientID");

Is it also possible to use something like a TryParse to return null or default values in case of a parsing error, instead of the whole job failing?

Thanks

2

2 Answers

3
votes

Here is a solution without having to use code behind (although Codebehind will make your code a bit more readable):

SELECT ((Func<string, Int32?>)(v => { Int32 res; return Int32.TryParse(v, out res)? (Int32?) res : (Int32?) null; }))(ClientID) AS ClientID

Also, the problem you see regarding error message being cryptic has to do with a bug that should be fixed soon in returning so called inner error messages. The work around today is to do the following:

  1. In the ADL Tools for VisualStudio, open the Job View of the failed job.
  2. In the lower left corner, click on “resources” link in the job detail area.
  3. Once the job resources are loaded, click on “Profile”.
  4. Search for the string “jobError” at the beginning of the line. Copy the entire line of text and paste in notepad (or other text editor) to read the actual error.

That should give you the exact error message.

1
votes

Yes, you can use TryParse using U-SQL user defined functions. You can do this like:

In code behind:

namespace TestNS
{
    public class TestClass
    {
        public static int TryConvertToInt(string s)
        {
            int i = 0;
            if (Int32.TryParse(s, out i))
                return i;
            return 0;
        }
    }
}

In U-SQL Script:

TestNS.TestClass.TryConvertToInt(ClientID) AS clientID

It looks like you have some other issues, as I always get appropriate error in case of conversion problem, something like:

"E_RUNTIME_USER_EXTRACT_COLUMN_CONVERSION_INVALID_ERROR","message":"Invalid character when attempting to convert column data."