2
votes

I am running a U-SQL job in Azure Data Lake Analytics from Visual Studio and getting the script error below. The purpose of my job is to read an XML file from Azure Blob storage using the Azure Blob REST API, extract data, and then produce a CSV file in Azure Data Lake Store. I don't see any help in the error. Can anyone help me understand this issue?

DIAGNOSTICCODE: 223412289

SEVERITY: Error

COMPONENT: JobManager_User

SOURCE: User

ERRORID: VertexRetriedTooMany

MESSAGE: Vertex retried too many times

DESCRIPTION: Vertex SV1_Extract[0][0] retried 24 times.

RESOLUTION: N/A

HELPLINK: N/A

DETAILS: Vertex SV1_Extract[0][0].v23 {B0AF5C27-21A5-4011-8044-09A4AB0642C4} failed Error: Incorrect function.

UPDATE - More information about my use case:

I am trying to use "custom user-defined operators" in my U-SQL job because I think my use case can easily be solved with this feature.

My input CSV file is placed in Data Lake Store; it contains some values and the paths of XML files placed in Azure Blob storage.

In the U-SQL job, I read the XML file paths from the CSV (using U-SQL), then read those XML files from Azure Blob storage and extract values (using code-behind C#), and finally merge my input file with the XML values and produce a new CSV file in Azure Data Lake Store (again using U-SQL).

Update 2

I also tried to use the Azure Storage SDK instead of the REST API to access the blob in the code-behind, and got the following error when running the job:

  "errorId": "E_RUNTIME_USER_UNHANDLED_EXCEPTION_FROM_USER_CODE",
  "message": "An unhandled exception from user code has been reported",
  "description": "Unhandled exception from user code: \"The remote name could not be resolved: 'xxxxx.blob.core.windows.net'\"\nThe details includes more information including any inner exceptions and the stack trace where the exception was raised.",
  "resolution": "Make sure the bug in the user code is fixed.",
  "helpLink": "",
  "details": "==== Caught exception Microsoft.WindowsAzure.Storage.StorageException\n\n   at Microsoft.WindowsAzure.Storage.Core.Executor.Executor.ExecuteSync[T](RESTCommand`1 cmd, IRetryPolicy policy, OperationContext operationContext)\r\n\n   at Microsoft.WindowsAzure.Storage.Blob.CloudBlockBlob.DownloadRangeToStream(Stream target, Nullable`1 offset, Nullable`1 length, AccessCondition accessCondition, BlobRequestOptions options, OperationContext operationContext)\r\n\n   at USQLAppForLogs.LogTable.GetValuesFromBlob(String bloburi)\r\n\n   at USQLAppForLogs.LogTable.Process(IRow input, IUpdatableRow output)\r\n\n   at ScopeEngine.SqlIpProcessor<Extract_0_Data0,SV1_Extract_out0>.GetNextRow(SqlIpProcessor<Extract_0_Data0\\,SV1_Extract_out0>* , SV1_Extract_out0* output) in d:\\data\\ccs\\jobs\\f030ffdf-fc4a-4780-aec5-9067dde49e85_v0\\sqlmanaged.h:line 1821\r\n\n   at RunAndHandleClrExceptions(function<void __cdecl(void)>* code)\n\n==== Inner exception System.Net.WebException\n\nThe remote name could not be resolved: 'xxxxx.blob.core.windows.net'\n\n   at System.Net.HttpWebRequest.GetResponse()\r\n\n   at Microsoft.WindowsAzure.Storage.Core.Executor.Executor.ExecuteSync[T](RESTCommand`1 cmd, IRetryPolicy policy, OperationContext operationContext)"

Note that the same code works fine locally, so I don't think there is an issue in the code that accesses the blob.


2 Answers

2
votes

This error message normally occurs when a vertex fails due to some system or user error. In this case the error message itself ("Incorrect function") is not very helpful.

How are you reading the XML file? You mention that you are using the Azure Blob REST API. That is probably the cause.

If you want to read a file from Windows Azure Blob storage, you can register the store with your ADLA account (e.g., through the Azure Portal, where you can add more stores to the ADLA account). Then you can use the wasb URI scheme. An example is here: https://github.com/MicrosoftBigData/usql/blob/master/Examples/AmbulanceDemos/AmbulanceDemos/1-Ambulance-Unstructured%20Data/1.2-CopyDriversFromWASBToADL.usql
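As a minimal sketch (the account name, container name, paths, and column schema below are placeholders, not values from the question), reading a blob-store file after registering the account looks like:

```
// Minimal sketch: read a CSV from a registered blob store account via the
// wasb URI scheme. "mycontainer", "myaccount", the paths, and the columns
// are placeholders -- substitute your own.
@input =
    EXTRACT id string,
            xmlPath string
    FROM "wasb://mycontainer@myaccount.blob.core.windows.net/input/paths.csv"
    USING Extractors.Csv();

OUTPUT @input
TO "/output/paths_copy.csv"
USING Outputters.Csv();
```

Because the store is registered with the ADLA account, the job's vertices read the blob through the service rather than making their own network calls from user code.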

You can then use, for example, the XML extractors in our XML/JSON sample libraries, available here: https://github.com/MicrosoftBigData/usql/tree/master/Examples/DataFormats
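For instance, assuming the sample formats assembly has been built and registered in your U-SQL database, the XML extractor from that library can be used along these lines (the row path, element names, and file path are illustrative assumptions, not taken from the question):

```
REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];

// Sketch only: "/root/item" and the element-to-column map are assumptions
// about the XML shape; the wasb path is a placeholder.
@xml =
    EXTRACT name string,
            value string
    FROM "wasb://mycontainer@myaccount.blob.core.windows.net/data/sample.xml"
    USING new Microsoft.Analytics.Samples.Formats.Xml.XmlExtractor(
        "/root/item",
        new SQL.MAP<string, string> { {"name", "name"}, {"value", "value"} });
```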

Feel free to send me feedback on the sample once you used it.

If this does not address your issue, please let me know.

2
votes

Answering the updated information.

The reason your code works locally is that local execution currently does not impose the external-call restrictions that the service's YARN layer imposes.

The YARN layer does not allow your code to reach out via HTTP or REST calls; the containers are not allowed to access external resources for security reasons.

So my suggestion is to do one of two things (both require that you register the blob store account as an additional data source):

  1. Write a script-generating script (using U-SQL, PowerShell, Python, or your favorite script-generation language) that will use EXTRACT on the wasb: URI for your blob store data.

  2. If the files have the same schema and are organized according to some path pattern, you can also use file set patterns to refer to a set of files whose exact names you don't know a priori.
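A sketch of the second option, again with a placeholder account, container, path pattern, and schema: a file set pattern over the registered blob store, where {*} and the {date} parts are resolved by the compiler:

```
// Sketch: extract from every file matching the pattern. The {date} parts
// surface as a virtual column usable in later expressions; everything in
// this path and schema is a placeholder.
@logs =
    EXTRACT entry string,
            date DateTime
    FROM "wasb://mycontainer@myaccount.blob.core.windows.net/logs/{date:yyyy}/{date:MM}/{date:dd}/{*}.csv"
    USING Extractors.Csv();
```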

Note that U-SQL currently expects to be able to resolve all file names at compile time, so the set of files must be determined before the job runs.

I will file some bugs on the unhelpful error messages, though. And if you would like to request a feature that gives you more flexibility in reading files, I encourage you to head to http://aka.ms/adlfeedback and file a request with a use-case scenario. That way others can give your suggestion their vote, and it helps us prioritize the feature in our planning.