I am fairly new to U-SQL and am trying to run a U-SQL script in Azure Data Lake Analytics that processes a Parquet file using the Parquet extractor. I am getting the error below and can't find a way around it. Error - Change the identifier to use at least one lower case letter. If that is not possible, then escape that identifier (for example: '[ACTIVITY]'), or embed it in a CSHARP() block (e.g CSHARP(ACTIVITY)).

Unfortunately, all the field names in the Parquet file are in all caps, and I don't want to escape these identifiers. I also tried wrapping the identifier in a CSHARP block, and that fails as well (E_CSC_USER_RESERVEDKEYWORDASIDENTIFIER: Reserved keyword CSHARP is used as an identifier.) Is there any way I can extract the Parquet file? Thanks for your help! Code snippet:

SET @@FeaturePreviews = "EnableParquetUdos:on";

@var1 = EXTRACT ACTIVITY string, AUTHOR_NAME string, AFFLIATION string

FROM "adl://xxx.azuredatalakestore.net/Abstracts/FY2018_028"
USING Extractors.Parquet();

@var2 = SELECT * FROM @var1 ORDER BY ACTIVITY ASC FETCH 5 ROWS;
OUTPUT @var2
TO "adl://xxx.azuredatalakestore.net/Results/AbstractsResults.csv" USING Outputters.Csv();

Can you post a clear, minimal, complete and verifiable example please? I'm thinking of a small U-SQL script and a sample Parquet file that succinctly explain the problem. That will be really useful and I'm sure someone will help you. - wBob
Yup, I should have done that. I added the snippet to the question. - Satya Azure

1 Answer

Based on your description, you are trying to write something like:

EXTRACT ALLCAPSNAME int FROM "/data.parquet" USING Extractors.Parquet();

In U-SQL, we reserve all-caps identifiers so that we can add new keywords in the future without invalidating old scripts.

To work around this, you just have to quote (escape) the name, as in any other SQL dialect:

EXTRACT [ALLCAPSNAME] int FROM "/data.parquet" USING Extractors.Parquet();

Note that this does not change the name of the field. It is just the syntactic way to refer to the field.
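
Applied to the script in the question, the workaround might look like the sketch below. It assumes the column names and paths are exactly as posted in the question (including the spelling AFFLIATION, which has to match the name in the file) and changes nothing except the quoting of the identifiers:

```
SET @@FeaturePreviews = "EnableParquetUdos:on";

// Every all-caps column name is quoted with square brackets.
@var1 =
    EXTRACT [ACTIVITY] string,
            [AUTHOR_NAME] string,
            [AFFLIATION] string
    FROM "adl://xxx.azuredatalakestore.net/Abstracts/FY2018_028"
    USING Extractors.Parquet();

// The quoting is also needed wherever a column is referenced by name.
@var2 =
    SELECT *
    FROM @var1
    ORDER BY [ACTIVITY] ASC
    FETCH 5 ROWS;

OUTPUT @var2
TO "adl://xxx.azuredatalakestore.net/Results/AbstractsResults.csv"
USING Outputters.Csv();
```

SELECT * itself needs no quoting because it does not name any column explicitly.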

Also note that in most SQL communities, it is considered a best practice to always quote identifiers to avoid clashes with reserved keywords.

If all fields in the Parquet file are all caps, you will have to quote them all. In a future update you will be able to say EXTRACT * FROM … for Parquet (and Orc) files, but you will still need to quote the columns when you refer to them explicitly.