1
votes

I am unable to find a way to create an external table in Azure SQL Data Warehouse (Synapse SQL Pool) with Polybase where some fields contain embedded commas.

For a csv file with 4 columns as below:

myresourcename, 
myresourcelocation, 
"""resourceVersion"": ""windows"",""deployedBy"": ""john"",""project_name"": ""test_project""",
"{  ""ResourceType"": ""Network"",  ""programName"": ""v1""}" 

Tried with the following Create External Table statements.

CREATE EXTERNAL FILE FORMAT my_format
WITH (  
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS(
        FIELD_TERMINATOR=',',
        STRING_DELIMITER='"',
        First_Row = 2
    )    
); 
CREATE EXTERNAL TABLE my_external_table
(
resourceName VARCHAR,
resourceLocation    VARCHAR,
resourceTags    VARCHAR,
resourceDetails VARCHAR
)  
WITH (
    LOCATION = 'my/location/',
    DATA_SOURCE = my_source,  
    FILE_FORMAT = my_format
)

But querying this table gives the following error:

Failed to execute query. Error: HdfsBridge::recordReaderFillBuffer - Unexpected error encountered filling record reader buffer: HadoopExecutionException: Too many columns in the line.

Any help will be appreciated.

1
Do you have to DELIMITEDTEXT format? I prefer to use parquet.snappy as the compression is better and data quality issues is almost non existent. Otherwise I consider that data as garbage and set my reject threshold up from 0 to 50 or something inline with that - JPVoogt

1 Answers

1
votes

Currently this is not supported in polybase, need to modify the input data accordingly to get it working.