
I am new to Azure Data Lake Analytics. I am trying to load a csv in which string values are double quoted, and on some random rows there are quotes inside a column.

For example

ID, BookName
1, "Life of Pi"
2, "Story about "Mr X""

When I try loading, it fails on the second record and throws an error message.

1. Is there a way to fix this in the csv file? Unfortunately we cannot extract new files from the source, as these are log files.
2. Is it possible to let ADLA ignore the bad rows and proceed with the rest of the records?

Execution failed with error '1_SV1_Extract Error : '{"diagnosticCode":195887146,"severity":"Error","component":"RUNTIME","source":"User","errorId":"E_RUNTIME_USER_EXTRACT_ROW_ERROR","message":"Error occurred while extracting row after processing 9045 record(s) in the vertex' input split. Column index: 9, column name: 'instancename'.","description":"","resolution":"","helpLink":"","details":"","internalDiagnostics":"","innerError":{"diagnosticCode":195887144,"severity":"Error","component":"RUNTIME","source":"User","errorId":"E_RUNTIME_USER_EXTRACT_EXTRACT_INVALID_CHARACTER_AFTER_QUOTED_FIELD","message":"Invalid character following the ending quote character in a quoted field.","description":"Invalid character is detected following the ending quote character in a quoted field. A column delimiter, row delimiter or EOF is expected.\nThis error can occur if double-quotes within the field are not correctly escaped as two double-quotes.","resolution":"Column should be fully surrounded with double-quotes and double-quotes within the field escaped as two double-quotes."


1 Answer


As per the error message, if you are importing a quoted csv which has quotes within some of the columns, these need to be escaped as two double-quotes. In your particular example, your second row needs to be:

..."Life after death and ""good death"" models - a qualitative study",...

So one option is to fix up the original file when it is produced. If you are not able to do this, then you can import all the columns as one column, use RegEx to fix up the quotes and output the file again, e.g.

// Import each record as one string column, then use RegEx to clean the quotes
@input =
    EXTRACT oneCol string
    FROM "/input/input132.csv"
    USING Extractors.Text(delimiter: '|', quoting: false);


// Fix up the quotes using RegEx: double up any quote that is not
// immediately next to a comma, i.e. a quote inside a field rather
// than a field-delimiting quote
@output =
    SELECT Regex.Replace(oneCol, "([^,])\"([^,])", "$1\"\"$2") AS cleanCol
    FROM @input;

// Write the cleaned data back out; quoting:false emits the already-quoted strings verbatim
OUTPUT @output
TO "/output/output.csv"
USING Outputters.Csv(quoting : false);

The file will now import successfully.
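For completeness, re-extracting the cleaned file would look something like this. This is a minimal sketch: the ID/BookName columns and the header row are taken from the sample in the question and are assumptions about the real schema, so adjust as needed.

// Re-extract the cleaned file as a standard quoted CSV
@clean =
    EXTRACT ID int,
            BookName string
    FROM "/output/output.csv"
    USING Extractors.Csv(quoting : true, skipFirstNRows : 1);  // skip the header row, if present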