
Trying to load CSV files from the data lake (Gen2) into Azure Synapse by using Azure Data Factory. The source file has " (double quote) as an escape character. This falls outside the limitations of connecting PolyBase directly to Data Lake, so I set up a staged copy by following the documentation:

"enableStaging": true,
  "stagingSettings": {
                                "linkedServiceName": {
                                    "referenceName": "LS_StagedCopy",
                                    "type": "LinkedServiceReference"
                                },
                                "path": "myContainer/myPath",
                                "enableCompression": false
                            }
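
For context, here is a minimal sketch of where that fragment sits inside the copy activity definition. Apart from LS_StagedCopy and the staging path, the activity name and the source/sink types are assumptions for illustration:

{
    "name": "CopyToSynapse",
    "type": "Copy",
    "typeProperties": {
        "source": {
            "type": "DelimitedTextSource"
        },
        "sink": {
            "type": "SqlDWSink",
            "allowPolyBase": true
        },
        "enableStaging": true,
        "stagingSettings": {
            "linkedServiceName": {
                "referenceName": "LS_StagedCopy",
                "type": "LinkedServiceReference"
            },
            "path": "myContainer/myPath",
            "enableCompression": false
        }
    }
}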

After debugging the pipeline, I am still getting:

{Class=16,Number=107090,State=1,Message=HdfsBridge::recordReaderFillBuffer - Unexpected error encountered filling record reader buffer: HadoopExecutionException: Too many columns in the line.,},],

I do see ADF creating a temporary folder in the path I supplied for the staged copy, but it does not look like it is performing the transformation required to load the data. Am I missing anything?

Link to doc: Copy and transform data in Azure SQL Data Warehouse by using Azure Data Factory

1 Answer

Most likely the problem is your data. Check your delimiter; hopefully it's not "," or something equally common. It's a frequent problem: when one column contains text with many commas, ADF will interpret each comma as a new column unless quoting and escaping are configured correctly. Test with a smaller, clean CSV and go from there.
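
Since the source file uses a double quote as the escape character, it is also worth confirming that the source dataset declares the quote and escape characters explicitly; with a staged copy, ADF can only convert the data into a PolyBase-compatible format during the interim copy if the format settings describe the file correctly. A minimal sketch of a DelimitedText dataset, where the dataset name, linked service name, and location are assumptions:

{
    "name": "DS_SourceCsv",
    "properties": {
        "type": "DelimitedText",
        "linkedServiceName": {
            "referenceName": "LS_DataLake",
            "type": "LinkedServiceReference"
        },
        "typeProperties": {
            "location": {
                "type": "AzureBlobFSLocation",
                "fileSystem": "myContainer",
                "folderPath": "myPath"
            },
            "columnDelimiter": ",",
            "quoteChar": "\"",
            "escapeChar": "\"",
            "firstRowAsHeader": true
        }
    }
}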