0
votes

I am trying to ingest a few CSV files from Azure Data Lake into Azure Synapse using Polybase. There is a fixed set of columns in each CSV file and the column names are given on the first line. However, the columns can come in different ordering sequence.

In Polybase, I need to declare external table which I need to know the exact sequence of columns during design time and hence I cannot create the external table. Are there other ways to ingest the CSV file?

1

1 Answers

1
votes

I don't believe you can do this directly with Polybase because as you noted the CREATE EXTERNAL TABLE statement requires the column declarations. At runtime, the CSV data is then mapped to those column names.

You could accomplish this easily with Azure Data Factory and Data Flow (which uses Polybase under the covers to move the data to Synapse) by allowing the Data Flow to generate the table. This works because the table is generated after the data has been read rather than before as with EXTERNAL.

For the sink Data Set, create it with parameterized table name [and optionally schema]: enter image description here

In the Sink activity, specify "Recreate table": enter image description here

Pass the desired table name to the sink Data Set from the Pipeline: enter image description here

Be aware that all string-based columns will be defined as VARCHAR(MAX).