0
votes

I have a csv file in my ADLS:

a,A,b
1,1,1
2,2,2
3,3,3

When I load this data into a delimited text Dataset in ADF with first row as header the data preview seems correct (see picture below). The schema has the names a, A and b for columns.

enter image description here

However, now I want to use this dataset in Mapping Data Flow and here does the Data Preview mode break. The second column name (A) is seen as duplicate and no preview can be loaded. enter image description here enter image description here

All other functionality in Data Flow keeps on working fine, it is only the Data Preview tab that gives an error. All consequent transformation nodes also gives this error in the Data Preview.

Moreover, if the data contains two "exact" same column names (e.g. a, a, b), then the Dataset recognizes the columns as duplicates and puts a "1" and "2" after each name. It is only when they are case-sensitive unequal and case-insensitive equal that the Dataset doesn't get an error and Data Flow does.

Is this a known error? Is it possible to change a specific column name in the dataset before loading into Data Flow? Or is there just something I'am missing?

1

1 Answers

1
votes

I testes it and get the error in source Data Preview: enter image description here

I ask Azure support for help and they are testing now. Please wait my update.

Update:

I sent Azure Support the test.csv file. They tested and replied me. If you insist to use " first row as header", Data Factory can not solve the error. The solution is that re-edit the csv file. Even in Azure SQL database, it doesn't support we create a table with same column name. Column names are case-insensitive.

For example, this code is not supported: enter image description here

Here's the full email message:

Hi Leon,

Good morning! Thanks for your information.

I have tested the sample file you share with me and reproduce the issue. The data preview is alright by default when I connect to your sample file. enter image description here But I noticed when we do the trouble shooting session – a, A, b are column name, so you have checked the first row as header your source connection. Please confirm it’s alright and you want to use a, A, b as column headers. If so, it should be a error because there’s no text- transform for “A” in schema. enter image description here Hope you can understand the column name doesn’t influence the data transform and it’s okay to change it to make sure no errors block the data flow. There’re two tips for you to remove block, one is change the column name from your source csv directly or you can click the import schema button ( in the blow screen shot) in schema tab, and you can choose a sample file to redefine the schema which also allows you to change column name. enter image description here

Hope this helps.