In ADLS Gen2, the TextFiles folder contains 3 CSV files. The column names differ from file to file.

We need to convert all 3 CSV files into 3 Parquet files and put them in the ParquetFiles folder.

I tried a Copy activity, but it fails because the column names contain spaces, which Parquet files don't allow.

To remove the spaces, I used a Data Flow: Source -> Select (replace spaces with underscores in the column names) -> Sink. This worked for a single file. When I ran it against all 3 files, it merged them and generated a single file with incorrect data.
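The renaming rule in the Select step can be sketched outside the data flow as follows. This is only an illustration in Python, not Data Flow expression syntax; the function names and the sample header are hypothetical and not taken from the actual files.

```python
def sanitize_column_name(name: str) -> str:
    """Trim surrounding whitespace and replace each space with an underscore."""
    return name.strip().replace(" ", "_")


def build_rename_map(columns):
    """Map every original column name to its sanitized form."""
    return {col: sanitize_column_name(col) for col in columns}


# Hypothetical header; the real files each have their own column names.
header = ["Order ID", "Customer Name", "Total"]
rename_map = build_rename_map(header)
```

Because the rule is applied to whatever names a file happens to have, it does not depend on the 3 files sharing a schema.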

How can I solve this, mainly removing the spaces from the column names in all files? What other options are there?

2 Answers

Pipeline: ForEach activity (loop over the CSV files in the folder and pass the current iteration item to the data flow as a parameter) -> Data Flow activity whose source points to that folder (parameterize the file name in the source path).
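The same one-file-per-iteration idea, written as a rough Python sketch rather than as a pipeline definition. It assumes the files are reachable as a local or mounted folder and that pandas with a Parquet engine such as pyarrow is installed; `target_for`, `clean_columns`, and `convert_folder` are hypothetical names used only for illustration.

```python
from pathlib import Path


def target_for(csv_path: Path, out_dir: Path) -> Path:
    """Give each CSV its own Parquet target so outputs are never merged."""
    return out_dir / (csv_path.stem + ".parquet")


def clean_columns(columns):
    """Replace spaces with underscores in every column name."""
    return [str(col).strip().replace(" ", "_") for col in columns]


def convert_folder(src_dir: str, out_dir: str):
    """Convert every CSV in src_dir to a separate Parquet file in out_dir."""
    import pandas as pd  # imported here so the helpers above need no extras

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    # One iteration per file, like one ForEach item per data flow run.
    for csv_path in sorted(Path(src_dir).glob("*.csv")):
        df = pd.read_csv(csv_path)
        df.columns = clean_columns(df.columns)
        target = target_for(csv_path, out)
        df.to_parquet(target, index=False)
        written.append(target)
    return written
```

Calling `convert_folder("TextFiles", "ParquetFiles")` would then produce one Parquet file per CSV, each keeping its own sanitized schema.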

I created 2 datasets: one in CSV format with a wildcard, the other in Parquet format. I used a Copy activity with the CSV dataset as the source and the Parquet dataset as the sink, and set the copy behavior to Merge files.