
Is "Derived Column" step in Data Flow of Azure Data Factory able to dynamically detect empty value and replace with NULL value while looping through several files with different numbers of columns?

What I am trying to do is copy 10 CSV files straight into 10 tables, without any manipulation other than replacing empty values with NULL. Each of these 10 files contains 5-25 columns. I know I can build 10 data flows to achieve this, but I was wondering if there is a smarter way: LOOKUP the file list and, FOREACH file, dynamically replace the empty values?

Update:

Thanks to Joseph's answer, it seems the data flow automatically treats empty values as NULL. But I then ran into another parsing problem: the date / time / datetime columns arrive as NULL in the sink.

Source: the positions of the date / time / datetime columns are not fixed in each file, nor do they share the same column names. Since the source is in CSV format, all columns are treated as string type, so I cannot use "type" for pattern matching: https://i.stack.imgur.com/X1C7z.png

Sink: in the example above, all the highlighted columns arrive as NULL in the sink: https://i.stack.imgur.com/PecRb.png

Do your csv files have headers? In my answer, they must have. – Joseph Xu
Yes, they have headers. – Ben S

1 Answer


Yes, we can. In my test, I copied 2 csv files into 2 tables with a single data flow.

  1. I created an Array variable named Objects. Its format is like this: [{"source":"nullValue1.csv","sink":"TestNull"},{"source":"nullValue2.csv","sink":"TestNull2"}]. In your case, you can add 10 objects to the array, one per file (a sketch follows).

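As a rough sketch, the default value of the Objects variable for 10 files could look like the following (these file and table names are only placeholders for your own):

    [
      { "source": "file01.csv", "sink": "Table01" },
      { "source": "file02.csv", "sink": "Table02" },
      { "source": "file03.csv", "sink": "Table03" }
    ]

and so on, up to the 10th file/table pair.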

  2. Then we can use a ForEach activity to loop over the Objects array (the relevant setting is sketched below):

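In the ForEach activity's settings, the Items field simply points at the variable. A hedged sketch of the relevant pipeline JSON fragment (the activity name is illustrative):

    "name": "ForEachFile",
    "type": "ForEach",
    "typeProperties": {
        "items": {
            "value": "@variables('Objects')",
            "type": "Expression"
        }
    }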

  3. Inside the ForEach activity, we can call a Data Flow activity.


  4. In the source settings we can enter the dynamic content @item().source to specify the filename dynamically (one way to wire this up is sketched after these steps).


  5. Here comes the key point: in the Derived Column transformation, we can add a column pattern and use iifNull($$, toString(null())) to detect the empty value in each column and replace it with NULL (a script sketch follows).

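For reference, this is roughly how that column pattern reads in the underlying data flow script; match(true()) applies the expression to every incoming column, and the stream names are illustrative:

    source1 derive(each(match(true()),
        $$ = iifNull($$, toString(null())))) ~> ReplaceEmptyWithNull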

  6. In the sink, we can enter the dynamic content @item().sink to specify the table name in SQL.

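Because @item() is only resolved at the pipeline level, the filename and table name have to be passed into the Data Flow activity, for example through dataset parameters. A hedged sketch of a parameterized CSV source dataset (the linked service, container and parameter name are assumptions for illustration):

    {
      "name": "SourceCsv",
      "properties": {
        "type": "DelimitedText",
        "linkedServiceName": { "referenceName": "AzureBlobStorage1", "type": "LinkedServiceReference" },
        "parameters": { "fileName": { "type": "string" } },
        "typeProperties": {
          "location": {
            "type": "AzureBlobStorageLocation",
            "fileName": { "value": "@dataset().fileName", "type": "Expression" },
            "container": "input"
          },
          "columnDelimiter": ",",
          "firstRowAsHeader": true
        }
      }
    }

The Data Flow activity then supplies @item().source (and, on the sink dataset, @item().sink) as the parameter values.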



Update

My original date format is like this: 11/04/2020 1:17:40.

According to this document, we can use the name option in the column pattern to pick out the columns that need the date format conversion.


Here we can use the expression toString(toTimestamp(toString($$),'MM/dd/yyyy HH:mm:ss'),'yyyy-MM-dd HH:mm:ss') to convert the date format.

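Putting the name pattern and the expression together, the date-conversion pattern could read roughly like this in data flow script (Column4 is the column from my test; the stream names are illustrative):

    source1 derive(each(match(name == 'Column4'),
        $$ = toString(toTimestamp(toString($$), 'MM/dd/yyyy HH:mm:ss'), 'yyyy-MM-dd HH:mm:ss'))) ~> ConvertDates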

In the result we can see that Column4 is converted correctly.

Part 2

We should also replace the / in the dates with -. The Derived Column can do this with several column patterns.

The first pattern matches every column except the date/time ones: 1==1 && name!='ClockingDate' && name!='CreationDate' && name!='ClockingTime'. The remaining patterns match the date/time columns by name and apply the format conversion (a sketch follows).
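A sketch of how these patterns could sit together in one Derived Column (the conversion expression for the date columns follows the format above and is an assumption; ClockingTime would need its own time format):

    source1 derive(
        each(match(1==1 && name != 'ClockingDate' && name != 'CreationDate' && name != 'ClockingTime'),
            $$ = iifNull($$, toString(null()))),
        each(match(name == 'ClockingDate' || name == 'CreationDate'),
            $$ = toString(toTimestamp(toString($$), 'MM/dd/yyyy HH:mm:ss'), 'yyyy-MM-dd HH:mm:ss'))
    ) ~> FixDatesAndNulls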

The data preview then shows the converted values.