0
votes

I need to copy a few files from one ADLS Gen1 location to another ADLS Gen1 location, creating the destination folders based on the file names.

I have a few files like the following in the source ADLS:

ABCD_20200914_AB01_Part01.csv.gz
ABCD_20200914_AB02_Part01.csv.gz
ABCD_20200914_AB03_Part01.csv.gz
ABCD_20200914_AB03_Part01.json.gz
ABCD_20200914_AB04_Part01.json.gz
ABCD_20200914_AB04_Part01.csv.gz

Scenario 1: I have to copy only the csv files into the destination ADLS as shown below, creating each folder from the file name (if the folder already exists, copy into that folder):

AB01-
    |-ABCD_20200914_AB01_Part01.csv.gz
AB02-
    |-ABCD_20200914_AB02_Part01.csv.gz
AB03-
    |-ABCD_20200914_AB03_Part01.csv.gz
AB04-
    |-ABCD_20200914_AB04_Part01.csv.gz

Scenario 2: I have to copy only the csv and json files into the destination ADLS as shown below, creating each folder from the file name (if the folder already exists, copy into that folder):

AB01-
    |-ABCD_20200914_AB01_Part01.csv.gz
AB02-
    |-ABCD_20200914_AB02_Part01.csv.gz
AB03-
    |-ABCD_20200914_AB03_Part01.csv.gz
    |-ABCD_20200914_AB03_Part01.json.gz
AB04-
    |-ABCD_20200914_AB04_Part01.csv.gz
    |-ABCD_20200914_AB04_Part01.json.gz

Is there any way to achieve this in Data Factory? Appreciate any leads!

1 Answer

2
votes

I am not sure this will entirely help, but I had a similar situation: we had one zip file, and I had to copy the files inside it out into their own folders.

What you can do is use parameters on the sink dataset, plus a Set Variable activity that takes a substring of the file name.

The pipeline below was built for a delta load, but I think it has enough in it to help. It can be divided into 3 sections.

[screenshot]

The first orange section gets the date of the latest file name from the ADLS Gen1 folder that you want to copy.

That value is then passed to the orange block. In the bottom branch, I get the latest file name based on the ADLS Gen1 date and then take a substring to extract the date portion of the file name. In your case, you might be able to build an array and capture all of the folder names that you need.
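For the file names in the question, the folder name is the third underscore-separated token, so the substring step can be reduced to a split. Here is a minimal Python sketch of that logic only, not Data Factory code. `folder_for` and `plan_copies` are hypothetical names, and it assumes every file follows the `ABCD_<date>_<folder>_PartNN.<ext>.gz` pattern shown in the question.

```python
from collections import defaultdict


def folder_for(file_name: str) -> str:
    """Return the destination folder: the third '_'-separated token (e.g. 'AB01')."""
    return file_name.split("_")[2]


def plan_copies(file_names, extensions):
    """Group the files whose name ends with one of `extensions` by destination folder."""
    plan = defaultdict(list)
    for name in file_names:
        # str.endswith accepts a tuple of suffixes
        if name.endswith(tuple(extensions)):
            plan[folder_for(name)].append(name)
    return dict(plan)


# File names copied from the question
files = [
    "ABCD_20200914_AB01_Part01.csv.gz",
    "ABCD_20200914_AB02_Part01.csv.gz",
    "ABCD_20200914_AB03_Part01.csv.gz",
    "ABCD_20200914_AB03_Part01.json.gz",
    "ABCD_20200914_AB04_Part01.json.gz",
    "ABCD_20200914_AB04_Part01.csv.gz",
]

scenario_1 = plan_copies(files, [".csv.gz"])               # csv only
scenario_2 = plan_copies(files, [".csv.gz", ".json.gz"])   # csv and json
```

In a pipeline, the same idea would probably be expressed with the `split` and `endsWith` expression functions applied to `item().name` inside the ForEach, but I have not tested that against these exact file names.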

Getting the file name: [screenshot]

Getting the substring: [screenshot]

In the top section, I first extract and unzip that file into a test landing zone.

Source: [screenshot]

Sink: [screenshot]

I then get the names of all the files that were in that zip file, to then be used in the ForEach activity. These file names will become the folders for the copy activity.

Get file names from the initial landing zone: [screenshot]

I then pass the childItems output of "Get list of staged files" into the ForEach:
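As far as I know, the childItems output of a Get Metadata activity is a list of entries that each have a `name` and a `type` ("File" or "Folder"), and the ForEach sees each entry as `item()`. A rough Python model of that loop input is below. The helper `staged_file_names` and the sample entries are made up for illustration.

```python
def staged_file_names(child_items):
    """Keep only file entries and return their names, in listing order."""
    return [entry["name"] for entry in child_items if entry["type"] == "File"]


# Made-up sample in the shape of a childItems listing
child_items = [
    {"name": "ABCD_20200914_AB01_Part01.csv.gz", "type": "File"},
    {"name": "archive", "type": "Folder"},
    {"name": "ABCD_20200914_AB03_Part01.json.gz", "type": "File"},
]

for name in staged_file_names(child_items):
    # In the pipeline, this is where the copy activity would receive item().name
    print(name)
```

Skipping the folder entries matters only if the landing zone can contain subfolders. If it only ever holds files, the filter is harmless.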

[screenshot]

In that ForEach activity I have one copy activity. For it I made two datasets. The first grabs the files from the initial landing zone that we created. For this example, let's call it Staging (forgive the MS Paint drawing):

[screenshot]

The purpose of this is to go to that dummy folder and grab each file that was just copied into there. From that 1 zip file we expect 5 files.

In the Sink section, I created a new dataset with parameters for the folder and the file name. That dataset writes the data into the same container, but into a new folder whose name is "Stage" concatenated with the item name. I also added a "replace" call to remove the ".txt" from the file name.

[screenshot]

What this does is give every file coming from the dummy staging folder a folder of its own, named after the file. Based on your requirements, I am not sure that is exactly what you want, but you can always rework it to be more specific.

For the item name, I take the same file name, remove the ".txt", concatenate the date value, and only then add the ".txt" extension back. Otherwise I would have had ".txt" in the middle of the file name.
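To make the folder and item name logic concrete, here is a small Python sketch of the string handling only. The function name `sink_paths`, the "/" between "Stage" and the folder name, the "_" before the date, and the sample values are all assumptions for illustration; the actual dataset expressions may join these parts differently.

```python
def sink_paths(item_name: str, date_value: str, ext: str = ".txt"):
    """Build (folder, file_name) for the sink from a staged file name."""
    # Drop the extension first so it does not end up inside the new name
    stem = item_name.replace(ext, "")
    # Assumption: "Stage" and the stem are joined with a path separator
    folder = "Stage/" + stem
    # Assumption: the date value is appended after an underscore
    file_name = stem + "_" + date_value + ext
    return folder, file_name


# Hypothetical staged file and date value
folder, file_name = sink_paths("orders.txt", "20200914")
```

In Data Factory this would map to two dataset parameters filled from `item().name` using the `replace` and `concat` expression functions.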

At the end, I added a Delete activity that deletes all the files (I am not sure I have set that up properly, so feel free to adjust it).

[screenshot]

Hopefully the description above gives you an idea of how to use parameters for your files. Let me know if this helps in your situation.