
I need to migrate all my data from Azure Data Lake Storage Gen1 to Data Lake Storage Gen2. Our lake contains a mix of file types (.txt, .zip, .json, and many others), and we want to move them as-is to the Gen2 lake. We also want every file to keep the last-modified time it has in the Gen1 lake.

I was looking into using ADF for this. But ADF requires a dataset, and a dataset requires a data format (Avro, JSON, XML, binary, etc.). Because our files are of mixed types, I tried the binary format. With the binary format, however, every file at the destination gets the content type "application/octet-stream". I am also unable to retain the files' last-modified time.

Please accept the answer that was helpful, as this benefits other community members. – HarithaMaddi-MSFT

2 Answers


As you said, when the files are copied to Data Lake Gen2, all file properties change, including the 'LAST MODIFIED' time.

As with an upload, these files are newly created in Gen2, and Azure creates new properties for them. That's why we cannot keep the old Gen1 properties.

When the dataset uses the binary format, every file's content type is application/octet-stream, and we cannot change that either.

The property differences between Gen1 and Gen2 after I copied the files from Gen1 to Gen2:

[Screenshot: file properties in Gen1 vs. Gen2]

Only if we download the 'word.csv' file and re-upload it does the content type change, in this case to application/vnd.ms-excel:

[Screenshot: properties of the re-uploaded word.csv]
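If the octet-stream content type is a problem, one possible post-copy fix (an assumption, not something this answer confirms) is to infer each file's type from its extension and then set it on the Gen2 file with the `azure-storage-file-datalake` SDK, e.g. by passing `ContentSettings(content_type=...)` to `DataLakeFileClient.set_http_headers`. The sketch below only shows the inference step with Python's standard `mimetypes` module, so it runs without an Azure account; the helper name `guess_content_type` is hypothetical.

```python
import mimetypes


def guess_content_type(path: str) -> str:
    """Guess a Content-Type from the file extension (hypothetical helper).

    mimetypes uses a built-in table plus, depending on the platform,
    system files or the Windows registry, so some extensions (.csv, for
    example) may map differently on different machines.
    Unknown extensions fall back to application/octet-stream, which is
    what the binary copy writes for every file anyway.
    """
    content_type, _encoding = mimetypes.guess_type(path)
    return content_type or "application/octet-stream"


if __name__ == "__main__":
    for name in ["raw/report.json", "raw/notes.txt", "raw/data.nosuchext"]:
        print(name, "->", guess_content_type(name))
```

Falling back to application/octet-stream keeps files with unrecognized extensions exactly as the binary copy left them.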

HTH.


Last Modified Time is system metadata that records the most recent modification in the filesystem/container, and it cannot be updated. A workaround is to add user-defined metadata that captures the source file's properties; the PowerShell, .NET, or Java SDKs can be used to set these additional properties. Below, the workaround is implemented in PowerShell:

[Screenshot: PowerShell script implementing the workaround]
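The PowerShell above survives only as a screenshot, so here is a Python sketch of the same idea (a swapped-in language, not the answerer's script). It assumes the Gen1 file status is a dict with a WebHDFS-style `modificationTime` in milliseconds since the Unix epoch, as in the FileStatus objects returned by WebHDFS `GETFILESTATUS`/`LISTSTATUS`. The metadata key `gen1_last_modified` and both helper names are hypothetical. The resulting dict could be passed to `DataLakeFileClient.set_metadata` in the `azure-storage-file-datalake` package; that call is omitted so the sketch runs offline. The Gen2 Last Modified property itself still shows the copy time.

```python
from datetime import datetime, timezone

# Hypothetical user-metadata key; metadata names must be valid identifiers.
SOURCE_MTIME_KEY = "gen1_last_modified"


def gen1_mtime_to_iso(modification_time_ms: int) -> str:
    """Convert a modificationTime in ms since the epoch (UTC) to ISO 8601."""
    # Integer split avoids float rounding on the millisecond part.
    seconds, millis = divmod(modification_time_ms, 1000)
    dt = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return dt.replace(microsecond=millis * 1000).isoformat()


def build_user_metadata(gen1_status: dict) -> dict:
    """Build user-defined metadata recording the Gen1 last-modified time."""
    mtime = int(gen1_status["modificationTime"])
    return {SOURCE_MTIME_KEY: gen1_mtime_to_iso(mtime)}


if __name__ == "__main__":
    status = {"pathSuffix": "word.csv", "modificationTime": 1609459200123}
    print(build_user_metadata(status))
    # → {'gen1_last_modified': '2021-01-01T00:00:00.123000+00:00'}
```

Storing the timestamp as an ISO 8601 string keeps the metadata value plain ASCII and easy to sort or parse later.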