2
votes

We already have Move-AzureRmDataLakeStoreItem, which moves files between folders inside Azure Data Lake. What I am looking for is a way to copy files within the Data Lake without affecting the original file.

The possibilities I know of are:

  1. using U-SQL to EXTRACT data from the source file and then OUTPUT it to the destination file - but I am trying to copy all sorts of files (.gz, .txt, .info, .exe, .msi) and I am not sure whether U-SQL can handle .gz, .exe or .msi files
  2. using Data Factory to copy data from/to Data Lake Store

So my question is: is there anything else at our disposal for copying files within Azure Data Lake Store?

1
One other approach to consider: maybe don't copy your files so much? You could land your data in your lake's "raw" or staging area; any additional versions of a file should then be refinements, aggregates, or cleansed, augmented or otherwise processed versions, not straight duplicates. – wBob

1 Answer

5
votes

You have a couple of other options:

  1. run distcp on an HDI cluster, similar to the instructions provided here: https://docs.microsoft.com/en-us/azure/data-lake-store/data-lake-store-copy-data-wasb-distcp
  2. use adlcopy if you are copying a limited amount of data (say, tens to hundreds of GB) - https://docs.microsoft.com/en-us/azure/data-lake-store/data-lake-store-copy-data-azure-storage-blob
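For the distcp option, here is a minimal sketch of how the command line could be assembled and launched from Python on a cluster node. It rests on several assumptions: the account name `mydatalake` and both folder paths are hypothetical, the cluster is assumed to already have access to the Data Lake Store account, and using an `adl://` URI for both source and destination is an extrapolation from the WASB-to-Data-Lake example in the linked instructions rather than something confirmed there. Since distcp copies files as they are, it should not care whether they are .gz, .exe or .msi.

```python
import subprocess
from typing import List

# Host suffix used by Azure Data Lake Store accounts in adl:// URIs.
ADLS_HOST_SUFFIX = "azuredatalakestore.net"


def adl_uri(account: str, path: str) -> str:
    """Build an adl:// URI for a path inside a Data Lake Store account."""
    return f"adl://{account}.{ADLS_HOST_SUFFIX}:443/{path.lstrip('/')}"


def distcp_command(account: str, src_path: str, dst_path: str) -> List[str]:
    """Return the argv for a distcp copy between two paths in one account."""
    return [
        "hadoop",
        "distcp",
        adl_uri(account, src_path),
        adl_uri(account, dst_path),
    ]


def copy_within_lake(account: str, src_path: str, dst_path: str) -> None:
    """Run distcp; assumes `hadoop` is on PATH (e.g. on an HDI head node)."""
    subprocess.run(distcp_command(account, src_path, dst_path), check=True)


if __name__ == "__main__":
    # Hypothetical account and folders, printed rather than executed.
    print(" ".join(distcp_command("mydatalake", "/raw/logs", "/staging/logs")))
```

Calling `copy_within_lake` only makes sense on a machine where the `hadoop` client is installed and configured for the store; elsewhere, printing the command and running it by hand over SSH works just as well.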

Does this suffice? Or do you want something natively supported by Azure Data Lake Store via its REST APIs?

Thanks, Sachin Sheth, Program Manager, Azure Data Lake.