I am trying to copy a blob container from one Azure Storage account to another, using the Azure Data Factory Copy Activity. Copying all the blobs is simple, but I want to copy only blobs with a specific extension.

I do not see any option to specify a wildcard or regex when creating the input dataset.

Is there any way I can achieve this with ADF? I also tried the Azure Data Movement Library (DML), but it does not have this feature either. Only prefix-based filtering is available in DML.

2 Answers

0
votes

In the dataset definition, use the fileFilter property under typeProperties to handle this. For example:

{
  "name": "Dataset01",
  "properties": {
    "type": "AzureBlob",
    "linkedServiceName": "BlobStore01",
    "structure": [ ],
    "typeProperties": {
      "folderPath": "FilesFolder1/FilesFolder2",
      "fileFilter": "*.csv" // <<<<< here
    }
    //etc...
  }
  //etc...
}

The filter accepts two wildcards: * matches multiple characters and ? matches a single character.
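
To make the wildcard behaviour concrete, here is a minimal C# sketch of how * and ? patterns are commonly interpreted. It is only an illustration: the WildcardDemo class and its Matches helper are hypothetical names, and this is not how ADF evaluates fileFilter internally. Case sensitivity in particular is an assumption (the sketch is case-sensitive).

```csharp
using System;
using System.Text.RegularExpressions;

public static class WildcardDemo
{
    // Hypothetical helper: translates a * / ? pattern into an anchored regex.
    // Assumption: * matches any run of characters and ? matches exactly one.
    public static bool Matches(string fileName, string pattern)
    {
        // Regex.Escape turns "*" into "\*" and "?" into "\?", so swap those
        // escaped forms for their regex equivalents afterwards.
        string regex = "^" + Regex.Escape(pattern)
                                  .Replace("\\*", ".*")
                                  .Replace("\\?", ".") + "$";
        return Regex.IsMatch(fileName, regex);
    }

    public static void Main()
    {
        Console.WriteLine(Matches("sales.csv", "*.csv"));     // True
        Console.WriteLine(Matches("sales.json", "*.csv"));    // False
        Console.WriteLine(Matches("log1.txt", "log?.txt"));   // True
        Console.WriteLine(Matches("log10.txt", "log?.txt"));  // False
    }
}
```

So a pattern such as "*.csv" selects every file whose name ends in .csv, while "log?.txt" selects log1.txt but not log10.txt.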

More info as part of this docs page:

https://docs.microsoft.com/en-us/azure/data-factory/data-factory-onprem-file-system-connector

Hope this helps.

0
votes

Based on my understanding, blobs have no concept of a file extension; the extension is just part of the blob name. A file extension becomes useful when we download these files to a local computer, where it determines which application opens them.

"Is there any way I can achieve this with ADF?"

We could use Azure Data Factory custom activities to do that and implement the filtering logic ourselves. For more information about how to use custom activities, please refer to this tutorial.

We could also use an Azure WebJob with a timer trigger to do that.

If Azure Data Factory is the only choice, we could implement the blob copy with our own logic. The following is my demo code. I tested it on my side and it works correctly:

        CloudStorageAccount storageAccountSource = CloudStorageAccount.Parse("connection string");
        CloudStorageAccount storageAccountDest = CloudStorageAccount.Parse("connection string");
        // Create the blob client.
        CloudBlobClient blobClientSource = storageAccountSource.CreateCloudBlobClient();
        CloudBlobClient blobClientDest = storageAccountDest.CreateCloudBlobClient();
        CloudBlobContainer containerSource = blobClientSource.GetContainerReference("test");
        CloudBlobContainer containerDest = blobClientDest.GetContainerReference("test");
        containerDest.CreateIfNotExists();

        SharedAccessBlobPolicy sharedPolicy = new SharedAccessBlobPolicy()
        {

            SharedAccessExpiryTime = DateTime.UtcNow.AddHours(24),
            Permissions = SharedAccessBlobPermissions.Read | SharedAccessBlobPermissions.List |
            SharedAccessBlobPermissions.Write | SharedAccessBlobPermissions.Create | SharedAccessBlobPermissions.Delete
        };

        // Get the container's existing permissions.
        BlobContainerPermissions permissions = containerSource.GetPermissions();
        permissions.SharedAccessPolicies.Add("policy", sharedPolicy);
        // Save the updated permissions synchronously so the call completes
        // (and any error surfaces) before the SAS token is used.
        containerSource.SetPermissions(permissions);
        var blobToken = containerSource.GetSharedAccessSignature(sharedPolicy);

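        // A flat listing returns the blobs inside virtual folders rather than CloudBlobDirectory items.
        foreach (IListBlobItem item in containerSource.ListBlobs(useFlatBlobListing: true))
        {
            CloudBlob destBlob;
            CloudBlob srcBlob;
            if (item is CloudBlockBlob)
            {
                srcBlob = (CloudBlockBlob)item;
                destBlob = containerDest.GetBlockBlobReference(srcBlob.Name);
            }
            else if (item is CloudPageBlob)
            {
                srcBlob = (CloudPageBlob)item;
                destBlob = containerDest.GetPageBlobReference(srcBlob.Name);
            }
            else
            {
                // Skip other item types (for example append blobs), which this demo does not handle.
                continue;
            }

            // Copy only the blobs whose name contains the filter text.
            if (srcBlob.Name.Contains("format"))
            {
                destBlob.StartCopy(new Uri(srcBlob.Uri.AbsoluteUri + blobToken));
            }
        }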
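
Since the question is about extensions, the Contains("format") check in the loop could be swapped for an extension test. A minimal sketch follows; ExtensionFilter and HasExtension are hypothetical names (not part of the storage SDK), and the case-insensitive comparison is an assumption about what is wanted.

```csharp
using System;
using System.IO;

public static class ExtensionFilter
{
    // Hypothetical helper: treats the text after the last '.' in the final
    // name segment as the extension and compares it case-insensitively.
    public static bool HasExtension(string blobName, string extension)
    {
        return string.Equals(Path.GetExtension(blobName), extension,
                             StringComparison.OrdinalIgnoreCase);
    }

    public static void Main()
    {
        Console.WriteLine(HasExtension("FilesFolder1/data.csv", ".csv"));  // True
        Console.WriteLine(HasExtension("FilesFolder1/DATA.CSV", ".csv"));  // True
        Console.WriteLine(HasExtension("FilesFolder1/data.json", ".csv")); // False
    }
}
```

In the loop, this would read: if (ExtensionFilter.HasExtension(srcBlob.Name, ".csv")) { ... }.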