
Hi, I am using Azure Data Factory for a Copy activity. I want the copy to be recursive across a container and its subfolders, as follows: myfolder/Year/Month/Day/Hour/New_Generated_File.csv

The files that I am generating and importing into the folder always have a different name.

The problem is that the activity seems to wait forever.

The pipeline is scheduled hourly.

I'm attaching the JSON for the dataset and the linked service.

Dataset:

{
    "name": "Txns_In_Blob",
    "properties": {
        "structure": [
            {
                "name": "Column0",
                "type": "String"
            },
            [....Other Columns....]
        ],
        "published": false,
        "type": "AzureBlob",
        "linkedServiceName": "LinkedService_To_Blob",
        "typeProperties": {
            "folderPath": "uploadtransactional/yearno={Year}/monthno={Month}/dayno={Day}/hourno={Hour}/{Custom}.csv",
            "format": {
                "type": "TextFormat",
                "rowDelimiter": "\n",
                "columnDelimiter": "    "
            }
        },
        "availability": {
            "frequency": "Hour",
            "interval": 1
        },
        "external": true,
        "policy": {}
    }
}

Linked Service:

{
    "name": "LinkedService_To_Blob",
    "properties": {
        "description": "",
        "hubName": "dataorchestrationsystem_hub",
        "type": "AzureStorage",
        "typeProperties": {
            "connectionString": "DefaultEndpointsProtocol=https;AccountName=wizestorage;AccountKey=**********"
        }
    }
}


1 Answer


It is not mandatory to give the file name in the dataset's folderPath property. Remove the file name, add a partitionedBy section so the {Year}, {Month}, {Day}, and {Hour} placeholders are resolved from the slice start time, and Data Factory will load all the files in that folder for you.

{
    "name": "Txns_In_Blob",
    "properties": {
        "structure": [
            {
                "name": "Column0",
                "type": "String"
            },
            [....Other Columns....]
        ],
        "published": false,
        "type": "AzureBlob",
        "linkedServiceName": "LinkedService_To_Blob",
        "typeProperties": {
            "folderPath": "uploadtransactional/yearno={Year}/monthno={Month}/dayno={Day}/hourno={Hour}/",
            "partitionedBy": [
                { "name": "Year", "value": { "type": "DateTime", "date": "SliceStart", "format": "yyyy" } },
                { "name": "Month", "value": { "type": "DateTime", "date": "SliceStart", "format": "MM" } },
                { "name": "Day", "value": { "type": "DateTime", "date": "SliceStart", "format": "dd" } },
                { "name": "Hour", "value": { "type": "DateTime", "date": "SliceStart", "format": "HH" } }
            ],
            "format": {
                "type": "TextFormat",
                "rowDelimiter": "\n",
                "columnDelimiter": "    "
            }
        },
        "availability": {
            "frequency": "Hour",
            "interval": 1
        },
        "external": true,
        "policy": {}
    }
}

With the above folderPath, Data Factory will generate a runtime value such as uploadtransactional/yearno=2016/monthno=05/dayno=30/hourno=07/ for a slice that starts at that hour (slice times are in UTC).
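
For reference, the hourly Copy pipeline that consumes this dataset could look roughly like the sketch below. The pipeline and activity names, the output dataset (Txns_Output), the sink type (BlobSink), and the start/end window are placeholders, since none of them appear in the question (JSON has no comments, so the assumptions are noted in the description field); adjust them to your actual destination.

{
    "name": "CopyTxnsHourlyPipeline",
    "properties": {
        "description": "Sketch only: output dataset, sink type, and start/end window are placeholders",
        "activities": [
            {
                "name": "CopyTxnsFromBlob",
                "type": "Copy",
                "inputs": [ { "name": "Txns_In_Blob" } ],
                "outputs": [ { "name": "Txns_Output" } ],
                "typeProperties": {
                    "source": { "type": "BlobSource" },
                    "sink": { "type": "BlobSink" }
                },
                "scheduler": { "frequency": "Hour", "interval": 1 },
                "policy": { "timeout": "01:00:00", "concurrency": 1, "retry": 2 }
            }
        ],
        "start": "2016-05-30T00:00:00Z",
        "end": "2016-06-30T00:00:00Z"
    }
}

The activity scheduler matches the dataset's hourly availability, so one slice is processed per hour, each reading whatever files exist under the folder path resolved for that slice.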