
Hi, I am using Azure Data Factory for a Copy activity. I want the copy to be recursive across a container and its subfolders, as follows: myfolder/Year/Month/Day/Hour/New_Generated_File.csv

The files that I am generating and importing into the folder always have a different name.

The problem is that the activity seems to wait forever.

The pipeline is scheduled hourly.

I'm attaching the JSON for the dataset and the linked service.

Dataset:

{
    "name": "Txns_In_Blob",
    "properties": {
        "structure": [
            {
                "name": "Column0",
                "type": "String"
            },
            [....Other Columns....]
        ],
        "published": false,
        "type": "AzureBlob",
        "linkedServiceName": "LinkedService_To_Blob",
        "typeProperties": {
            "folderPath": "uploadtransactional/yearno={Year}/monthno={Month}/dayno={Day}/hourno={Hour}/{Custom}.csv",
            "format": {
                "type": "TextFormat",
                "rowDelimiter": "\n",
                "columnDelimiter": "    "
            }
        },
        "availability": {
            "frequency": "Hour",
            "interval": 1
        },
        "external": true,
        "policy": {}
    }
}

Linked Service:

{
    "name": "LinkedService_To_Blob",
    "properties": {
        "description": "",
        "hubName": "dataorchestrationsystem_hub",
        "type": "AzureStorage",
        "typeProperties": {
            "connectionString": "DefaultEndpointsProtocol=https;AccountName=wizestorage;AccountKey=**********"
        }
    }
}


1 Answer


It is not mandatory to give the file name in the dataset's folderPath property. Remove the file name, add a partitionedBy section so the {Year}, {Month}, {Day}, and {Hour} placeholders are resolved from the slice start time, and Data Factory will load all the files in that folder for you.

{
    "name": "Txns_In_Blob",
    "properties": {
        "structure": [
            {
                "name": "Column0",
                "type": "String"
            },
            [....Other Columns....]
        ],
        "published": false,
        "type": "AzureBlob",
        "linkedServiceName": "LinkedService_To_Blob",
        "typeProperties": {
            "folderPath": "uploadtransactional/yearno={Year}/monthno={Month}/dayno={Day}/hourno={Hour}/",
            "partitionedBy": [
                { "name": "Year", "value": { "type": "DateTime", "date": "SliceStart", "format": "yyyy" } },
                { "name": "Month", "value": { "type": "DateTime", "date": "SliceStart", "format": "MM" } },
                { "name": "Day", "value": { "type": "DateTime", "date": "SliceStart", "format": "dd" } },
                { "name": "Hour", "value": { "type": "DateTime", "date": "SliceStart", "format": "HH" } }
            ],
            "format": {
                "type": "TextFormat",
                "rowDelimiter": "\n",
                "columnDelimiter": "    "
            }
        },
        "availability": {
            "frequency": "Hour",
            "interval": 1
        },
        "external": true,
        "policy": {}
    }
}

With the above folderPath, Data Factory will generate a runtime value such as uploadtransactional/yearno=2016/monthno=05/dayno=30/hourno=07/ for a slice that starts at that hour (slice times are in UTC).
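
For reference, the hourly Copy pipeline that consumes this dataset could look roughly like the sketch below. The pipeline and activity names, the output dataset (Txns_Output), the sink type (BlobSink), and the start/end window are placeholders, since none of them appear in the question (JSON has no comments, so the assumptions are noted in the description field); adjust them to your actual destination.

{
    "name": "CopyTxnsHourlyPipeline",
    "properties": {
        "description": "Sketch only: output dataset, sink type, and start/end window are placeholders",
        "activities": [
            {
                "name": "CopyTxnsFromBlob",
                "type": "Copy",
                "inputs": [ { "name": "Txns_In_Blob" } ],
                "outputs": [ { "name": "Txns_Output" } ],
                "typeProperties": {
                    "source": { "type": "BlobSource" },
                    "sink": { "type": "BlobSink" }
                },
                "scheduler": { "frequency": "Hour", "interval": 1 },
                "policy": { "timeout": "01:00:00", "concurrency": 1, "retry": 2 }
            }
        ],
        "start": "2016-05-30T00:00:00Z",
        "end": "2016-06-30T00:00:00Z"
    }
}

The activity scheduler matches the dataset's hourly availability, so one slice is processed per hour, each reading whatever files exist under the folder path resolved for that slice.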