I'm looking at the Microsoft documentation on moving data to and from Azure Data Lake Store, which says the following about the fileName property:

"Name of the file in the Azure Data Lake store. fileName is optional and case-sensitive. If you specify a filename, the activity (including Copy) works on the specific file."

What I don't see here is any mention of wildcards. How do I select, say, not all of the files in a folder but only those matching 2017-foo-*.json? I tried an asterisk, but it appears to be taken literally.
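
For reference, here is roughly the dataset definition I tried (the dataset and linked service names, and the folder path, are placeholders); the asterisk in fileName gets treated as a literal character rather than a wildcard:

{
    "name": "InputDataset-DataLake",
    "properties": {
        "type": "AzureDataLakeStore",
        "linkedServiceName": "AzureDataLakeStoreLinkedService",
        "typeProperties": {
            "folderPath": "myfolder",
            "fileName": "2017-foo-*.json"
        },
        "availability": {
            "frequency": "Day",
            "interval": 1
        },
        "external": true,
        "policy": {}
    }
}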

1 Answer

Depending on your source and sink, you might be able to use the fileFilter property instead of fileName to get wildcard behaviour, e.g. quoting from here:

Allowed values are: * (multiple characters) and ? (single character).

Example 1: "fileFilter": "*.log"

Example 2: "fileFilter": 2014-1-?.txt"

Note that fileFilter is applicable for an input FileShare dataset.

This example is for an on-prem fileshare:

{
    "name": "InputDataset-AllFiles",
    "properties": {
        "published": false,
        "type": "FileShare",
        "linkedServiceName": "OnPremisesFileServerLinkedService",
        "typeProperties": {
            "fileFilter": "*.txt",
            "folderPath": "."
        },
        "availability": {
            "frequency": "Day",
            "interval": 1
        },
        "external": true,
        "policy": {}
    }
}
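
Applied to your pattern, the typeProperties block above would become something like this (folderPath is a placeholder, and I haven't confirmed fileFilter works outside FileShare datasets):

"typeProperties": {
    "fileFilter": "2017-foo-*.json",
    "folderPath": "myfolder"
},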

I'm not sure this property is available for other types like Blob storage, Data Lake, etc. Can you confirm your source and sink (assuming one of them is Data Lake)?

Further info available here:

https://docs.microsoft.com/en-us/azure/data-factory/data-factory-onprem-file-system-connector

PolyBase now supports ADLS, so you could move data that way without these ADF shenanigans.