I'm trying to move files between multiple Azure Data Lake Storage Gen1 instances without resorting to writing an Azure Function or calling the Azure Storage SDK directly.

I have a few text files in data lake 1 called test1.txt and test2.txt. Both are tab-delimited. When I use the copy activity with .csv set as the file extension, the files do get copied over to data lake 2, but they remain .txt files.

The copy activity's source and sink look as follows:

{
    "typeProperties": {
        "source": {
            "type": "DelimitedTextSource",
            "storeSettings": {
                "type": "AzureDataLakeStoreReadSettings",
                "recursive": true,
                "wildcardFileName": "*.*",
                "enablePartitionDiscovery": false
            },
            "formatSettings": {
                "type": "DelimitedTextReadSettings"
            }
        },
        "sink": {
            "type": "DelimitedTextSink",
            "storeSettings": {
                "type": "AzureDataLakeStoreWriteSettings"
            },
            "formatSettings": {
                "type": "DelimitedTextWriteSettings",
                "quoteAllText": true,
                "fileExtension": ".csv"
            }
        }
    }
}

I've tried copying to a blob container as well, but the same issue remains: the files keep their .txt extension.

Is it possible to change the file extension after a copy?

1 Answer

You can define the resulting file name in the sink dataset. If the files are always going to have the same names, you can hard-code it there. If not, it gets a bit more complex: use a GetMetadata activity to list the file names, then a ForEach activity to iterate over them and copy each one under a modified name.
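As a rough sketch of the second approach, the sink dataset could take the file name as a parameter. The dataset name `SinkFile`, the linked service name `DataLake2`, the `output` folder, and the `fileName` parameter are all made up for illustration, so adjust them to your own factory:

```json
{
    "name": "SinkFile",
    "properties": {
        "type": "DelimitedText",
        "linkedServiceName": {
            "referenceName": "DataLake2",
            "type": "LinkedServiceReference"
        },
        "parameters": {
            "fileName": { "type": "string" }
        },
        "typeProperties": {
            "location": {
                "type": "AzureDataLakeStoreLocation",
                "folderPath": "output",
                "fileName": {
                    "value": "@dataset().fileName",
                    "type": "Expression"
                }
            },
            "columnDelimiter": "\t"
        }
    }
}
```

The pipeline would then look something like the following. It assumes a hypothetical `SourceFolder` dataset pointing at the folder to list and a hypothetical `SourceFile` dataset parameterized with `fileName` in the same way as `SinkFile` above. The activity names are placeholders too, and the `replace` expression assumes that `.txt` only occurs as the extension:

```json
{
    "name": "CopyAndRenameTxtToCsv",
    "properties": {
        "activities": [
            {
                "name": "GetFileList",
                "type": "GetMetadata",
                "typeProperties": {
                    "dataset": {
                        "referenceName": "SourceFolder",
                        "type": "DatasetReference"
                    },
                    "fieldList": [ "childItems" ]
                }
            },
            {
                "name": "ForEachFile",
                "type": "ForEach",
                "dependsOn": [
                    {
                        "activity": "GetFileList",
                        "dependencyConditions": [ "Succeeded" ]
                    }
                ],
                "typeProperties": {
                    "items": {
                        "value": "@activity('GetFileList').output.childItems",
                        "type": "Expression"
                    },
                    "activities": [
                        {
                            "name": "CopyOneFile",
                            "type": "Copy",
                            "inputs": [
                                {
                                    "referenceName": "SourceFile",
                                    "type": "DatasetReference",
                                    "parameters": {
                                        "fileName": {
                                            "value": "@item().name",
                                            "type": "Expression"
                                        }
                                    }
                                }
                            ],
                            "outputs": [
                                {
                                    "referenceName": "SinkFile",
                                    "type": "DatasetReference",
                                    "parameters": {
                                        "fileName": {
                                            "value": "@replace(item().name, '.txt', '.csv')",
                                            "type": "Expression"
                                        }
                                    }
                                }
                            ],
                            "typeProperties": {
                                "source": {
                                    "type": "DelimitedTextSource",
                                    "storeSettings": {
                                        "type": "AzureDataLakeStoreReadSettings"
                                    }
                                },
                                "sink": {
                                    "type": "DelimitedTextSink",
                                    "storeSettings": {
                                        "type": "AzureDataLakeStoreWriteSettings"
                                    }
                                }
                            }
                        }
                    ]
                }
            }
        ]
    }
}
```

Note that `childItems` lists subfolders as well as files, so if the source folder contains subfolders you would probably want to filter on `item().type` before copying.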

Hope this helped!