
Environments

  • Azure Data Factory

Scenario

  • I have an ADF pipeline that reads data from an on-premises server and writes it to Azure Data Lake.

  • For this pipeline I have provided the folder structure in the ADF dataset as follows (a minimal sketch of such a dataset is shown after this list):

    Folder Path: DBName/RawTables/Transactional

    File Path: TableName.csv
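
For reference, a (v1-style) Data Lake Store dataset with a fixed folder and file path of that sort might look roughly like this; the dataset and linked service names, the format and the availability settings are illustrative, not taken from the question:

{
    "name": "TransactionalTableDataset",
    "properties": {
        "type": "AzureDataLakeStore",
        "linkedServiceName": "AzureDataLakeStoreLinkedService",
        "typeProperties": {
            "folderPath": "DBName/RawTables/Transactional",
            "fileName": "TableName.csv",
            "format": { "type": "TextFormat", "columnDelimiter": "," }
        },
        "availability": { "frequency": "Day", "interval": 1 }
    }
}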

Problem

Is it possible to parameterize the folder name or file path? Basically, if tomorrow I want to change the folder path (without a deployment), we should only have to update the metadata or table structure.

Comments

  • So do you want the data potentially written to a different destination table based on the source directory? – Paul Andrew
  • Yes, exactly! – Mangesh Tambare

3 Answers

1 vote

So the short answer here is no. You won't be able to achieve this level of dynamic flexibility with ADF on its own.

You'll need to add newly defined datasets to your pipeline as inputs for folder changes. In Data Lake you could probably get away with a single stored procedure that accepts a parameter for the file path, which could be reused. But this would still require tweaks to the ADF JSON when calling the proc.
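
To make the "tweaks to the ADF JSON" point concrete: a (v1) Data Lake Analytics U-SQL activity can pass the file path to such a procedure as a script parameter, but that value still lives in the pipeline JSON, so changing the folder still means redeploying. A rough sketch, with made-up script, parameter and linked service names:

{
    "name": "CallLoadProc",
    "type": "DataLakeAnalyticsU-SQL",
    "linkedServiceName": "AzureDataLakeAnalyticsLinkedService",
    "typeProperties": {
        "scriptPath": "scripts/CallLoadProc.usql",
        "scriptLinkedService": "StorageLinkedService",
        "parameters": {
            "inputFilePath": "DBName/RawTables/Transactional/TableName.csv"
        }
    }
}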

Of course, the catch-all option here is to use an ADF custom activity and write a C# class with methods to do whatever you need. That is probably overkill, though, and a lot of effort to set up the authentication to Data Lake Store.

Hope this gives you a steer.

0 votes

Mangesh, why don't you try the .NET custom activity in ADF? This custom activity would be the first activity in your pipeline; it could check for the processed folder and, if that folder is present, move it to a History folder (say). Since ADF is a platform for data movement and data transformation, it doesn't handle IO operations itself. You can learn more about the .NET custom activity at:

https://docs.microsoft.com/en-us/azure/data-factory/data-factory-use-custom-activities
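
For completeness, the (v1) pipeline would reference such a custom activity roughly as below; the assembly, entry point and extended property names are placeholders, and the actual folder-checking and folder-moving logic would live in your C# code:

{
    "name": "ArchiveProcessedFolder",
    "type": "DotNetActivity",
    "linkedServiceName": "AzureBatchLinkedService",
    "typeProperties": {
        "assemblyName": "AdfCustomActivities.dll",
        "entryPoint": "AdfCustomActivities.ArchiveFolderActivity",
        "packageLinkedService": "StorageLinkedService",
        "packageFile": "customactivities/AdfCustomActivities.zip",
        "extendedProperties": {
            "processedFolder": "DBName/RawTables/Transactional",
            "historyFolder": "DBName/RawTables/History"
        }
    }
}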

0 votes

What you want to do is possible with the new Lookup activity in Azure Data Factory V2. Documentation is here: Lookup Activity.

A JSON example would be something like this:

{
    "name": "LookupPipelineDemo",
    "properties": {
        "activities": [
            {
                "name": "LookupActivity",
                "type": "Lookup",
                "typeProperties": {
                    "dataset": { 
                        "referenceName": "LookupDataset", 
                        "type": "DatasetReference" 
                    }
                }
            },
            {
                "name": "CopyActivity",
                "type": "Copy",
                "typeProperties": {
                    "source": { 
                        "type": "SqlSource", 
                        "sqlReaderQuery": "select * from @{activity('LookupActivity').output.tableName}" 
                    },
                    "sink": { 
                        "type": "BlobSink" 
                    }
                },                
                "dependsOn": [ 
                    { 
                        "activity": "LookupActivity", 
                        "dependencyConditions": [ "Succeeded" ] 
                    }
                 ],
                "inputs": [ 
                    { 
                        "referenceName": "SourceDataset", 
                        "type": "DatasetReference" 
                    } 
                ],
                "outputs": [ 
                    { 
                        "referenceName": "SinkDataset", 
                        "type": "DatasetReference" 
                    } 
                ]
            }
        ]
    }
}
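
Here LookupDataset would point at a small control file (or table) that holds the current values. Assuming a JSON file along these lines (the field names are purely illustrative), @{activity('LookupActivity').output.tableName} in the copy source query above resolves to its tableName value:

{
    "tableName": "Transactional.TableName",
    "folderPath": "DBName/RawTables/Transactional"
}

Because the table name and path now live in that metadata file rather than in the pipeline definition, you can change them by editing the file instead of redeploying the factory.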