3
votes

I want to create linked services to Azure blob containers, where the account information comes from an Azure SQL database. Since the number of entries and the account information are dynamic, I can't define the linked services to the Azure blob containers during the setup of the Azure Data Factory. Can linked services in Azure Data Factory be created dynamically?

7
I noticed the -1 on the answer below. To clarify: are you looking to have a single data factory instance that, when run, can change its linked services by querying a SQL database for configuration? Or is pre-creating numerous data factory pipelines for each set of account information OK? – Alex KeySmith

7 Answers

2
votes

Unfortunately, that is not possible in the current version of ADF. You can only use functions to reference elements relative to the pipeline execution (execution date/time). You cannot dynamically define a blob name using another data source.

You can use functions in data factory along with above mentioned system variables for the following purposes:

  1. Specifying data selection queries (see connector articles referenced by the Data Movement Activities article). The syntax to invoke a data factory function is $$<function> for data selection queries and other properties in the activity and datasets.
  2. Specifying input dependencies with data factory functions in activity inputs collection (see sample above). $$ is not needed for specifying input dependency expressions.

from here: https://docs.microsoft.com/en-us/azure/data-factory/data-factory-functions-variables

And this is everything you can currently do with blob inputs: https://docs.microsoft.com/en-us/azure/data-factory/data-factory-azure-blob-connector

In your case, I would suggest that if you can redefine your input blobs to have names/folders that can be derived from the date/time at which the pipeline is executing, then you can get that functionality.
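For example, a rough sketch of what a date-partitioned blob input dataset could look like in ADF v1, assuming a daily slice; the linked service name, container, folder layout and format are placeholders:

{
    "name": "InputBlobDataset",
    "properties": {
        "type": "AzureBlob",
        "linkedServiceName": "MyBlobStorageLinkedService",
        "typeProperties": {
            "folderPath": "mycontainer/{Year}/{Month}/{Day}",
            "format": { "type": "TextFormat" },
            "partitionedBy": [
                { "name": "Year", "value": { "type": "DateTime", "date": "SliceStart", "format": "yyyy" } },
                { "name": "Month", "value": { "type": "DateTime", "date": "SliceStart", "format": "MM" } },
                { "name": "Day", "value": { "type": "DateTime", "date": "SliceStart", "format": "dd" } }
            ]
        },
        "external": true,
        "availability": { "frequency": "Day", "interval": 1 }
    }
}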

Or else you could have a U-SQL call that gets the info from SQL and then moves the blob into a date-stamped folder which the pipeline could then pick up.

0
votes

Well, I think this would be possible with a combination of a PowerShell script (using the Data Factory cmdlets) and Azure Functions to run it. The PowerShell script would need to connect to storage and get the schemas, create and deploy the datasets, change the pipeline, set up the pipeline schedule, add/remove elements, and then resume the process (or just set the pipeline start to the current day). But this would need quite a lot of work.

But Data Factory version 2, which is in preview right now, gives many more options, including using SSIS, so that could solve the problem.

0
votes

I actually do this using the Python SDK, and not only for linked services but for every activity. All of my metadata is stored in a SQL Server table (I called it adf.Mapping) with columns like source_linked_service_name, source_dataset, target_linked_service_name and target_dataset.

The metadata required to create a linked service is stored in another table, and I get it by looking up the linked_service_name. A scheduled task runs my Python script every day, and it updates the linked service if something has changed, using the SDK mentioned above:

# SqlServerLinkedService is from azure.mgmt.datafactory.models; connection_string, user_name
# and password come from the metadata table; adf_client / ir_sql_server are set up earlier.
from azure.mgmt.datafactory.models import SqlServerLinkedService

properties = SqlServerLinkedService(connection_string=connection_string,
                                    user_name=user_name,
                                    password=password,
                                    connect_via=ir_sql_server)

adf_client.adf_object.linked_services.create_or_update(
    resource_group_name=resource_group_name, factory_name=data_factory_name,
    linked_service_name=self.name, properties=properties)
logger.info('Created SQL Server linked service: ' + str(self.name))

0
votes

By using the "Execute SSIS Package" activity you can run an SSIS package in which you can manage your dynamic data source.


The second way you can do that is to use the "Notebook" activity. Then you run Python or Scala to connect to the source that holds the connection information and create the datasets in some sort of loop, or whatever else you need to achieve your goal.

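As a rough sketch, the notebook activity in the pipeline JSON might look like the following, assuming an Azure Databricks linked service; the linked service name, notebook path and parameter are placeholders:

{
    "name": "CreateLinkedServicesNotebook",
    "type": "DatabricksNotebook",
    "linkedServiceName": {
        "referenceName": "AzureDatabricksLS",
        "type": "LinkedServiceReference"
    },
    "typeProperties": {
        "notebookPath": "/Shared/create-linked-services",
        "baseParameters": {
            "configTable": "dbo.ConnectionConfig"
        }
    }
}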

Inside ADF itself (even v2), using only ADF objects, you cannot do that.

0
votes

One way of doing this would be with Azure Databricks notebooks.

Another way would be to parameterize the value:

{
    "type": "Microsoft.DataFactory/factories/linkedservices",
    "properties": {
        "parameters": {
            "StorageAccountEndpoint": {
                "type": "String",
                "defaultValue": "https://<<yourstorageaccountname>>.blob.core.windows.net/?sv=2018-03-28&ss=b&srt=sco&sp=rwdlac&se=2019-10-20T16:33:57Z&st=2019-09-20T08:33:57Z&spr=https&sig=lDrBjD%2BjM2T1XjRW997VPMqDp99ZxVoReyRK0VEX7zQ%3D"
            }
        },
        "type": "AzureBlobStorage",
        "typeProperties": {
            "sasUri": "@{linkedService().StorageAccountEndpoint}"
        }
    }
}
0
votes

You can have a parameterized linked service that will do this, for example:

{
    "name": "AzureBlobStorage1",
    "properties": {
        "type": "AzureBlobStorage",
        "annotations": [],
        "parameters": {
            "storageAccountName": {
                "type": "string"
            }
        },
        "typeProperties": {
            "serviceEndpoint": "@{concat('https://',linkedService().storageAccountName,'.blob.core.windows.net')}",
            "accountKind": "StorageV2"
        }
    }
}

This code allows you to pass only the storage account name, but you can use the full URL if you want.

Then create a dataset with 2 parameters:

  • Storage account name
  • Container name


And then your dataset will look like this:

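(A hedged sketch; the dataset name and the location wiring are illustrative, based on the parameterized linked service above.)

{
    "name": "BinaryBlobDataset",
    "properties": {
        "type": "Binary",
        "linkedServiceName": {
            "referenceName": "AzureBlobStorage1",
            "type": "LinkedServiceReference",
            "parameters": {
                "storageAccountName": "@dataset().storageAccountName"
            }
        },
        "parameters": {
            "storageAccountName": { "type": "string" },
            "containerName": { "type": "string" }
        },
        "typeProperties": {
            "location": {
                "type": "AzureBlobStorageLocation",
                "container": "@dataset().containerName"
            }
        }
    }
}

Here the dataset's storageAccountName parameter is forwarded to the linked service, and containerName is used for the blob location.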

For this example I used a dataset with the Binary format, but choose whatever fits you.

Almost all connectors can be fully parameterized, but not all of them allow it from the UI; you'll need to do it in JSON like I did.

0
votes

P.S.: I didn't deal with a dynamic storage account name, but I have done this dynamically for multiple database names from a database account in an ADF pipeline. Hope it helps.

Linked services in Azure Data Factory can be configured to accept parameter values dynamically at execution time. These are termed parameterized linked services in Azure.

You need to configure your dynamic/changing values as parameters in the linked service and then supply their values at run time from a config file. The config file can be read through a Lookup activity in an Azure Data Factory pipeline.

A sample parameterized linked service that gets the Cosmos DB name dynamically is shown below for reference. Here DBName is a parameter that is passed the actual DB name value when the pipeline is triggered:

{
    "name": "ls_name_xyz",
    "type": "Microsoft.DataFactory/factories/linkedservices",
    "properties": {
        "type": "CosmosDbMongoDbApi",
        "parameters": {
            "DBName": {
                "type": "String"
            }
        },
        "annotations": [],
        "typeProperties": {
            "connectionString": "your_connection_string-xyz",
            "database": "@{linkedService().DBName}",
            "encryptedCredential": "your_credentials-xyz"
        }
    }
}
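To supply the actual value at run time, a dataset can reference this parameterized linked service and forward its own parameter; a hedged sketch, where ds_cosmos_xyz and your_collection_xyz are placeholder names:

{
    "name": "ds_cosmos_xyz",
    "properties": {
        "type": "CosmosDbMongoDbApiCollection",
        "linkedServiceName": {
            "referenceName": "ls_name_xyz",
            "type": "LinkedServiceReference",
            "parameters": {
                "DBName": "@dataset().DBName"
            }
        },
        "parameters": {
            "DBName": { "type": "String" }
        },
        "typeProperties": {
            "collection": "your_collection_xyz"
        }
    }
}

A pipeline can then set DBName per run, for example from a Lookup activity's output such as @activity('LookupConfig').output.firstRow.DBName (LookupConfig being a hypothetical activity name).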