
In Azure Data Factory, I’m trying to call an Azure Machine Learning model from a Data Factory pipeline. I want to use an Azure SQL table as input and another Azure SQL table for the output. First I deployed a Machine Learning (classic) web service. Then I created an Azure Data Factory pipeline with a linked service (type ‘AzureML’, using the Request URI and API key of the ML web service) and input and output datasets of type ‘AzureSqlTable’.
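
For reference, my linked service looks roughly like this (endpoint and key are placeholders; the property names follow the ADF v1 ‘AzureML’ linked service type):

{
    "name": "AzureMLLinkedService",
    "properties": {
        "type": "AzureML",
        "typeProperties": {
            "mlEndpoint": "https://<region>.services.azureml.net/workspaces/<workspace-id>/services/<service-id>/jobs",
            "apiKey": "<api-key>"
        }
    }
}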

Deployment and provisioning succeeded. The pipeline starts as scheduled, but it stays ‘Running’ without producing any result. The pipeline activity does not show up in the Monitor & Manage Activity Windows.

On various sites and in tutorials, I only find JSON scripts using the activity type ‘AzureMLBatchExecution’ with blob inputs and outputs. I want to use Azure SQL input and output, but I can’t get this working.
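
The blob-based examples I found typically fill in the activity’s typeProperties along these lines (dataset names here are hypothetical):

"typeProperties": {
    "webServiceInput": "BlobInputDataset",
    "webServiceOutputs": {
        "output1": "BlobOutputDataset"
    }
}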

Can someone provide a sample JSON script or tell me what’s possibly wrong with the code below?

Thanks!

{
    "name": "Predictive_ML_Pipeline",
    "properties": {
        "description": "use MyAzureML model",
        "activities": [
            {
                "type": "AzureMLBatchExecution",
                "typeProperties": {},
                "inputs": [
                    {
                        "name": "AzureSQLDataset_ML_Input"
                    }
                ],
                "outputs": [
                    {
                        "name": "AzureSQLDataset_ML_Output"
                    }
                ],
                "policy": {
                    "timeout": "02:00:00",
                    "concurrency": 3,
                    "executionPriorityOrder": "NewestFirst",
                    "retry": 1
                },
                "scheduler": {
                    "frequency": "Week",
                    "interval": 1
                },
                "name": "My_ML_Activity",
                "description": "prediction analysis on ML batch input",
                "linkedServiceName": "AzureMLLinkedService"
            }
        ],
        "start": "2017-04-04T09:00:00Z",
        "end": "2017-04-04T18:00:00Z",
        "isPaused": false,
        "hubName": "myml_hub",
        "pipelineMode": "Scheduled"
    }
}
Great question. Have you tested the activity with a sample blob store input/output? I'm afraid I'm lacking some ML knowledge in terms of how it accesses the datasets using the ADF linked service details. For normal batch compute activities the service needs its own principal to connect directly. I'll have a play and try this out, maybe next week. - Paul Andrew
Thanks for your reply! I will definitely test it with blob input/output in the next few days; I found a lot of those examples and will post the results here. - A. Jolink

1 Answer


With a little help from a Microsoft technician, I got this working. The JSON script above only needed a change in its schedule section (the pipeline's start and end times):

 "start": "2017-04-01T08:45:00Z",
 "end": "2017-04-09T18:00:00Z",

A pipeline is active only between its start time and end time. Because the schedule frequency is weekly, the pipeline is triggered at the start of the week, and that date must fall within the pipeline's start and end dates. My original window (2017-04-04T09:00:00Z to 2017-04-04T18:00:00Z) was only nine hours long, so no weekly slice ever fell inside it, which is why the pipeline kept 'Running' without doing anything. For more details about scheduling, see: https://docs.microsoft.com/en-us/azure/data-factory/data-factory-scheduling-and-execution

The Azure SQL input dataset should look like this:

{
    "name": "AzureSQLDataset_ML_Input",
    "properties": {
        "published": false,
        "type": "AzureSqlTable",
        "linkedServiceName": "SRC_SQL_Azure",
        "typeProperties": {
            "tableName": "dbo.Azure_ML_Input"
        },
        "availability": {
            "frequency": "Week",
            "interval": 1
        },
        "external": true,
        "policy": {
            "externalData": {
                "retryInterval": "00:01:00",
                "retryTimeout": "00:10:00",
                "maximumRetry": 3
            }
        }
    }
}

I added the external and policy properties to this dataset (see the script above), since the input data is produced outside the pipeline, and after that it worked.
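
For completeness, a matching output dataset would look like the sketch below. It is produced by the pipeline itself, so it does not get the external/policy properties (the linked service name is a placeholder):

{
    "name": "AzureSQLDataset_ML_Output",
    "properties": {
        "published": false,
        "type": "AzureSqlTable",
        "linkedServiceName": "DST_SQL_Azure",
        "typeProperties": {
            "tableName": "dbo.Azure_ML_Output"
        },
        "availability": {
            "frequency": "Week",
            "interval": 1
        }
    }
}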