
I have recently set up an Azure Machine Learning experiment to retrain, update, and execute on a daily basis using Azure Data Factory, following the example documents.

My pipeline is set up as shown below:

{
  "name": "RetrainAndExecutePipeline",
  "properties": {
    "activities": [{
      "type": "AzureMLBatchExecution",
      "typeProperties": {
        "webServiceOutputs": {
          "Output-TrainedModel": "TrainedModel"
        },
        "webServiceInputs": {},
        "globalParameters": {}
      },
      "outputs": [{
          "name": "TrainedModel"
        }
      ],
      "policy": {
        "timeout": "01:00:00",
        "concurrency": 1,
        "executionPriorityOrder": "NewestFirst",
        "retry": 3
      },
      "scheduler": {
        "frequency": "Day",
        "interval": 1,
        "offset": "22:00:00",
        "style": "StartOfInterval"
      },
      "name": "Retrain ML Model",
      "linkedServiceName": "TrainingService"
    }],
    "start": "2017-08-20T22:00:00Z",
    "end": "9999-09-09T00:00:00Z",
    "isPaused": false,
    "hubName": "autdatafactoryml_hub",
    "pipelineMode": "Scheduled"
  }
}

The TrainedModel dataset is defined below:

{
  "name": "TrainedModel",
  "properties": {
      "published": false,
      "type": "AzureBlob",
      "linkedServiceName": "AzureStorageLinkedService",
      "typeProperties": {
          "fileName": "trainedModel.ilearner",
          "folderPath": "trainingoutput",
          "format": {
              "type": "TextFormat"
          }
      },
      "availability": {
          "frequency": "Day",
          "interval": 1,
          "offset": "22:00:00",
          "style": "StartOfInterval"
      }
  }
}

I have noticed that after training completes, the outputs written to Azure Blob Storage from the web service output connected to the "Train Model" node are the .ilearner file plus two randomly named files with no extensions, even though I haven't specified them. One is an XML-formatted file with the contents:

<?xml version="1.0" encoding="utf-8"?>
<RuntimeInfo>
  <Language>DotNet</Language>
  <Version>4.5.0</Version>
</RuntimeInfo>

The other contains the information you see when you visualize the output within the Azure ML experiment, formatted as JSON:

{
  "visualizationType": "learner",
  "learner": {
    "name": "LogisticRegressionClassifier",
    "isTrained": true,
    "settings": {
      "records": [
        ...
      ],
      "features": [
        {
          "name": "Setting",
          "index": 0,
          "elementType": "System.String",
          "featureType": "String Feature"
        },
        {
          "name": "Value",
          "index": 1,
          "elementType": "System.String",
          "featureType": "String Feature"
        }
      ],
      "name": null,
      "numberOfRows": 8,
      "numberOfColumns": 2
    },
    "weights": {
      "records": [
        ...
      ],
      "features": [
        {
          "name": "Feature",
          "index": 0,
          "elementType": "System.String",
          "featureType": "String Feature"
        },
        {
          "name": "Weight",
          "index": 1,
          "elementType": "System.Double",
          "featureType": "Numeric Feature"
        }
      ],
      "name": null,
      "numberOfRows": 92,
      "numberOfColumns": 2
    }
  }
}

This JSON file is the one I am interested in, as I presume it holds the coefficient values. I want to track how individual coefficient values change as I update the training model, but I have not been able to find a way to capture this output.

My question is: is there a way to capture multiple outputs from a single Web Service Output in an Azure ML experiment using Azure Data Factory? Or is there a completely different way for me to resolve this?

I appreciate everyone's feedback, and thank you in advance.

Hi, can you please explain further what you're trying to achieve by getting the JSON file? The usual process of retraining ML experiments using ADF is: training data -> feed into the ML model retraining endpoint -> iLearner -> use this to update the scoring endpoint. Once the scoring endpoint is updated, you can then use it to score your new input. - DataGeek
@DataGeek thank you for the suggestion; I have updated my question. To summarize the update: I want this file because I think it contains the coefficient values. These would change as I change my training data, and I want to track them. - Glasody

1 Answer


In Azure ML Studio, you can create a web service that has multiple outputs by attaching multiple Web Service Output modules to your experiment. Each output port is then exposed by the batch execution service and can be mapped to its own output dataset in Data Factory. Alternatively, you can use multiple Export Data modules to write multiple results directly to Azure Storage.
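As a sketch of the first approach: assuming a second Web Service Output module named "Output-Visualization" (a hypothetical name) is attached to the port that produces the learner visualization, the activity's webServiceOutputs could map each output port to its own dataset. "LearnerVisualization" here would be a second AzureBlob dataset defined analogously to TrainedModel, pointing at the blob where the JSON should land; both names are assumptions, not values from your experiment.

```json
{
  "type": "AzureMLBatchExecution",
  "typeProperties": {
    "webServiceOutputs": {
      "Output-TrainedModel": "TrainedModel",
      "Output-Visualization": "LearnerVisualization"
    },
    "webServiceInputs": {},
    "globalParameters": {}
  },
  "outputs": [
    { "name": "TrainedModel" },
    { "name": "LearnerVisualization" }
  ]
}
```

The keys in webServiceOutputs must match the names of the Web Service Output modules in the experiment, and each referenced dataset must also appear in the activity's outputs array with a matching availability schedule.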