NOTICE: Azure Machine Learning Workbench (Preview) is deprecated. The workflow for deploying models, images and services has been updated since this question was posted.

I have been developing a machine learning model for Azure Machine Learning Services using Azure Machine Learning Workbench (Preview). I deployed the model as a web service, as instructed in the Azure Machine Learning documentation (Preview). The service is up and running, and the model, manifest and image are all configured correctly. So far so good.

Now I have reached the phase where I want to update the service with new configurations, and this is where I find myself with more questions than answers.

I have figured out that I can:

  1. configure a new model
  2. configure a new manifest pointing to that model
  3. configure a new image pointing to that manifest
  4. update an existing (or create a new) service to point to the new image

This seems reasonable enough. But what if I only need to update the manifest? Would it be possible to skip configuring a new model (1), begin the update from (2) above, and let the new manifest point to an existing model instead of a new one?

I tried this by running the following commands from the CLI. The manifest and image are created without problems, but the service update fails with the following output:

>> az ml manifest create --manifest-name manifestname -f score.py -r python -c aml_config/conda_dependencies.yml -s outputs/schema.json -i [existing-model-id]
Creating new driver at /var/folders/tmp/tmp.py
Successfully created manifest
Id: [manifest-id]
>> az ml image create -n imagename --manifest-id [manifest-id-from-above]
Creating image............................................Done.
Image ID: [image-id]
>> az ml service update realtime -i [existing-service-id] --image-id [image-id-from-above] -v
Updating service..................................Failed
Found default kubeconfig in /Users/username/.kube/config using it
Using kubeconfig file: /Users/username/.kube/config
Kubectl exists in default location, adding it to PATH
loading kubeconfig file
Getting Replica sets from default namespace
Got hash ####
{
    "Azure-cli-ml Version": null,
    "Error": "Error occurred",
    "Response Content": {
        "CreatedTime": "2018-09-17T13:31:22.4230543Z",
        "EndTime": "2018-09-17T13:34:18.0774994Z",
        "Error": {
            "Code": "KubernetesDeploymentFailed",
            "Details": [
                {
                    "Code": "CrashLoopBackOff",
                    "Message": "Back-off 40s restarting failed container=### pod=###"
                }
            ],
            "Message": "Kubernetes Deployment failed",
            "StatusCode": 400
        },
        "Id": "###",
        "OperationType": "Service",
        "ResourceLocation": "###",
        "State": "Failed"
    },
    "Response Headers": {
        "Connection": "keep-alive",
        "Content-Encoding": "gzip",
        "Content-Type": "application/json; charset=utf-8",
        "Date": "Mon, 17 Sep 2018 13:34:22 GMT",
        "Strict-Transport-Security": "max-age=15724800; includeSubDomains; preload",
        "Transfer-Encoding": "chunked",
        "X-Content-Type-Options": "nosniff",
        "X-Frame-Options": "SAMEORIGIN",
        "api-supported-versions": "2017-09-01-preview, 2018-04-01-preview",
        "x-ms-client-request-id": "###",
        "x-ms-client-session-id": ""
    }
}

If I roll back to the previous manifest, there is no error message and everything works fine. This makes me assume there is something wrong with my new manifest and/or image. There was no warning or error when creating them, however.

I have searched for the error messages but found nothing.

1 Answer

A CrashLoopBackOff error normally means that the init() function of your score.py file is failing, for example because it cannot find or load the model. It could also mean that score.py uses a library that hasn't been imported. Azure ML just announced an update to the preview with an updated Python SDK (https://docs.microsoft.com/en-us/azure/machine-learning/service/quickstart-get-started). The tutorials and notebooks there show the process in more detail, with examples. I would start there.

https://docs.microsoft.com/en-us/azure/machine-learning/service/tutorial-deploy-models-with-aml
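
To narrow down an init() failure like the one described above before building a new image, it may help to make score.py fail loudly and to run init() locally first. The following is only a minimal sketch under assumptions: the file name model.pkl, the REQUIRED_MODULES list, and the scikit-learn-style predict() call are hypothetical placeholders, not taken from the question, and your real score.py will differ.

```python
import importlib
import json
import os
import pickle
import traceback

# Hypothetical placeholders: replace with the file and libraries your score.py uses.
MODEL_PATH = "model.pkl"
REQUIRED_MODULES = ["json", "pickle"]

model = None


def init():
    """Load the model once at start-up, printing the cause if anything fails."""
    global model
    try:
        # Importing each dependency up front surfaces a missing library immediately.
        for name in REQUIRED_MODULES:
            importlib.import_module(name)
        if not os.path.exists(MODEL_PATH):
            raise FileNotFoundError(
                "Model file not found: " + os.path.abspath(MODEL_PATH)
            )
        with open(MODEL_PATH, "rb") as f:
            model = pickle.load(f)
    except Exception:
        # Print the traceback before re-raising so the reason is visible in the output.
        traceback.print_exc()
        raise


def run(raw_input):
    """Score one JSON request; report errors as JSON instead of raising."""
    try:
        data = json.loads(raw_input)
        # Assumption: the unpickled object exposes a scikit-learn-style predict().
        prediction = model.predict(data)
        return json.dumps({"result": list(prediction)})
    except Exception:
        return json.dumps({"error": traceback.format_exc()})
```

Calling init() from a local Python session, in the same directory and environment you pass to the manifest, should reproduce a missing model file or missing library as an ordinary traceback rather than as a CrashLoopBackOff after deployment.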