2
votes

I am able to train my model using the SageMaker TensorFlow container.

Below is the code:

import sagemaker
from sagemaker.tensorflow import TensorFlow

model_dir = '/opt/ml/model'
train_instance_type = 'ml.c4.xlarge'
hyperparameters = {'epochs': 10, 'batch_size': 256, 'learning_rate': 0.001}

script_mode_estimator = TensorFlow(
    entry_point='model.py',
    train_instance_type=train_instance_type,
    train_instance_count=1,
    model_dir=model_dir,
    hyperparameters=hyperparameters,
    role=sagemaker.get_execution_role(),
    base_job_name='tf-fashion-mnist',
    framework_version='1.12.0',
    py_version='py3',
    output_path='s3://my_bucket/testing',
    script_mode=True
)

Model Fitting:

script_mode_estimator.fit(inputs)

But when I try to deploy the model, I get the error below.

Deploy code is:

script_mode_d = script_mode_estimator.deploy(
    initial_instance_count=1,
    instance_type='ml.m4.xlarge'
)

Error is:

UnexpectedStatusException: Error hosting endpoint tf-fashion-mnist-2020-09-23-09-05-25-791: Failed. Reason: The role 'xyz' does not have BatchGetImage permission for the image: '520713654638.dkr.ecr.us-east-1.amazonaws.com/sagemaker-tensorflow-serving:1.12-cpu'.

Please help me to resolve this issue.

You have to go to the IAM console, locate the role xyz, and add the BatchGetImage permission to it. - Marcin
Hi @Marcin, thank you for your reply. After granting the permission I get the error below: "Please make sure all images included in the model for the production variant AllTraffic exist, and that the execution role used to create the model has permissions to access them." - Reshma Ladakhan
This seems like a new issue. You could accept @Theo's answer and post a new question with the relevant details for the new problem. - Marcin
@Marcin, I have posted this issue: stackoverflow.com/questions/64238027/… - Reshma Ladakhan

2 Answers

3
votes

Reason: The role 'xyz' does not have BatchGetImage permission for the image: '520713654638.dkr.ecr.us-east-1.amazonaws.com/sagemaker-tensorflow-serving:1.12-cpu'.

This error means that the IAM role "xyz" (you can find this in the IAM console) does not have permission to make the BatchGetImage API call in ECR (Elastic Container Registry, you can find this service in the ECS console).

You can find a number of example IAM policies that you can use to grant the "xyz" role permission to perform the API call here: https://docs.aws.amazon.com/AmazonECR/latest/userguide/repository-policy-examples.html

To add a policy go to the IAM console, look for the "xyz" role, and either add an (inline) policy, or edit one of its existing policies (if it already has a policy that grants similar permissions it would make sense to add this permission in that policy).
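If you would rather script this than click through the console, the following is a minimal sketch using boto3. It assumes the repository ARN can be derived from the image URI in the error message (account 520713654638, region us-east-1, repository sagemaker-tensorflow-serving), and that pulling an image also needs the other ECR read actions listed, not only BatchGetImage. The function names and the policy name used in the note below are hypothetical and chosen only for illustration.

```python
import json


def build_ecr_pull_policy(repository_arn):
    """Build an IAM policy document that allows pulling images from one ECR repository."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # GetAuthorizationToken does not support resource-level
                # permissions, so it is granted on "*".
                "Effect": "Allow",
                "Action": "ecr:GetAuthorizationToken",
                "Resource": "*",
            },
            {
                # Read-only actions needed to pull an image, including the
                # BatchGetImage call named in the error message.
                "Effect": "Allow",
                "Action": [
                    "ecr:BatchCheckLayerAvailability",
                    "ecr:GetDownloadUrlForLayer",
                    "ecr:BatchGetImage",
                ],
                "Resource": repository_arn,
            },
        ],
    }


def attach_inline_policy(role_name, policy_name, policy_document):
    """Attach the policy document to the role as an inline policy (calls AWS)."""
    import boto3  # imported here so the builder above can be used without boto3

    iam = boto3.client("iam")
    iam.put_role_policy(
        RoleName=role_name,
        PolicyName=policy_name,
        PolicyDocument=json.dumps(policy_document),
    )


# Assumption: ARN derived from the image URI shown in the error message.
REPOSITORY_ARN = (
    "arn:aws:ecr:us-east-1:520713654638:repository/sagemaker-tensorflow-serving"
)
policy = build_ecr_pull_policy(REPOSITORY_ARN)
print(json.dumps(policy, indent=2))
```

Calling attach_inline_policy("xyz", "AllowEcrPull", policy) with credentials that are allowed to modify IAM roles would then add the inline policy to the role. The printed JSON can also be pasted into the policy editor in the IAM console instead.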

0
votes

Instead of managing permissions by crafting a permission policy, you can use the AWS-managed AmazonSageMakerFullAccess permission policy, which allows for any actions you might want to perform in SageMaker (including BatchGetImage).

To do so:

  1. Log onto the console -> IAM -> Roles -> Create Role
  2. Create a role with SageMaker (sagemaker.amazonaws.com) as the trusted service
  3. Give the role AmazonSageMakerFullAccess
  4. Give the role AmazonS3FullAccess
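
The same steps can be sketched with boto3 instead of the console. This is only an illustration under a few assumptions: the function names are hypothetical, the role name is whatever you choose to pass in, and the credentials you run it with must be allowed to create roles and attach policies.

```python
import json

# AWS-managed policies named in the steps above.
MANAGED_POLICY_ARNS = [
    "arn:aws:iam::aws:policy/AmazonSageMakerFullAccess",
    "arn:aws:iam::aws:policy/AmazonS3FullAccess",
]


def build_sagemaker_trust_policy():
    """Build the trust policy that lets the SageMaker service assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "sagemaker.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def create_sagemaker_role(role_name):
    """Create the role and attach the managed policies (calls AWS). Returns the role ARN."""
    import boto3  # imported here so the helpers above can be used without boto3

    iam = boto3.client("iam")
    response = iam.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(build_sagemaker_trust_policy()),
    )
    for policy_arn in MANAGED_POLICY_ARNS:
        iam.attach_role_policy(RoleName=role_name, PolicyArn=policy_arn)
    return response["Role"]["Arn"]


trust_policy = build_sagemaker_trust_policy()
print(json.dumps(trust_policy, indent=2))
```

The ARN returned by create_sagemaker_role can then be passed as the role argument of the TensorFlow estimator in place of sagemaker.get_execution_role().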