I am trying to automate the deployment of a SageMaker multi-model endpoint with AWS CDK in Python (I guess it would be the same when writing a CloudFormation template directly in JSON/YAML), but the deployment fails when the SageMaker model is created.
Here is part of the CloudFormation template generated by the cdk synth command:
    Resources:
      smmodelexecutionrole:
        Type: AWS::IAM::Role
        Properties:
          AssumeRolePolicyDocument:
            Statement:
              - Action: sts:AssumeRole
                Effect: Allow
                Principal:
                  Service: sagemaker.amazonaws.com
            Version: "2012-10-17"
          Policies:
            - PolicyDocument:
                Statement:
                  - Action: s3:GetObject
                    Effect: Allow
                    Resource:
                      Fn::Join:
                        - ""
                        - - "arn:"
                          - Ref: AWS::Partition
                          - :s3:::<bucket_name>/deploy_multi_model_artifact/*
                Version: "2012-10-17"
              PolicyName: policy_s3
            - PolicyDocument:
                Statement:
                  - Action: ecr:*
                    Effect: Allow
                    Resource:
                      Fn::Join:
                        - ""
                        - - "arn:"
                          - Ref: AWS::Partition
                          - ":ecr:"
                          - Ref: AWS::Region
                          - ":"
                          - Ref: AWS::AccountId
                          - :repository/<my_ecr_repository>
                Version: "2012-10-17"
              PolicyName: policy_ecr
        Metadata:
          aws:cdk:path: <omitted>
      smmodel:
        Type: AWS::SageMaker::Model
        Properties:
          ExecutionRoleArn:
            Fn::GetAtt:
              - smmodelexecutionrole
              - Arn
          Containers:
            - Image: xxxxxxxxxxxx.dkr.ecr.<my_aws_region>.amazonaws.com/<my_ecr_repository>/multi-model:latest
              Mode: MultiModel
              ModelDataUrl: s3://<bucket_name>/deploy_multi_model_artifact/
          ModelName: MyModel
        Metadata:
          aws:cdk:path: <omitted>
When running cdk deploy in the terminal, the following error occurs:
3/6 | 7:56:58 PM | CREATE_FAILED | AWS::SageMaker::Model | sm_model (smmodel)
Could not access model data at s3://<bucket_name>/deploy_multi_model_artifact/.
Please ensure that the role "arn:aws:iam::xxxxxxxxxxxx:role/<my_role>" exists
and that its trust relationship policy allows the action "sts:AssumeRole" for the service principal "sagemaker.amazonaws.com".
Also ensure that the role has "s3:GetObject" permissions and that the object is located in <my_aws_region>.
(Service: AmazonSageMaker; Status Code: 400; Error Code: ValidationException; Request ID: xxxxx)
What I have:
- An ECR repository containing the docker image
- An S3 bucket containing the model artifacts (.tar.gz files) inside the "folder" "deploy_multi_model_artifact"
To test whether it is an IAM role issue, I replaced MultiModel with SingleModel and s3://<bucket_name>/deploy_multi_model_artifact/ with s3://<bucket_name>/deploy_multi_model_artifact/one_of_my_artifacts.tar.gz, and the model was created successfully. So I am guessing that, contrary to what the error message says, the problem is not related to IAM (but I may be mistaken!).
So I am wondering where the problem comes from. This is even more confusing because I have already deployed this multi-model endpoint using boto3 without any problem.
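For comparison, here is a minimal sketch of the kind of boto3 call meant above. It is not the exact script that was used: the helper names (build_container, create_multi_model) are made up for illustration, and the arguments stand for the same placeholders as in the template. It only shows that the MultiModel and SingleModel variants differ in Mode and in whether ModelDataUrl points to a prefix or to a single .tar.gz object.

```python
def build_container(image_uri, model_data_url, multi_model=True):
    """Build the container definition passed to SageMaker's create_model.

    Hypothetical helper, for illustration only.
    """
    return {
        "Image": image_uri,
        "ModelDataUrl": model_data_url,
        # "MultiModel" expects an S3 prefix, "SingleModel" a single .tar.gz object
        "Mode": "MultiModel" if multi_model else "SingleModel",
    }


def create_multi_model(model_name, role_arn, image_uri, model_data_prefix):
    """Create the model with boto3 (needs AWS credentials; not run on import)."""
    import boto3  # imported lazily so build_container works without boto3

    client = boto3.client("sagemaker")
    return client.create_model(
        ModelName=model_name,
        ExecutionRoleArn=role_arn,
        Containers=[build_container(image_uri, model_data_prefix)],
    )
```

Calling create_multi_model with the model name, the role ARN, the image URI and the S3 prefix would correspond to the smmodel resource in the template, assuming the same role is used in both cases.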
Any help would be greatly appreciated!
(About Multi-Model Endpoints deployment: https://github.com/awslabs/amazon-sagemaker-examples/blob/master/advanced_functionality/multi_model_xgboost_home_value/xgboost_multi_model_endpoint_home_value.ipynb)