I am deploying an "AWS::SageMaker::Model" inside a VPC with CloudFormation. The stack stands up fine, but when I delete it, the model is deleted successfully and then the deletion of the security group associated with it fails with "DependencyViolation".

Investigation found that the Model object is removed but there is an ENI still remaining that has the security group attached to it.

The stack output is as follows:

[image: stack output]

The IAM role associated with the model has the following managed policy: "arn:aws:iam::aws:policy/AmazonSageMakerFullAccess".

I know that this happened previously with Lambda functions running within a VPC, and that it was fixed. I wonder if we have the same issue with Model.

Also worth noting: this does not appear to happen with "AWS::SageMaker::NotebookInstance".

My model definition is as follows:

  TESTMODEL:
    Type: "AWS::SageMaker::Model"
    Properties:
      ExecutionRoleArn: !GetAtt ExecutionRole.Arn
      PrimaryContainer:
        Image: "514117268639.dkr.ecr.ap-southeast-2.amazonaws.com/forecasting-deepar:1"
        ModelDataUrl: "s3://test-sagemaker/sagemaker/DEMO-deepar/output/DEMO-deepar-2018-09-03-02-18-02-278/output/model.tar.gz"
      ModelName: "Test"
      VpcConfig:
        Subnets:
          - subnet-457ee522
          - subnet-c0b82c89
          - subnet-2cc22074
        SecurityGroupIds:
          - !GetAtt SageMakerModelSG.GroupId

  SageMakerModelSG:
    Type: "AWS::EC2::SecurityGroup"
    Properties:
      GroupDescription: "SageMakerModelSG"
      VpcId: vpc-4df92b2a
      Tags:
        - Key: "Name"
          Value: !Join [ -, [ !Ref "AWS::StackName", "SageMakerModelSG" ] ]

  SageMakerModelSGIngresshttps:
    Type: "AWS::EC2::SecurityGroupIngress"
    Properties:
      GroupId: !Ref SageMakerModelSG
      Description: "https"
      IpProtocol: "tcp"
      FromPort: "443"
      ToPort: "443"
      CidrIp: "0.0.0.0/0"
Interestingly, it seems that a Glue Developer Endpoint attached to a VPC created by CloudFormation has a similar problem. - mransley

2 Answers

1
votes

I raised a support case with AWS. The outcome is that the network interface takes a while to be deleted, so the stack as I had designed it will fail to tear down.

The fix is either to create the security group manually or to create it in a different stack. That way the SageMaker model tears down cleanly and the network interfaces are removed later on.
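As a rough sketch of the "different stack" option, the security group can live in its own template and be handed to the model template through a CloudFormation export. This is only an illustration based on the template in the question: the two-file split, the file names in the comments, and the export name `sagemaker-network-ModelSGId` are assumptions made up for this example, not something AWS support prescribed.

```yaml
# Two separate templates shown in one block; deploy each as its own stack.
# Hypothetical file: network-stack.yaml (owns the security group)
Resources:
  SageMakerModelSG:
    Type: "AWS::EC2::SecurityGroup"
    Properties:
      GroupDescription: "SageMakerModelSG"
      VpcId: vpc-4df92b2a

Outputs:
  SageMakerModelSGId:
    Value: !GetAtt SageMakerModelSG.GroupId
    Export:
      # Hypothetical export name; must be unique per account and region.
      Name: "sagemaker-network-ModelSGId"
---
# Hypothetical file: model-stack.yaml (owns only the model)
Resources:
  TESTMODEL:
    Type: "AWS::SageMaker::Model"
    Properties:
      ExecutionRoleArn: !GetAtt ExecutionRole.Arn  # role defined as in the question
      PrimaryContainer:
        Image: "514117268639.dkr.ecr.ap-southeast-2.amazonaws.com/forecasting-deepar:1"
        ModelDataUrl: "s3://test-sagemaker/sagemaker/DEMO-deepar/output/DEMO-deepar-2018-09-03-02-18-02-278/output/model.tar.gz"
      VpcConfig:
        Subnets:
          - subnet-457ee522
        SecurityGroupIds:
          # Reference the group exported by the other stack.
          - !ImportValue "sagemaker-network-ModelSGId"
```

With this layout, deleting the model stack no longer tries to delete the security group, so the lingering ENI cannot cause a "DependencyViolation" there. The stack that owns the security group would presumably still fail to delete until the ENI is gone, so it should be torn down later or left in place.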

I updated my CI tests to reflect this and it works.

0
votes

Have you tried deleting the CloudFormation stack again? There can be a delay in detaching the ENI from the VPC, so retrying might help here.

If the problem persists, I'd suggest creating a customer support case or an AWS forum post with the following information, so that the SageMaker team can investigate your issue and provide insights.

  • Account Id
  • Region (where you were creating SageMaker resources)
  • Endpoint/EndpointConfig/Model names
  • VPC and subnet Ids