
Question

I would like to clarify the entities in AWS::SageMaker.

SageMaker Model

Looking at the diagram in Deploy a Model on Amazon SageMaker Hosting Services, the model artifacts in SageMaker are the data generated by an ML algorithm Docker container during the model training phase and stored in an S3 bucket.


However, AWS::SageMaker::Model appears to capture a Docker image that runs the inference code on a SageMaker endpoint instance. There is no obvious reference to the model data in an S3 bucket. Hence I wonder why it is called AWS::SageMaker::Model rather than something like AWS::SageMaker::InferenceImage.

1-1. What is Model in AWS SageMaker?

1-2. Is it a Docker image (the algorithm) that does the prediction/inference, rather than the data to run the algorithm on?

1-3. Does AWS refer to the runtime (Docker runtime + Docker image for inference) as the Model?

AWS::SageMaker::Model

Type: AWS::SageMaker::Model
Properties: 
  Containers: 
    - ContainerDefinition
  ExecutionRoleArn: String
  ModelName: String
  PrimaryContainer: 
    ContainerDefinition
  Tags: 
    - Tag
  VpcConfig: 
    VpcConfig

SageMaker Endpoint or SageMaker Estimator from model data in S3

The SageMaker Estimator has an output_path argument, as described in the Python SDK Estimators documentation:

S3 location for saving the training result (model artifacts and output files). If not specified, results are stored to a default bucket. If the bucket with the specific name does not exist, the estimator creates the bucket during the fit() method execution.

In a Python ML environment, we can use pickle to export a trained model and reload it later, as described in scikit-learn's 3.4. Model persistence. We can do the same in Spark ML.
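The pickle-based persistence mentioned above can be sketched as follows. The model object here is a hypothetical stand-in (in practice it would be, e.g., a fitted scikit-learn estimator); the idea is the same export/reload round trip:

```python
import pickle

# Stand-in for any trained model object; in practice this would be
# e.g. a fitted scikit-learn estimator.
model = {"weights": [0.1, 0.2], "bias": 0.5}

# Export the model to bytes (use pickle.dump with a file to persist to disk).
serialized = pickle.dumps(model)

# Later, reload the bytes back into an equivalent model object.
restored = pickle.loads(serialized)
assert restored == model
```

This round trip is conceptually what SageMaker does with model.tar.gz in S3: the serialized artifact is all that is needed to reconstruct the model for inference.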

2-1. What is the equivalent in SageMaker, given that AWS::SageMaker::Model appears to have no argument referring to data in an S3 bucket?

2-2. Can a SageMaker Estimator be re-created from the model data in an S3 bucket?

SageMaker Estimator

I thought there would be a resource to define a SageMaker Estimator in CloudFormation, but it looks like there is none.

3-1. Please help me understand if there is a reason for this.

1 Answer


Clearing up a few concepts to begin with: an Amazon SageMaker Model is a reference to the model artifacts (e.g. the trained model) and the associated runtime (e.g. container and source code). An Estimator is used to train the model and outputs the model data (i.e. model.tar.gz) used by a Model. A Model doesn't reference training code (so an Estimator cannot be reconstructed from a Model), and it doesn't reference any inference data either: that is passed to an Endpoint or a Batch Transform job.

Solving the majority of your issues: you can specify ModelDataUrl on a ContainerDefinition for AWS::SageMaker::Model. You would typically reference the Amazon S3 path to the model.tar.gz that was output by the Amazon SageMaker training job.
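For example, a minimal CloudFormation sketch of an AWS::SageMaker::Model with ModelDataUrl set. The model name, ECR image URI, bucket path, and role are placeholders, not real resources:

```yaml
MyModel:
  Type: AWS::SageMaker::Model
  Properties:
    ModelName: my-model                          # hypothetical name
    ExecutionRoleArn: !GetAtt SageMakerRole.Arn  # assumes a role defined elsewhere in the template
    PrimaryContainer:
      # Inference image (the algorithm/runtime) run on the endpoint instance
      Image: 123456789012.dkr.ecr.us-east-1.amazonaws.com/my-inference-image:latest
      # Model artifacts produced by the training job (the Estimator's output_path)
      ModelDataUrl: s3://my-bucket/output/model.tar.gz
```

This is why the resource is called a Model rather than an InferenceImage: it binds the inference image and the trained model artifacts together into one deployable unit.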