
When deploying a model to SageMaker using the sagemaker-python-sdk, I get the following error:

UnexpectedStatusException: Error hosting endpoint tensorflow-inference-eia-XXXX-XX-XX-XX-XX-XX-XXX: 
Failed. Reason: The image '763104351884.dkr.ecr.us-east-1.amazonaws.com/tensorflow-inference-eia:1.14 
-gpu' does not exist..

The code I am using to deploy is:

predictor = model.deploy(initial_instance_count=1,
                         instance_type='ml.p2.xlarge', accelerator_type='ml.eia1.medium')

If I remove the accelerator_type parameter, the endpoint deploys with no errors. Any idea why this happens? SageMaker seems to be referencing an image that doesn't exist. How do I fix this?

Also, I made sure that the version is supported, per https://github.com/aws/sagemaker-python-sdk#tensorflow-sagemaker-estimators. I am on TensorFlow 1.14.

Edit: Turns out, this works:

predictor = model.deploy(initial_instance_count=1,
                         instance_type='ml.m4.xlarge', accelerator_type='ml.eia1.medium')

So, I am guessing that Elastic Inference is not available for GPU instances?
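For reference, one way to see which container image the SDK resolves for a given instance/accelerator combination is the image_uris module; the call below is a sketch assuming SageMaker Python SDK v2 (the v1 SDK in use here resolves images internally), with the region, version, and instance types mirroring the ones in this question:

import sagemaker

# Resolve the inference image for a CPU host paired with an EIA accelerator.
# Assumes SageMaker Python SDK v2; values mirror the ones in this question.
uri = sagemaker.image_uris.retrieve(framework='tensorflow',
                                    region='us-east-1',
                                    version='1.14',
                                    instance_type='ml.m4.xlarge',
                                    accelerator_type='ml.eia1.medium',
                                    image_scope='inference')
print(uri)  # e.g. ...dkr.ecr.us-east-1.amazonaws.com/tensorflow-inference-eia:1.14-cpu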

Note: None of the instances that I deploy my endpoint to is using the GPU. (Please suggest some ideas if you are familiar with this or have made it work.)

This is not a new issue. I don't know the exact problem with your particular setup because there are several moving parts that I can't see, but this might help: github.com/aws/sagemaker-python-sdk/issues/912 – Matus Dubrava
Hi @MatusDubrava, thanks for the reply. I already checked the link you provided. The thing is that I can deploy the endpoint successfully if I do not pass the accelerator_type parameter, whereas that link focuses on the instance_type parameter. I am willing to try other things; any suggestions? – Pramesh Bajracharya
If you are running on a GPU instance, why would you want to add an Elastic Inference accelerator? The GPU already provides the acceleration that you need. – Guy
@Guy, hey, thanks for the reply. After I deployed the model to the endpoint, I found that inference is not using the GPU, so I wanted to test this with Elastic Inference. Any idea on this? – Pramesh Bajracharya
@PrameshBajracharya, from your question edit it looks like when you attached the Elastic Inference accelerator to a CPU-based machine (M4, for example), you managed to get it to work. Right? – Guy

1 Answer


Elastic Inference Accelerators (EIA) are designed to be attached to CPU endpoints. There is no EIA-enabled container image for GPU instance types, which is why SageMaker looks for a tensorflow-inference-eia:1.14-gpu image that does not exist; pairing the accelerator with a CPU instance such as ml.m4.xlarge works.
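As a minimal sketch of that pairing (the model artifact path and IAM role below are placeholders, and the Model class shown is the TensorFlow Serving model from SDK v1.x, matching the TensorFlow 1.14 setup in the question):

from sagemaker.tensorflow.serving import Model

# Placeholder artifact location and execution role; substitute your own.
model = Model(model_data='s3://my-bucket/model/model.tar.gz',
              role='arn:aws:iam::111122223333:role/MySageMakerRole',
              framework_version='1.14')

# EIA attaches to a CPU host; the accelerator provides the inference
# acceleration, so a GPU instance type is not needed here.
predictor = model.deploy(initial_instance_count=1,
                         instance_type='ml.m4.xlarge',       # CPU host
                         accelerator_type='ml.eia1.medium')  # EI accelerator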