AWS SageMaker autoscaling with instance metrics per instance

Question

I am using an AWS SageMaker endpoint for inference. Depending on the amount of traffic, the endpoint should scale up and down by adding or removing instances. I am trying to use the instance metrics (CPUUtilization, MemoryUtilization, or DiskUtilization) as the metric for SageMaker endpoint autoscaling. These are the predefined metrics described here: https://docs.aws.amazon.com/sagemaker/latest/dg/inference-pipeline-logs-metrics.html

The problem is that the instance metrics for a given endpoint are the sum across all running instances in that endpoint. For example, if 5 instances are currently running, the value of CPUUtilization can range from 0 to 500%. Because the maximum value changes with the number of running instances, the autoscaling policy would have to change along with it.

The question is: is there any way to get a metric per instance, i.e. something like CPUUtilizationPerInstance, without calculating it explicitly or publishing a custom metric? An autoscaling policy that scales up and down based on a threshold for per-instance CPUUtilization seems like the right approach. Is there any other similar option on AWS?

Can you change the alarm to evaluate based on 'average' instead of 'sum'? — Shahad
@Shahad CloudWatch provides statistics over time for such metrics, for example the average CPUUtilization per 1 minute. It does not provide any such statistics per instance. — AbdulRehmanLiaqat

fm1ch4 · Accepted Answer · 2019-12-24T00:26:11

There is an InvocationsPerInstance metric that shows the average number of invocations per instance when you use the 'Sum' statistic.

https://docs.aws.amazon.com/sagemaker/latest/dg/monitoring-cloudwatch.html

This blog post details how you would go about load testing your endpoint to find a good target value for InvocationsPerInstance to use in autoscaling: https://aws.amazon.com/blogs/machine-learning/load-test-and-optimize-an-amazon-sagemaker-endpoint-using-automatic-scaling/
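As a rough illustration of how a target on InvocationsPerInstance might be wired up, here is a minimal sketch using boto3's Application Auto Scaling client with the predefined metric type SageMakerVariantInvocationsPerInstance. The helper names, the policy name, and the cooldown defaults are hypothetical choices for this example and do not come from the answer or the blog post; the target value itself should come from load testing as described there.

```python
from typing import Any, Dict


def variant_resource_id(endpoint_name: str, variant_name: str) -> str:
    """Build the Application Auto Scaling resource ID for an endpoint variant."""
    return f"endpoint/{endpoint_name}/variant/{variant_name}"


def invocations_policy_config(
    target_per_instance: float,
    scale_in_cooldown: int = 300,   # seconds; illustrative default, not a recommendation
    scale_out_cooldown: int = 60,   # seconds; illustrative default, not a recommendation
) -> Dict[str, Any]:
    """Target-tracking configuration keyed on invocations per instance."""
    return {
        "TargetValue": float(target_per_instance),
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
        },
        "ScaleInCooldown": scale_in_cooldown,
        "ScaleOutCooldown": scale_out_cooldown,
    }


def apply_autoscaling(
    endpoint_name: str,
    variant_name: str,
    min_instances: int,
    max_instances: int,
    target_per_instance: float,
) -> None:
    """Register the variant as a scalable target and attach the policy."""
    import boto3  # imported lazily so the helpers above work without AWS set up

    client = boto3.client("application-autoscaling")
    resource_id = variant_resource_id(endpoint_name, variant_name)
    dimension = "sagemaker:variant:DesiredInstanceCount"

    client.register_scalable_target(
        ServiceNamespace="sagemaker",
        ResourceId=resource_id,
        ScalableDimension=dimension,
        MinCapacity=min_instances,
        MaxCapacity=max_instances,
    )
    client.put_scaling_policy(
        PolicyName="invocations-per-instance-target-tracking",  # hypothetical name
        ServiceNamespace="sagemaker",
        ResourceId=resource_id,
        ScalableDimension=dimension,
        PolicyType="TargetTrackingScaling",
        TargetTrackingScalingPolicyConfiguration=invocations_policy_config(
            target_per_instance
        ),
    )
```

A call such as `apply_autoscaling("my-endpoint", "AllTraffic", 1, 5, 70)` (endpoint name, variant name, and target all made up here) would ask for between 1 and 5 instances while tracking roughly 70 invocations per instance. Because the metric is already normalized by instance count, the target does not need to change as the endpoint scales.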
