I am using an AWS SageMaker endpoint for inference. Depending on the amount of traffic, the endpoint should scale up and down by adding or removing instances. I am trying to use the instance metrics (CPUUtilization, MemoryUtilization or DiskUtilization) as the metric for SageMaker endpoint autoscaling. These are the built-in instance metrics described here: https://docs.aws.amazon.com/sagemaker/latest/dg/inference-pipeline-logs-metrics.html
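A minimal sketch of what this setup might look like with boto3's Application Auto Scaling client: the variant is registered as a scalable target, and a target-tracking policy is attached that reads CPUUtilization from the /aws/sagemaker/Endpoints namespace as a customized CloudWatch metric. The endpoint name, variant name, target value, capacity bounds, cooldowns and policy name are hypothetical placeholders. The "Average" statistic is an assumption, not a confirmed way to obtain a per-instance value.

```python
"""Sketch: target-tracking autoscaling on CPUUtilization for a SageMaker endpoint variant.

All names and numbers below are hypothetical placeholders.
"""
import json
from typing import Any, Dict

# Application Auto Scaling identifiers for a SageMaker endpoint variant.
SERVICE_NAMESPACE = "sagemaker"
SCALABLE_DIMENSION = "sagemaker:variant:DesiredInstanceCount"


def resource_id(endpoint_name: str, variant_name: str) -> str:
    """Build the resource ID Application Auto Scaling uses for an endpoint variant."""
    return f"endpoint/{endpoint_name}/variant/{variant_name}"


def cpu_target_tracking_config(
    endpoint_name: str, variant_name: str, target_value: float
) -> Dict[str, Any]:
    """Target-tracking configuration that uses CPUUtilization as a customized metric."""
    return {
        "TargetValue": target_value,
        "CustomizedMetricSpecification": {
            "MetricName": "CPUUtilization",
            "Namespace": "/aws/sagemaker/Endpoints",
            "Dimensions": [
                {"Name": "EndpointName", "Value": endpoint_name},
                {"Name": "VariantName", "Value": variant_name},
            ],
            # Assumption: "Average" is chosen here; it is not claimed to yield
            # a per-instance value.
            "Statistic": "Average",
            "Unit": "Percent",
        },
        # Hypothetical cooldowns, in seconds.
        "ScaleInCooldown": 600,
        "ScaleOutCooldown": 300,
    }


def apply_cpu_policy(
    endpoint_name: str,
    variant_name: str,
    target_value: float,
    min_capacity: int,
    max_capacity: int,
) -> None:
    """Register the variant as a scalable target and attach the policy (needs AWS credentials)."""
    import boto3  # imported lazily so the builders above work without boto3 installed

    client = boto3.client("application-autoscaling")
    rid = resource_id(endpoint_name, variant_name)
    client.register_scalable_target(
        ServiceNamespace=SERVICE_NAMESPACE,
        ResourceId=rid,
        ScalableDimension=SCALABLE_DIMENSION,
        MinCapacity=min_capacity,
        MaxCapacity=max_capacity,
    )
    client.put_scaling_policy(
        PolicyName=f"{endpoint_name}-{variant_name}-cpu-target-tracking",  # hypothetical name
        ServiceNamespace=SERVICE_NAMESPACE,
        ResourceId=rid,
        ScalableDimension=SCALABLE_DIMENSION,
        PolicyType="TargetTrackingScaling",
        TargetTrackingScalingPolicyConfiguration=cpu_target_tracking_config(
            endpoint_name, variant_name, target_value
        ),
    )


if __name__ == "__main__":
    # Print the configuration only; call apply_cpu_policy(...) to actually create it.
    print(json.dumps(cpu_target_tracking_config("my-endpoint", "AllTraffic", 70.0), indent=2))
```

Keeping the configuration in a plain function makes it easy to inspect before anything is created in the account; only apply_cpu_policy talks to AWS.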
The problem is that the instance metrics for a given endpoint are the sum over all running instances in the endpoint. For example, if the endpoint's runtime settings show 5 currently running instances, CPUUtilization can range from 0% to 500%. Because the maximum value changes with the number of running instances, the threshold in the autoscaling policy would have to change as well.

Question: is there any way to get the metric per instance, i.e. a CPUUtilizationPerInstance, without calculating it explicitly or publishing it as a custom metric? An autoscaling policy that scales up and down based on a threshold on per-instance CPUUtilization seems like the right approach. Is there any other similar option on AWS?
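To make the problem concrete, the arithmetic can be sketched in Python. Assuming, as described above, that the reported value is a sum over instances, its ceiling is 100% times the instance count, so one fixed threshold corresponds to a different per-instance load at each endpoint size. The helper names and the 400% threshold are hypothetical; current_instance_count reads CurrentInstanceCount from the SageMaker DescribeEndpoint response and is the kind of explicit calculation the question hopes to avoid.

```python
"""Sketch: why a fixed threshold on a summed metric drifts as instances are added.

Assumes (per the question) that the metric is a sum over running instances.
"""


def per_instance_utilization(summed_percent: float, instance_count: int) -> float:
    """Divide the summed value back into an average per-instance percentage."""
    if instance_count <= 0:
        raise ValueError("instance_count must be positive")
    return summed_percent / instance_count


def max_summed_utilization(instance_count: int) -> float:
    """Upper bound of the summed metric when every instance is at 100%."""
    return 100.0 * instance_count


def current_instance_count(endpoint_name: str, variant_name: str) -> int:
    """Read the variant's current instance count (needs AWS credentials)."""
    import boto3  # imported lazily so the arithmetic helpers work without boto3

    sm = boto3.client("sagemaker")
    desc = sm.describe_endpoint(EndpointName=endpoint_name)
    for variant in desc["ProductionVariants"]:
        if variant["VariantName"] == variant_name:
            return variant["CurrentInstanceCount"]
    raise KeyError(f"variant {variant_name!r} not found on {endpoint_name!r}")


if __name__ == "__main__":
    threshold = 400.0  # hypothetical fixed threshold on the summed metric
    for n in (2, 5, 8):
        # instance count, ceiling of the summed metric, per-instance load at the threshold
        print(n, max_summed_utilization(n), per_instance_utilization(threshold, n))
```

With 2 instances the 400% threshold can never be reached (the ceiling is 200%), with 5 it corresponds to 80% per instance, and with 8 to only 50% per instance.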