1
votes

I am using aws Sagemaker endpoint for inference. Based upon amount of traffic, endpoint should scale up and down by adding more instance into the endpoint. I am trying to use instance metrics (CPUUtilization, MemoryUtilization or DiskUtilization) as metric for sagemaker endpoint autoscaling. These are the predefined metrics as defined here: https://docs.aws.amazon.com/sagemaker/latest/dg/inference-pipeline-logs-metrics.html

The problem is that the instance metrics for a given endpoint are sum of all the running instances within an endpoint. For example in the following endpoint runtime settings: Example of aws sagemaker endpoint runtime settings

Current running instances are 5 then the the value of CPUUtilization can range from 0 to 500%. Based upon the number of instances running the maximum value will change hence autoscaling policy should be changed. Question is: Is there any way to find out Metric per instance i.e. CPUUtilizationPerInstance without explicitly calculating them or through custom metric? Autoscaling policy of scaling up and down by setting a threshold on per instance CPUUtilization seems the right way. Is there any other similar option on aws?

2
Can you change the alarm to evaluate based on 'average' instead of 'sum'?Shahad
@Shahad cloudwatch provides statistics per time for such metrics. For example Average CPUUtilization per 1 minute. It do not provide any such statistics per instance.AbdulRehmanLiaqat

2 Answers

1
votes

There is an InvocationsPerInstance metric that shows the average number of invocations per instance when you use the 'Sum' statistic.

https://docs.aws.amazon.com/sagemaker/latest/dg/monitoring-cloudwatch.html

This blog post details how you would go about load testing your endpoint to find a good target value for InvocationsPerInstance to use in autoscaling: https://aws.amazon.com/blogs/machine-learning/load-test-and-optimize-an-amazon-sagemaker-endpoint-using-automatic-scaling/

1
votes

This blog post describes how you would define a custom metric to track average cpu utilisation per instance.

tl;dr

    TargetTrackingScalingPolicyConfiguration={
        'TargetValue': 90.0,
        'CustomizedMetricSpecification':
        {
            'MetricName': 'CPUUtilization',
            'Namespace': '/aws/sagemaker/Endpoints',
            'Dimensions': [
                {'Name': 'EndpointName', 'Value': endpoint_name },
                {'Name': 'VariantName','Value': 'AllTraffic'}
            ],
            'Statistic': 'Average', # Possible - 'Statistic': 'Average'|'Minimum'|'Maximum'|'SampleCount'|'Sum'
            'Unit': 'Percent'
        },
        'ScaleInCooldown': 600,
        'ScaleOutCooldown': 300
    }