I set a CloudWatch Alarm to add 1 capacity unit to EC2 autoscaling group when memory reservation is > 70%. The Alarm was triggered at the right moment, but it has since been in alarm for 16 hours+ with no change at all in the EC2 autoscaling group. What could possibly be going wrong?
Here's my ECS CloudFormation template:
ECSCluster:
Type: AWS::ECS::Cluster
Properties:
ClusterName: !Ref EnvironmentName
ECSAutoScalingGroup:
DependsOn: ECSCluster
Type: AWS::AutoScaling::AutoScalingGroup
Properties:
VPCZoneIdentifier: !Ref Subnets
LaunchConfigurationName: !Ref ECSLaunchConfiguration
MinSize: !Ref ClusterMinSize
MaxSize: !Ref ClusterMaxSize
DesiredCapacity: !Ref ClusterDesiredCapacity
CreationPolicy:
ResourceSignal:
Timeout: PT15M
UpdatePolicy:
AutoScalingRollingUpdate:
MinInstancesInService: 1
MaxBatchSize: 1
PauseTime: PT15M
SuspendProcesses:
- HealthCheck
- ReplaceUnhealthy
- AZRebalance
- AlarmNotification
- ScheduledActions
WaitOnResourceSignals: true
ScaleUpPolicy:
Type: AWS::AutoScaling::ScalingPolicy
Properties:
AdjustmentType: ChangeInCapacity
AutoScalingGroupName: !Ref ECSAutoScalingGroup
Cooldown: '1'
ScalingAdjustment: '1'
MemoryReservationAlarmHigh:
Type: AWS::CloudWatch::Alarm
Properties:
EvaluationPeriods: '2'
Statistic: Average
Threshold: '70'
AlarmDescription: Alarm if Cluster Memory Reservation is too high
Period: '60'
AlarmActions:
- Ref: ScaleUpPolicy
Namespace: AWS/ECS
Dimensions:
- Name: ClusterName
Value: !Ref ECSCluster
ComparisonOperator: GreaterThanThreshold
MetricName: MemoryReservation
ScaleDownPolicy:
Type: AWS::AutoScaling::ScalingPolicy
Properties:
AdjustmentType: ChangeInCapacity
AutoScalingGroupName: !Ref ECSAutoScalingGroup
Cooldown: '1'
ScalingAdjustment: '-1'
MemoryReservationAlarmLow:
Type: AWS::CloudWatch::Alarm
Properties:
EvaluationPeriods: '2'
Statistic: Average
Threshold: '30'
AlarmDescription: Alarm if Cluster Memory Reservation is too Low
Period: '60'
AlarmActions:
- Ref: ScaleDownPolicy
Namespace: AWS/ECS
Dimensions:
- Name: ClusterName
Value: !Ref ECSCluster
ComparisonOperator: LessThanThreshold
MetricName: MemoryReservation
ECSLaunchConfiguration:
Type: AWS::AutoScaling::LaunchConfiguration
Properties:
KeyName: !If [IsProd, !Ref 'AWS::NoValue', !Ref KeyName]
ImageId: !Ref ECSAMI
InstanceType: !Ref InstanceType
SecurityGroups:
- !Ref SecurityGroup
IamInstanceProfile: !Ref ECSInstanceProfile
UserData:
"Fn::Base64": !Sub |
#!/bin/bash
source /etc/profile.d/proxy.sh
yum install -y https://s3.amazonaws.com/ec2-downloads-windows/SSMAgent/latest/linux_amd64/amazon-ssm-agent.rpm
yum install -y https://s3.amazonaws.com/amazoncloudwatch-agent/amazon_linux/amd64/latest/amazon-cloudwatch-agent.rpm
yum install -y aws-cfn-bootstrap hibagent
cat >> /opt/aws/amazon-cloudwatch-agent/etc/common-config.toml <<EOF
[proxy]
http_proxy="${!http_proxy}"
https_proxy="${!https_proxy}"
no_proxy="${!no_proxy}"
EOF
/opt/aws/bin/cfn-init -v --region ${AWS::Region} --stack ${AWS::StackName} --resource ECSLaunchConfiguration
/opt/aws/bin/cfn-signal -e $? --region ${AWS::Region} --stack ${AWS::StackName} --resource ECSAutoScalingGroup
/usr/bin/enable-ec2-spot-hibernation
Metadata:
AWS::CloudFormation::Init:
config:
packages:
yum:
collectd: []
commands:
01_add_instance_to_cluster:
command: !Sub echo ECS_CLUSTER=${ECSCluster} >> /etc/ecs/ecs.config
02_enable_cloudwatch_agent:
command: !Sub /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -c ssm:${ECSCloudWatchParameter} -s
files:
/etc/cfn/cfn-hup.conf:
mode: 000400
owner: root
group: root
content: !Sub |
[main]
stack=${AWS::StackId}
region=${AWS::Region}
/etc/cfn/hooks.d/cfn-auto-reloader.conf:
content: !Sub |
[cfn-auto-reloader-hook]
triggers=post.update
path=Resources.ECSLaunchConfiguration.Metadata.AWS::CloudFormation::Init
action=/opt/aws/bin/cfn-init -v --region ${AWS::Region} --stack ${AWS::StackName} --resource ECSLaunchConfiguration
services:
sysvinit:
cfn-hup:
enabled: true
ensureRunning: true
files:
- /etc/cfn/cfn-hup.conf
- /etc/cfn/hooks.d/cfn-auto-reloader.conf
# This IAM Role is attached to all of the ECS hosts. It is based on the default role
# published here:
# http://docs.aws.amazon.com/AmazonECS/latest/developerguide/instance_IAM_role.html
#
# You can add other IAM policy statements here to allow access from your ECS hosts
# to other AWS services. Please note that this role will be used by ALL containers
# running on the ECS host.
ECSRole:
Type: AWS::IAM::Role
Properties:
Path: /
RoleName: !Sub ${EnvironmentName}-ECSRole-${AWS::Region}
AssumeRolePolicyDocument: |
{
"Statement": [{
"Action": "sts:AssumeRole",
"Effect": "Allow",
"Principal": {
"Service": "ec2.amazonaws.com"
}
}]
}
ManagedPolicyArns:
- !Sub "arn:aws:iam::${AWS::AccountId}:policy/CSOPSRestrictionPolicy"
- !Sub "arn:aws:iam::${AWS::AccountId}:policy/HIPIAMRestrictionPolicy"
- !Sub "arn:aws:iam::${AWS::AccountId}:policy/HIPBasePolicy"
- arn:aws:iam::aws:policy/service-role/AmazonEC2RoleforSSM
- arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy
Policies:
- PolicyName: ecs-service
PolicyDocument: |
{
"Statement": [{
"Effect": "Allow",
"Action": [
"ecs:CreateCluster",
"ecs:DeregisterContainerInstance",
"ecs:DiscoverPollEndpoint",
"ecs:Poll",
"ecs:RegisterContainerInstance",
"ecs:StartTelemetrySession",
"ecs:Submit*",
"ecr:BatchCheckLayerAvailability",
"ecr:BatchGetImage",
"ecr:GetDownloadUrlForLayer",
"ecr:GetAuthorizationToken"
],
"Resource": "*"
}]
}
ECSInstanceProfile:
Type: AWS::IAM::InstanceProfile
Properties:
Path: /
Roles:
- !Ref ECSRole
ECSServiceAutoScalingRole:
Type: AWS::IAM::Role
Properties:
AssumeRolePolicyDocument:
Version: "2012-10-17"
Statement:
Action:
- "sts:AssumeRole"
Effect: Allow
Principal:
Service:
- application-autoscaling.amazonaws.com
Path: /
ManagedPolicyArns:
- !Sub "arn:aws:iam::${AWS::AccountId}:policy/CSOPSRestrictionPolicy"
- !Sub "arn:aws:iam::${AWS::AccountId}:policy/HIPIAMRestrictionPolicy"
- !Sub "arn:aws:iam::${AWS::AccountId}:policy/HIPBasePolicy"
Policies:
- PolicyName: ecs-service-autoscaling
PolicyDocument:
Statement:
Effect: Allow
Action:
- application-autoscaling:*
- cloudwatch:DescribeAlarms
- cloudwatch:PutMetricAlarm
- ecs:DescribeServices
- ecs:UpdateService
Resource: "*"
ECSCloudWatchParameter:
Type: AWS::SSM::Parameter
Properties:
Description: CloudWatch Log configs for ECS cluster
Name: !Sub AmazonCloudWatch-${ECSCluster}-ECS
Type: String
Value: !Sub |
{
"logs": {
"force_flush_interval": 5,
"logs_collected": {
"files": {
"collect_list": [
{
"file_path": "/var/log/messages",
"log_group_name": "${ECSCluster}/var/log/messages",
"log_stream_name": "{instance_id}",
"timestamp_format": "%b %d %H:%M:%S"
},
{
"file_path": "/var/log/dmesg",
"log_group_name": "${ECSCluster}/var/log/dmesg",
"log_stream_name": "{instance_id}"
},
{
"file_path": "/var/log/docker",
"log_group_name": "${ECSCluster}/var/log/docker",
"log_stream_name": "{instance_id}",
"timestamp_format": "%Y-%m-%dT%H:%M:%S.%f"
},
{
"file_path": "/var/log/ecs/ecs-init.log",
"log_group_name": "${ECSCluster}/var/log/ecs/ecs-init.log",
"log_stream_name": "{instance_id}",
"timestamp_format": "%Y-%m-%dT%H:%M:%SZ"
},
{
"file_path": "/var/log/ecs/ecs-agent.log.*",
"log_group_name": "${ECSCluster}/var/log/ecs/ecs-agent.log",
"log_stream_name": "{instance_id}",
"timestamp_format": "%Y-%m-%dT%H:%M:%SZ"
},
{
"file_path": "/var/log/ecs/audit.log",
"log_group_name": "${ECSCluster}/var/log/ecs/audit.log",
"log_stream_name": "{instance_id}",
"timestamp_format": "%Y-%m-%dT%H:%M:%SZ"
}
]
}
}
},
"metrics": {
"append_dimensions": {
"AutoScalingGroupName": "${!aws:AutoScalingGroupName}",
"InstanceId": "${!aws:InstanceId}",
"InstanceType": "${!aws:InstanceType}"
},
"metrics_collected": {
"collectd": {
"metrics_aggregation_interval": 60
},
"disk": {
"measurement": [
"used_percent"
],
"metrics_collection_interval": 60,
"resources": [
"/"
]
},
"mem": {
"measurement": [
"mem_used_percent"
],
"metrics_collection_interval": 60
},
"statsd": {
"metrics_aggregation_interval": 60,
"metrics_collection_interval": 10,
"service_address": ":8125"
}
}
}
}
ECSClusterParameter:
Type: AWS::SSM::Parameter
Properties:
Description: !Sub ${EnvironmentName} - ECS Cluster
Name: !Sub /${EnvironmentName}/ecs-cluster
Type: String
Value: !Ref ECSCluster
ECSServiceAutoScalingRoleParameter:
Type: AWS::SSM::Parameter
Properties:
Description: !Sub ${EnvironmentName} - ECS Service ASG Role
Name: !Sub /${EnvironmentName}/ecs-service-asg-role
Type: String
Value: !GetAtt ECSServiceAutoScalingRole.Arn
The Alarm activity history:
2019-12-26 11:40:54 Action Successfully executed action arn:aws:autoscaling:ap-southeast-2:031539715286:scalingPolicy:95e836b6-2f56-498d-b931-7ec4184bedc4:autoScalingGroupName/ECS-UEBZA8GAP8S7-ECSAutoScalingGroup-1BIBTJH5I50W9:policyName/ECS-UEBZA8GAP8S7-ScaleUpPolicy-17LUWE42DC7EO
2019-12-26 11:40:54 State update Alarm updated from OK to In alarm