I have a CloudFormation stack set up which creates an autoscaling group (ASG) along with some other items that aren't relevant.
There is an update policy on the ASG as follows:
    UpdatePolicy:
      AutoScalingReplacingUpdate:
        WillReplace: 'false'
      AutoScalingScheduledAction:
        IgnoreUnmodifiedGroupSizeProperties: 'true'
      AutoScalingRollingUpdate:
        MinInstancesInService: '0'
        MinSuccessfulInstancesPercent: '50'
        MaxBatchSize: '2'
        PauseTime: PT10M
        WaitOnResourceSignals: 'true'
As part of our release process we update the launch configuration in CloudFormation. This triggers the ASG to update, which is desired.
There is a lifecycle hook with a 600-second timeout that prevents the EC2 instance from going InService until a few checks are done. If these checks fail, I send an error signal for the ASG resource with cfn-signal and complete the lifecycle hook with ABANDON.
    # Signal failure to CloudFormation for the ASG resource
    /opt/aws/bin/cfn-signal -e 1 --stack ${AWS::StackId} --resource MyASG --region ${AWS::Region}
    # Look up this instance's ID, its ASG name, and the ASG's first lifecycle hook
    INSTANCE_ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
    ASG_NAME=$(aws ec2 --region ${AWS::Region} describe-tags --filters "Name=resource-type,Values=instance" "Name=resource-id,Values=$INSTANCE_ID" "Name=key,Values=aws:autoscaling:groupName" | jq -r '.Tags[] | .Value')
    HOOK_NAME=$(aws autoscaling describe-lifecycle-hooks --auto-scaling-group-name "$ASG_NAME" --region ${AWS::Region} | jq -r '.LifecycleHooks[0].LifecycleHookName')
    # $1 is the lifecycle action result (ABANDON when the checks fail)
    aws autoscaling complete-lifecycle-action --lifecycle-hook-name "$HOOK_NAME" --auto-scaling-group-name "$ASG_NAME" --lifecycle-action-result "$1" --instance-id "$INSTANCE_ID" --region ${AWS::Region}
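For context, here is a minimal sketch of how the outcome of the checks could be mapped to the two values used above: the exit code given to cfn-signal -e and the lifecycle action result passed in as $1. The function names are hypothetical and only illustrate the mapping; they are not part of any AWS tooling.

```shell
#!/bin/bash
# Hypothetical helpers (names are illustrative only): translate the exit
# status of the pre-InService checks into the values the commands above expect.

# Lifecycle action result: CONTINUE on success, ABANDON on any failure.
lifecycle_result_for() {
  if [ "$1" -eq 0 ]; then
    echo "CONTINUE"
  else
    echo "ABANDON"
  fi
}

# Exit code for cfn-signal -e: 0 for success, 1 for any failure.
cfn_exit_code_for() {
  if [ "$1" -eq 0 ]; then
    echo 0
  else
    echo 1
  fi
}
```

In the failing case described here, the check status is non-zero, so cfn-signal is called with -e 1 and the lifecycle hook is completed with ABANDON.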
This works in that the EC2 instance is abandoned and terminated. The problem is that the ASG in the CloudFormation stack continues to sit in UPDATE_IN_PROGRESS for an hour before it fails with a "Group did not stabilize" error, and only then does everything start to roll back.
Since the PauseTime is set to "PT10M", I would expect it to wait at most 10 minutes and start rolling back as soon as the cfn-signal error signal is sent.
I'm unable to determine why the stack is waiting an hour. Any ideas?