8
votes

We have auto scaling groups for one of our cloud formation stacks that has a CPU based alarm for determining when to scale the instances.

This is great but we recently had it scale up from one node to three and one of those nodes failed to bootstrap via cfn-init. Once the workload reduced and the group scaled back down to one node it killed the two good instances and left the partially bootstrapped node as the only remaining instance. This meant that we stopped processing work until someone logged in and re-ran the bootstrap process.

Obviously this is not ideal. What is the best way to notify the auto scaling group that a node is not healthy when it does not sit behind an ELB?

Since this is just initial bootstrap what I'd really like is to communicate back to the auto scaling group that this node failed and have it terminated and a new node spun up in its place.

2

2 Answers

8
votes

A colleague just showed me http://docs.aws.amazon.com/AutoScaling/latest/DeveloperGuide/as-configure-healthcheck.html which looks handy.

If you have your own health check system, you can use the information from your health check system to set the health state of the instances in the Auto Scaling group.

UPDATE - I managed to get this working during launch.

Here's what my UserData section for the ASG looks like:

#!/bin/bash -v
set -x
export AWS_DEFAULT_REGION=us-west-1
cfn-init --region us-west-1 --stack bapi-prod --resource LaunchConfiguration -v
if [[ $? -ne 0 ]]; then
    export INSTANCE=`curl http://169.254.169.254/latest/meta-data/instance-id`
    aws autoscaling set-instance-health \
         --instance-id $INSTANCE \
         --health-status Unhealthy
fi
0
votes
    cfn-init --region us-west-1 --stack bapi-prod --resource LaunchConfiguration -v
if [[ $? -ne 0 ]]; then
    export INSTANCE=`curl http://169.254.169.254/latest/meta-data/instance-id`
    aws autoscaling set-instance-health \
         --instance-id $INSTANCE \
         --health-status Unhealthy
fi

Can also be done as a one-liner. For example, I'm using the following in Terraform:

runcmd:
 - /tmp/runcmd-puppet.sh || { export INSTANCE=`curl http://169.254.169.254/latest/meta-data/instance-id`; aws autoscaling --region eu-west-1 set-instance-health --instance-id $INSTANCE --health-status Unhealthy; }