Google Cloud Compute Engine VM Instance group always autoheals to max number of instances

Question

I have set up a Google Cloud Compute Engine VM instance group (with between 2 and 5 instances) and configured autohealing to start after 3 failed health checks. The instances are created from an instance template whose startup script deploys my application. However, when I test autohealing by stopping my application process on one VM, the failing instance is eventually removed and replaced, but 3 additional instances are also created during the process. I have also set the instance group's autohealing initial delay to 600 seconds, so I don't think that is the issue.

I have checked the instance group's logs for health check statements after enabling logging, and this is what I have discovered:

After the first logged change in health check status, a remove instance operation is performed, followed by an add instance operation.
After the add instance operation, another health check probe result is logged, with health state going from "UNKNOWN"/"UNHEALTHY" to "TIMEOUT"/"UNHEALTHY".
Three more add instance operations are logged about 2 minutes later; those instances are removed shortly afterwards when the group scales down.

Does anyone know why the 3 extra add instance operations are taking place and is it possible to avoid this?

eagerbeaver · Accepted Answer · 2020-09-09T17:15:51

Update: The issue was resolved by increasing the cool-down period in the instance group's autoscaling configuration.
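
For anyone hitting the same thing, here is a rough sketch of how the cool-down period might be raised with the gcloud CLI. The group name `my-instance-group`, the zone `us-central1-a`, and the 600-second value are placeholders I made up for illustration (the value that actually fixed the issue is not stated above); the 2 and 5 replica limits come from the question. As far as I know, `set-autoscaling` defines the autoscaling configuration as a whole, so any scaling signal you already use (for example `--target-cpu-utilization`) would need to be passed again.

```shell
# Sketch only: group name, zone, and cool-down value are assumed placeholders.
# --cool-down-period tells the autoscaler how many seconds to wait after a VM
# starts before it trusts that VM's metrics, so an instance that is still
# running its startup script is less likely to trigger extra scale-out.
gcloud compute instance-groups managed set-autoscaling my-instance-group \
    --zone=us-central1-a \
    --min-num-replicas=2 \
    --max-num-replicas=5 \
    --cool-down-period=600
```

A reasonable starting point is probably a value at least as long as the startup script takes to bring the application up, which you can then tune down.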
