I have set up a Google Cloud Compute Engine VM instance group (with between 2 and 5 instances) and configured autohealing to start after 3 failed health checks. The instances are created from an instance template whose startup script deploys my application. However, when I test autohealing by stopping my application process on one VM, the failing instance is eventually removed and replaced as expected, but 3 additional instances are also created during the process. I have also set the instance group's autohealing initial delay to 600 seconds, so I don't think that is the issue.
After enabling health check logging, I checked the instance group's logs, and this is what I found:
- After the first logged change in health check status, a remove instance operation is performed, followed by an add instance operation.
- After the add instance operation, another health check probe result is logged, with the health state going from "UNKNOWN"/"UNHEALTHY" to "TIMEOUT"/"UNHEALTHY".
- Around 2 minutes later, three more add instance operations are logged; these extra instances are removed shortly afterwards when the group scales down.
Does anyone know why the 3 extra add instance operations are taking place, and is there a way to avoid them?