I have set up a Google Cloud Compute Engine VM instance group (with between 2 and 5 instances) and configured autohealing to start after 3 failed health checks. The instances are created from an instance template whose startup script deploys my application. However, when I test autohealing by stopping my application process on one VM, the failing instance is eventually removed and replaced, but 3 additional instances are also created during the process. I have also set the instance group's autohealing initial delay to 600 seconds, so I don't think that is the issue.

I have checked the instance group's logs for health check statements after enabling logging, and this is what I have discovered:

  1. After the first logged change in health check status, a remove instance operation is performed, followed by an add instance operation.
  2. After the add instance operation, another health check probe result is logged, with health state going from "UNKNOWN"/"UNHEALTHY" to "TIMEOUT"/"UNHEALTHY".
  3. Around 2 minutes later, three more add instance operations are logged; those instances are removed shortly afterwards when the group scales back down.

Does anyone know why the 3 extra add instance operations are taking place, and whether it is possible to avoid them?

2 Answers

Update: The issue was resolved by increasing the cool down period of the autoscaling configuration.

As mentioned by OP, the issue was resolved by adjusting the cool down period.
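
For anyone hitting the same thing, here is a minimal sketch of where the cool down period lives in an autoscaler definition. It only builds the request body as a plain dict and does not call any API. The field names (`autoscalingPolicy`, `minNumReplicas`, `maxNumReplicas`, `coolDownPeriodSec`, `cpuUtilization.utilizationTarget`) follow the Compute Engine autoscaler resource. Everything else is an assumption for illustration: the helper names, the CPU-based signal, the 0.6 target, the 600 second value, and the idea that the cool down should be at least as long as the startup script takes. The OP did not say which autoscaling signal or which value they ended up using.

```python
def build_autoscaler_body(name, group_url, min_replicas=2, max_replicas=5,
                          cool_down_sec=600, cpu_target=0.6):
    """Build an autoscaler resource body (hypothetical helper, no API call).

    cool_down_sec is how long the autoscaler waits before it starts using
    metrics from a newly created instance.
    """
    if min_replicas > max_replicas:
        raise ValueError("min_replicas must not exceed max_replicas")
    if cool_down_sec < 0:
        raise ValueError("cool_down_sec must be non-negative")
    return {
        "name": name,
        "target": group_url,  # URL of the managed instance group
        "autoscalingPolicy": {
            "minNumReplicas": min_replicas,
            "maxNumReplicas": max_replicas,
            "coolDownPeriodSec": cool_down_sec,
            # Assumed signal: CPU utilization. The OP's actual signal is unknown.
            "cpuUtilization": {"utilizationTarget": cpu_target},
        },
    }


def cool_down_covers_startup(cool_down_sec, startup_sec):
    """Return True if the cool down period is at least the startup time.

    Assumed rule of thumb: if the startup script is still running after the
    cool down ends, its load may be counted as demand for more instances.
    """
    return cool_down_sec >= startup_sec


# Placeholder names, not taken from the question.
body = build_autoscaler_body("my-autoscaler", "my-instance-group-url")
policy = body["autoscalingPolicy"]
```

With these placeholder values, `cool_down_covers_startup(policy["coolDownPeriodSec"], 300)` is true for a startup script that takes about 5 minutes, while a 60 second cool down would not cover it. Measure how long your own startup script takes and set the cool down period accordingly.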