16
votes

With AWS Network ELBs, it takes at least four minutes for a registered instance to become 'healthy'. The instances and services have been running for days; I am simply de-registering them and then re-registering them with the same target group as part of a deployment. It makes no difference whether I use a script, the AWS UI, or the CLI.

The health check settings are:

  • Port: tried 80, 22, and 9001; each has a listening service, verified with curl
  • Healthy threshold: 2
  • Unhealthy threshold: 2
  • Timeout: 10
  • Interval: 30

I can see the connection requests coming in on whichever port has been specified; the service responds appropriately and the connection is then closed. As far as I am aware, this should be sufficient for the ELB to determine that the instance is healthy once the threshold has been passed, which should mean that my instances are in service no more than 90 seconds after registration. I have no idea why this is happening; it should be straightforward.

I cannot determine what is causing such a long delay, given that I have fulfilled the known criteria for my instances being healthy. They sit with the reason Elb.InitialHealthChecking for about 4 minutes. Any ideas on further tests to determine the cause of the delay?
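One further test is to record exactly when the target changes state, rather than checking by eye. The following is a minimal sketch, not a definitive tool: it assumes a boto3 `elbv2` client is passed in as `client`, and the function name, the polling interval, and the `sleep`/`clock` parameters are all illustrative choices that do not come from the question.

```python
import time


def time_until_healthy(client, target_group_arn, target_id, port,
                       poll_seconds=5, max_wait_seconds=600,
                       sleep=time.sleep, clock=time.monotonic):
    """Poll target health until 'healthy'; return (elapsed_seconds, transitions).

    transitions holds one (elapsed_seconds, state, reason) tuple for every
    observed change of state or reason.
    """
    start = clock()
    transitions = []
    last = None
    while True:
        resp = client.describe_target_health(
            TargetGroupArn=target_group_arn,
            Targets=[{"Id": target_id, "Port": port}],
        )
        health = resp["TargetHealthDescriptions"][0]["TargetHealth"]
        current = (health["State"], health.get("Reason"))
        elapsed = clock() - start
        if current != last:
            # Record only changes, so the log shows when each phase began.
            transitions.append((elapsed, *current))
            last = current
        if health["State"] == "healthy":
            return elapsed, transitions
        if elapsed >= max_wait_seconds:
            raise TimeoutError(
                f"target still {health['State']} after {elapsed:.0f}s")
        sleep(poll_seconds)
```

Called right after the register call with `boto3.client("elbv2")`, this shows how long the target spends under each reason code, which can be compared against the health check connections logged on the instance.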

2
How long is the instance deregistered? Is connection draining enabled? – John Hanley
I check the status every 15 seconds until it is de-registered, then I perform some other tasks which take about a minute, then I re-register it. – Tim Lindsay
Draining is set to 15 seconds. I wait until the instances are 'unused' via the describeInstanceHealth API call before I proceed with the other tasks. – Tim Lindsay
Has this been fixed? I think I am running into this too. – karthikeayan

2 Answers

12
votes

We hit this issue with NLBs and raised it with AWS support on March 20, 2018. Their response:

This is a known issue, where a newly registered instance remains in initial state for longer period of time and our internal team is already working on the fix for this issue. Unfortunately, at this point in time we do not have an ETA for the fix.

They confirmed that, under normal circumstances, targets should remain in the initial state until HealthyThreshold health checks pass.

1
votes

That is close to the expected behavior of an NLB.

Yes, when you register a new target to your Network Load Balancer, it is expected to take between 90 and 180 seconds to complete the registration process. After registration is complete, the Network Load Balancer health check systems will begin to send health checks to the target. A newly registered target must pass health checks for the configured interval to enter service and receive traffic. For example, if you configure your health check for a 30 second interval, and require 3 health checks to become healthy, the minimum time a newly registered target could enter service is 90 seconds after a new target passes its first health check.
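The arithmetic in that explanation can be written out as a rough estimate. This is a simplified model under stated assumptions, not AWS's actual algorithm: it takes the 90 to 180 second registration window quoted above and adds interval × threshold for the health checks, the same way the quoted example does. The function name is illustrative.

```python
def expected_time_to_healthy(interval_s, healthy_threshold,
                             registration_window_s=(90, 180)):
    """Return (min_s, max_s) from the register call to the 'healthy' state.

    Assumes registration takes 90-180 s and the health checks then take
    interval * threshold seconds, as in the explanation above.
    """
    checks_s = interval_s * healthy_threshold
    low, high = registration_window_s
    return low + checks_s, high + checks_s


# Settings from the question: 30 s interval, healthy threshold 2.
print(expected_time_to_healthy(30, 2))  # (150, 240)
# Example from the explanation: 30 s interval, 3 checks.
print(expected_time_to_healthy(30, 3))  # (180, 270)
```

Under this model the question's settings give an upper bound of 240 seconds, which is in line with the roughly four minutes observed.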

Similarly, when you deregister a target from your Network Load Balancer, it is expected to take 90-180 seconds to process the requested deregistration, after which it will no longer receive new connections. During this time the Elastic Load Balancing API will report the target in 'draining' state. The target will continue to receive new connections until the deregistration processing has completed. At the end of the configured deregistration delay, the target will not be included in the describe-target-health response for the Target Group, and will return 'unused' with reason 'Target.NotRegistered' when querying for the specific target.
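A deployment script can check for the end state described above before moving on. The following is a sketch only, assuming a boto3 `elbv2` client is passed in as `client`; the function name is illustrative, and treating an empty response as "not registered" is a defensive assumption rather than documented behavior.

```python
def is_fully_deregistered(client, target_group_arn, target_id, port):
    """Return True once the target reports 'unused' / 'Target.NotRegistered'."""
    resp = client.describe_target_health(
        TargetGroupArn=target_group_arn,
        Targets=[{"Id": target_id, "Port": port}],
    )
    descriptions = resp["TargetHealthDescriptions"]
    if not descriptions:
        # Assumption: nothing reported for the target means it is not registered.
        return True
    health = descriptions[0]["TargetHealth"]
    return (health["State"] == "unused"
            and health.get("Reason") == "Target.NotRegistered")
```

While the target is still in the 'draining' state this returns False, so the script can keep polling until it returns True.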

Support also confirmed there is no work in progress to make it any faster than 3 minutes.