5
votes

I have an ELB with the following configuration:

  • Availability Zones: ap-southeast-1a and ap-southeast-1b
  • Cross-Zone Load Balancing: Enabled
  • Connection Settings: Idle Timeout: 300 seconds

Health Check Details:

  • Ping Target HTTPS:443/login
  • Timeout 15 seconds
  • Interval 20 seconds
  • Unhealthy Threshold 5
  • Healthy Threshold 3

Listener Details:

TCP on port 443, forwarding to instance port 443, where nginx is listening and doing ssl termination.

I continually see double health check calls in the nginx logs. They come at the same moment, or at least within the same second.

Why?


1 Answer

13
votes

An ELB launched in multiple availability zones, or one handling a lot of traffic, will virtually always have at least one ELB node active in each availability zone where the ELB is provisioned, whether or not any instances in that availability zone exist or are healthy.

If you check the source IP addresses of the incoming health check requests, you should see that they differ. Particularly with cross-zone load balancing enabled, you should see one health check per interval from each ELB node, since each node health-checks each instance. With your two availability zones, that means two nodes, and two checks arriving within each 20-second interval, which is why they show up in pairs.

Example taken from my logs just now:

Jul  1 19:22:25 localhost 172.17.0.251:4076 ELB-HealthChecker/1.0 "GET / HTTP/1.1" 
Jul  1 19:22:25 localhost 172.17.10.98:42667 ELB-HealthChecker/1.0 "GET / HTTP/1.1" 

Note that 172.17.*.* is in my VPC, and these two IP addresses are in-range on two of my "public" subnets... but what are they? Those are the internal private IP addresses of the ELB nodes.

Note that ELB "nodes" is a term I may or may not have just made up, but it describes the virtual machines EC2 has invisibly provisioned to serve as your elastic load balancer. (ELB is apparently deployed on EC2 instances controlled by the ELB infrastructure, and these are definitely not visible as instances in your AWS console.)
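
Their network interfaces, however, do show up in EC2: for a classic ELB in a VPC, each node attaches an elastic network interface whose description starts with "ELB " followed by the load balancer's name, so you can cross-check those private IPs with the AWS CLI. A sketch, with the ELB name as a placeholder for yours:

# "my-elb-name" is a placeholder for your load balancer's name
$ aws ec2 describe-network-interfaces \
    --filters "Name=description,Values=ELB my-elb-name" \
    --query 'NetworkInterfaces[].PrivateIpAddress' \
    --output text

That should return the same private addresses you're seeing as the health check sources.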

You don't pay separately for these machines, so you needn't typically be concerned with how many there are. They scale up and down automatically with traffic load -- anecdotal observations suggest that the instance class of the nodes may be dynamic, as is the number of nodes. Each node can support a theoretical maximum of 64K connections to your back-end servers, though other capacity constraints are likely to kick in before you hit numbers like that.
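
You can also confirm the node count from your own nginx logs by counting the distinct health checker source addresses. A rough sketch (the log path and format are assumptions; adjust for your setup):

# the log path is an assumption; adjust for your access log
$ grep ELB-HealthChecker /var/log/nginx/access.log \
    | grep -oE '([0-9]{1,3}\.){3}[0-9]{1,3}' | sort | uniq -c

With the log lines above, that would print a request count next to each of the two node addresses, 172.17.0.251 and 172.17.10.98.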

Another way to get a good idea of how many nodes are in an ELB cluster at any given time is to run the dig utility against the ELB hostname shown in the console.

$ dig xxxxxxxx-yyyyyyyy.us-west-2.elb.amazonaws.com

; <<>> DiG 9.8.1-P1 <<>> xxxxxxxx-yyyyyyyy.us-west-2.elb.amazonaws.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 38905
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 13, ADDITIONAL: 9

;; QUESTION SECTION:
;xxxxxxxx-yyyyyyyy.us-west-2.elb.amazonaws.com. IN A

;; ANSWER SECTION:
xxxxxxxx-yyyyyyyy.us-west-2.elb.amazonaws.com. 59 IN A 54.149.x.x
xxxxxxxx-yyyyyyyy.us-west-2.elb.amazonaws.com. 59 IN A 54.201.x.x

Two A-record answers presumably mean two nodes. While AWS could use address translation or other network trickery to masquerade multiple machines behind a single address, or one machine behind multiple addresses, observations suggest that the number of answers you receive from the DNS query reflects the number of nodes currently deployed for your ELB.
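
If you just want the number, dig +short prints one address per line, which you can count. The same caveat applies, since this is really counting DNS answers rather than nodes directly:

# counts the A records returned, which appears to track the node count
$ dig +short xxxxxxxx-yyyyyyyy.us-west-2.elb.amazonaws.com | wc -l
2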