24 votes

I'm keen to understand exactly what the ELB Latency statistic provided by CloudWatch means.

According to the docs:

  • ELB Latency: "Measures the time elapsed in seconds after the request leaves the load balancer until the response is received."

http://docs.aws.amazon.com/ElasticLoadBalancing/latest/DeveloperGuide/US_MonitoringLoadBalancerWithCW.html

What I'm not 100% clear on is whether the response gets buffered by the ELB before it is transferred to the client.

Does the statement in the docs mean:

  • ELB Latency: "Measures the time elapsed in seconds after the request leaves the load balancer until the response is received [by the client]."

Or:

  • ELB Latency: "Measures the time elapsed in seconds after the request leaves the load balancer until the response is received [by the ELB]."

I want to understand whether a poor Maximum Latency CloudWatch metric could be explained by a significant number of users on ropey 3G connections, or whether it instead indicates an underlying problem with the app servers occasionally responding slowly.


2 Answers

23 votes

According to AWS support:

As the ELB (when configured with HTTP listeners) acts as a proxy (request headers come in and get validated, and are then sent to the backend), the latency metric will start ticking as soon as the headers are sent to the backend, and stops when the backend sends the first byte of the response.

In the case of POSTs (or any HTTP method where the client is sending additional data), the latency will keep ticking while the client is uploading the data (as the backend needs the complete request before it can send a response) and will stop once the backend sends out the first byte of the response. So if you have a slow client sending data, the latency will take into account the upload time plus the time the backend took to respond.
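
If you want to check this against your own traffic, here is a minimal sketch using boto3 that pulls the per-minute Average and Maximum of the Latency metric; the load balancer name my-classic-elb is a placeholder for your own:

    from datetime import datetime, timedelta, timezone

    import boto3

    cloudwatch = boto3.client("cloudwatch")

    # Classic ELBs publish Latency under the AWS/ELB namespace, dimensioned
    # by LoadBalancerName. "my-classic-elb" is a placeholder name.
    now = datetime.now(timezone.utc)
    stats = cloudwatch.get_metric_statistics(
        Namespace="AWS/ELB",
        MetricName="Latency",
        Dimensions=[{"Name": "LoadBalancerName", "Value": "my-classic-elb"}],
        StartTime=now - timedelta(hours=1),
        EndTime=now,
        Period=60,                     # one datapoint per minute
        Statistics=["Average", "Maximum"],
        Unit="Seconds",
    )

    for point in sorted(stats["Datapoints"], key=lambda p: p["Timestamp"]):
        print(point["Timestamp"], point["Average"], point["Maximum"])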

7 votes

It appears to be a measurement of how long the server takes to generate its response, from the ELB's perspective, without regard to how long the ELB might need to return that response to the client.

I came to this conclusion by reviewing my own logs in one of my applications, which uses ELB in front of another load balancer, HAProxy, which in turn is in front of the actual application servers. (This may seem redundant, but it gives us several advantages over using only ELB or only HAProxy.)

Here's the setup I'm referring to:

ELB -->>-- EC2+HAProxy -->>-- EC2+Nginx (multiple instances)

HAProxy logs several time metrics on each request, including one called Tr.

Tr: server response time (HTTP mode only). It's the time elapsed between the moment the TCP connection was established to the server and the moment the server sent its complete response headers. It purely shows its request processing time, without the network overhead due to the data transmission.
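
As a point of reference, in HAProxy's default HTTP log format the timers appear as a single Tq/Tw/Tc/Tr/Tt field. Here's a rough sketch of extracting Tr from a log line (the regex is my own, not anything HAProxy ships):

    import re

    # Tq/Tw/Tc/Tr/Tt, e.g. "10/0/30/69/109". Any timer can be -1 on errors,
    # and the total time may carry a "+" prefix when logging happened early.
    # re.search returns the leftmost match, which in the default HTTP log
    # format is the timer field (the 1/1/1/1/0 connection counters come later).
    TIMERS = re.compile(r"(-?\d+)/(-?\d+)/(-?\d+)/(-?\d+)/\+?(-?\d+)")

    def tr_millis(log_line):
        """Return the Tr timer (server response time, in ms), or None."""
        match = TIMERS.search(log_line)
        return int(match.group(4)) if match else None

    line = ('Feb  6 12:14:14 localhost haproxy[14389]: 10.0.1.2:33317 '
            '[06/Feb/2009:12:14:14.655] http-in static/srv1 10/0/30/69/109 '
            '200 2750 - - ---- 1/1/1/1/0 0/0 "GET /index.html HTTP/1.1"')
    print(tr_millis(line))  # 69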

Now, stick with me for an explanation of why so much discussion of what HAProxy is doing here is relevant to ELB and the Latency metric.

Even though HAProxy logs a number of other timers related to how long the proxy spends waiting for various events on each request/response, this Tr timer is the single timer in my HAProxy logs that neatly corresponds to the values logged by CloudWatch's "Latency" metric for the ELB on a minute-by-minute basis, give or take a millisecond or two; the others vary wildly. So I would suggest that this ELB metric is similarly logging the response time of your application server, unrelated to the additional time that might be required to deliver the response back to the client.

Given HAProxy's definition of the timer in question, it seems very unlikely that HAProxy and the ELB would otherwise be so consistently in agreement, unless the ELB's timer is measuring something very similar to what HAProxy is measuring, since the two systems are literally measuring the performance of the same app servers on the same requests.
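
If you want to reproduce the comparison, one rough approach (the log path and format assumptions are mine) is to bucket Tr by minute, take each minute's maximum, and line that up against the corresponding Maximum datapoints for the Latency metric:

    import re
    from collections import defaultdict

    TIMERS = re.compile(r"(-?\d+)/(-?\d+)/(-?\d+)/(-?\d+)/\+?(-?\d+)")
    STAMP = re.compile(r"\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2}):\d{2}")

    def per_minute_max_tr(log_lines):
        """Maximum Tr (ms) per minute, keyed by the request's accept time."""
        buckets = defaultdict(int)
        for line in log_lines:
            stamp, timers = STAMP.search(line), TIMERS.search(line)
            if stamp and timers:
                tr = int(timers.group(4))
                if tr >= 0:  # -1 means the server never responded
                    key = stamp.group(1)
                    buckets[key] = max(buckets[key], tr)
        return dict(buckets)

    # Tr is logged in milliseconds; CloudWatch's Latency is in seconds.
    with open("/var/log/haproxy.log") as f:
        for minute, tr_ms in sorted(per_minute_max_tr(f).items()):
            print(minute, tr_ms / 1000.0)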

If your application server doesn't benchmark itself and log timers of its own performance, you may want to consider adding that instrumentation, since (in my observations) high values for the Latency metric do seem to indicate a responsiveness issue in your application, unrelated to client connection quality.
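
For example, if your app servers happen to run a Python WSGI application, a small middleware like this sketch (framework-agnostic; the names are my own) will log how long the app takes to produce each response:

    import logging
    import time

    logging.basicConfig(level=logging.INFO)
    logger = logging.getLogger("request_timing")

    class TimingMiddleware:
        """Logs how long the wrapped app takes to produce each response."""

        def __init__(self, app):
            self.app = app

        def __call__(self, environ, start_response):
            start = time.monotonic()
            try:
                # Times until the response iterable is produced, which for a
                # non-streaming app roughly matches the time-to-first-byte
                # that the load balancer sees.
                return self.app(environ, start_response)
            finally:
                elapsed = time.monotonic() - start
                logger.info("%s %s took %.3fs",
                            environ.get("REQUEST_METHOD"),
                            environ.get("PATH_INFO"),
                            elapsed)

    # Wrap your existing WSGI app, e.g.:
    # application = TimingMiddleware(application)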