2
votes

I am preparing for AWS certification and came across a question about ELB with sticky session enabled for instances in 2 AZs. The problem is that requests from a software-based load tester in one of the AZs end up in the instances in that AZ only instead of being distributed across AZs. At the same time regular requests from customers are evenly distributed across AZs. The correct answers to fix the load tester issue are:

  • Forced the software-based load tester to re-resolve DNS before every request;
  • Use third party load-testing service to send requests from globally distributed clients.

I'm not sure I can understand this scenario. What is the default behaviour of Route 53 when it comes to ELB IPs resolution? In any case, those DNS records have 60 seconds TTL. Isn't it redundant to re-resolve DNS on every request? Besides, DNS resolution is a responsibility of DNS service itself, not load-testing software, isn't it? I can understand that requests from the same instance, with load testing software on it, will go to the same LBed EC2, but why does it have to be an instance in the same AZ? It can be achieved only by Geolocation- or Latency-based routing, but I can't find anything in the specs whether those are the default ones.

3

3 Answers

2
votes

When an ELB is in more than one availability zone, it always has more than one public IP address -- at least one per zone.

When you request these records in a DNS lookup, you get all of these records (assuming there are not very many) or a subset of them (if there are a large number, which would be the case in an active cluster with significant traffic) but they are unordered.

If the load testing software resolves the IP address of the endpoint and holds onto exactly one of the IP addresses -- as it a likely outcome -- then all of the traffic will go to one node of the balancer, which is in one zone, and will send traffic to instances in that zone.

But what about...

Cross-Zone Load Balancing

The nodes for your load balancer distribute requests from clients to registered targets. When cross-zone load balancing is enabled, each load balancer node distributes traffic across the registered targets in all enabled Availability Zones. When cross-zone load balancing is disabled, each load balancer node distributes traffic across the registered targets in its Availability Zone only.

https://docs.aws.amazon.com/elasticloadbalancing/latest/userguide/how-elastic-load-balancing-works.html

If stickiness is configured, those sessions will initially land in one AZ and then stick to that AZ because they stick to the initial instance where they landed. If cross-zone is enabled, the outcome is not quite as clear, but either balancer nodes may prefer instances in their own zone in that scenario (when first establishing stickiness), or this wasn't really the point of the question. Stickiness requires coordination, and cross-AZ traffic takes a non-zero amount of time due to distance (typically <10 ms) but it would make sense for a balancer to prefer to select instances its local zone for sessions with no established affinity.

In fact, configuring the load test software to re-resolve the endpoint for each request is not really the focus of the solution -- the point is to ensure that (1) the load test software uses all of them and does not latch onto exactly one and (2) that if more addresses become available due to the balancer scaling out under load, that the load test software expands its pool of targets.

In any case, those DNS records have 60 seconds TTL. Isn't it redundant to re-resolve DNS on every request?

The software may not see the TTL, may not honor the TTL and, as noted above, may stick to one answer even if multiple are available, because it only needs one in order to make the connection. Every request is not strictly necessary, but it does solve the problem.

Besides, DNS resolution is a responsibility of DNS service itself, not load-testing software, isn't it?

To "resolve DNS" in this context simply means to do a DNS lookup, whatever that means in the specific instance, whether using the OS's DNS resolver or making a direct query to a recursive DNS server. When software establishes a connection to a hostname, it "resolves" (looks up) the associated IP address.

The other solution, "use third party load-testing service to send requests from globally distributed clients," solves the problem by accident, since the distributed clients -- even if they stick to the first address they see -- are more likely to see all of the available addresses. The "global" distribution aspect is a distraction.

ELB relies on random arrival of requests across its external-facing nodes as part of the balancing strategy. Load testing software whose design overlooks this is not properly testing the ELB. Both solutions mitigate the problem in different ways.

2
votes

The sticky is the issue , see here : https://docs.aws.amazon.com/elasticloadbalancing/latest/classic/elb-sticky-sessions.html

The load balancer uses a special cookie to associate the session with the instance that handled the initial request, but follows the lifetime of the application cookie specified in the policy configuration. The load balancer only inserts a new stickiness cookie if the application response includes a new application cookie. The load balancer stickiness cookie does not update with each request. If the application cookie is explicitly removed or expires, the session stops being sticky until a new application cookie is issued.

The first solution, to re-resolve DNS will create new sessions and with this will break the stickiness of the ELB . The second solution is to use multiple clients , stickiness is not an issue if the number of globally distributed clients is large.

PART 2 : could not add as comment , is to long :

Yes, my answer was to simple and incomplete.

What we know is that ELB is 2 AZ and will have 2 nodes with different IP. Not clear how many IP , depends on the number of requests and the number of servers on each AZ. Route 53 is rotating the IP for every new request , first time in NodeA-IP , NodeB-IP , second time is NodeB-IP, NodeA-IP. The load testing application will take with every new request the first IP , balancing between the 2 AZ. Because a Node can route only inside his AZ , if the sticky cookie is for NodeA and the request arrives to NodeB , NodeB will send it to one of his servers in AZ2 ignoring the cookie for a server in AZ 1.

I need to run some tests, quickly tested with Route53 with classic ELB and 2 AZ and is rotating every time the IP's. What I want to test if I have a sticky cookie for AZ 1 and I reach the Node 2 will not forward me to Node 1 ( In case of no available servers, is described in the doc this interesting flow ). Hope to have updates in short time.

0
votes

Just found another piece of evidence that Route 53 returns multiple IPs and rotate them for ELB scaling scenarios:

By default, Elastic Load Balancing will return multiple IP addresses when clients perform a DNS resolution, with the records being randomly ordered on each DNS resolution request. As the traffic profile changes, the controller service will scale the load balancers to handle more requests, scaling equally in all Availability Zones.

And then:

To ensure that clients are taking advantage of the increased capacity, Elastic Load Balancing uses a TTL setting on the DNS record of 60 seconds. It is critical that you factor this changing DNS record into your tests. If you do not ensure that DNS is re-resolved or use multiple test clients to simulate increased load, the test may continue to hit a single IP address when Elastic Load Balancing has actually allocated many more IP addresses.

What I didn't realize at first is that even if regular traffic is distributed evenly across AZs, it doesn't mean that Cross-Zone Load Balancing is enabled. As Michael pointed out regular traffic will naturally come through various locations and end up in different AZs. And as it was not explicitly mentioned in the test, Cross-AZ balancing might not have been in place.

https://aws.amazon.com/articles/best-practices-in-evaluating-elastic-load-balancing/