DNS round robin: very slow requests in a failover scenario

Question

I'm currently configuring a server pool on AWS. It is a simple setup: two database servers, a scalable server array, and two load balancers in front of it all. Every machine has a failover standing by, so it should all be pretty robust.

The load balancers should be able to fail over through round-robin DNS. In the happy-day scenario, both machines get hit and distribute the traffic over the array. When one of these machines is down, round-robin DNS combined with the browser's retry behaviour should make the browser shift its target host to the machine that is still up once it hits a timeout. This is not something I came up with, but it seems like a very good solution.
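The client-side behaviour being relied on here can be sketched roughly as follows: resolve every A record for the host, then try the addresses in the order the resolver returned them, moving on to the next one when a connection attempt fails or times out. This is a minimal illustration, not what any particular browser does; the function name `connect_with_failover` and the 3-second per-address timeout are assumptions made for the example.

```python
import socket


def connect_with_failover(host, port, per_address_timeout=3.0):
    """Try every IPv4 address for `host` in resolver order.

    Returns a socket connected to the first address that accepts the
    connection; raises OSError if none of them do. (Hypothetical helper.)
    """
    # One entry per A record (IPv4, TCP only).
    infos = socket.getaddrinfo(host, port, socket.AF_INET, socket.SOCK_STREAM)
    last_error = None
    for info in infos:
        sockaddr = info[4]  # (ip, port) tuple for AF_INET
        try:
            # A dead load balancer usually shows up here as a timeout.
            return socket.create_connection(sockaddr, timeout=per_address_timeout)
        except OSError as exc:  # socket.timeout is a subclass of OSError
            last_error = exc  # remember the failure and try the next address
    raise OSError(f"no address for {host}:{port} accepted a connection") from last_error
```

Python's own `socket.create_connection` already performs this loop internally when it is given a host name; the explicit version just makes the per-address fallback visible. Note that nothing here remembers which address failed, so a second call pays the same timeout again.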

The problem I'm experiencing is as follows. The shift does happen, but not just once for the failed request: it happens again for each and every subsequent request from the same browser. So a simple page request takes 21 seconds to load, after which every image also takes 21 seconds to load. Every following page request takes this long as well. So the failover works, but at the same time it is completely useless.
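The 21-second figure is consistent with one common explanation: a TCP connect to an unreachable host gives up only after the initial SYN and its retransmissions have all timed out, with the wait doubling each time. The sketch below assumes the often-cited Windows defaults (3-second initial retransmission timeout, 2 retransmissions); these parameters are assumptions about the client, not something stated in the question.

```python
def syn_timeout(initial_rto=3.0, retransmissions=2):
    """Total seconds before connect() gives up on an unreachable host.

    Assumes the first SYN waits `initial_rto` seconds and each of the
    `retransmissions` retries doubles the wait (exponential backoff).
    """
    return sum(initial_rto * 2 ** i for i in range(retransmissions + 1))


print(syn_timeout())          # 3 + 6 + 12 = 21.0
print(syn_timeout(1.0, 6))    # 1 + 2 + ... + 64 = 127.0 (Linux-style defaults)
```

Under these assumptions, every request that tries the dead address first pays the full 21 seconds before falling back to the live one.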

Output from dig:
; <<>> DiG 9.6.1-P2 <<>> example.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 45224
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;example.com. IN A

;; ANSWER SECTION:
www.example.com. 86400 IN A 1.2.3.4
www.example.com. 86400 IN A 1.2.3.4

;; Query time: 31 msec
;; SERVER: 172.16.0.23#53(172.16.0.23)
;; WHEN: Mon Dec 20 12:21:25 2010
;; MSG SIZE rcvd: 67

Thanks in advance!

Maarten Hoekstra
Kingsquare Information Services

Martin v. Löwis · Accepted Answer · 2010-12-20T13:17:54

When the DNS server gives a list of IP addresses to the client, this list is ordered (possibly in a rotating manner, i.e. subsequent DNS queries might return the addresses in a different order). It is likely that the browser caches the DNS response, i.e. the list it originally received. It then does not conclude that a failed connection means the server is down, but retries the list in the same order every time.
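The effect described above can be illustrated with a toy model, not real browser code: a client holds a cached address list and, on every request, walks it in the same order. The `remember_failures` switch is purely hypothetical and shows what would change if the client skipped addresses that had already timed out. The addresses are placeholders from the 192.0.2.0/24 documentation range, and the 21-second timeout is taken from the question.

```python
from dataclasses import dataclass, field


@dataclass
class BrowserModel:
    """Toy client using a cached round-robin DNS answer (hypothetical)."""

    addresses: list               # cached A records, in the order received
    connect_timeout: float        # seconds lost per dead address tried
    remember_failures: bool = False
    _known_dead: set = field(default_factory=set, init=False, repr=False)

    def request(self, down):
        """Return seconds spent waiting on dead hosts for one request."""
        waited = 0.0
        for addr in self.addresses:
            if self.remember_failures and addr in self._known_dead:
                continue  # skip an address that already timed out
            if addr in down:
                waited += self.connect_timeout  # this attempt times out
                self._known_dead.add(addr)
                continue
            return waited  # connected to a live load balancer
        raise ConnectionError("all cached addresses are down")


addrs = ["192.0.2.10", "192.0.2.11"]  # placeholder IPs
down = {"192.0.2.10"}
naive = BrowserModel(addrs, connect_timeout=21.0)
smart = BrowserModel(addrs, connect_timeout=21.0, remember_failures=True)
print([naive.request(down) for _ in range(3)])  # [21.0, 21.0, 21.0]
print([smart.request(down) for _ in range(3)])  # [21.0, 0.0, 0.0]
```

The "naive" client reproduces the symptom from the question: every request pays the timeout again because the cached order never changes.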

So round-robin DNS is useful for load balancing at best; it is not well suited to providing fault tolerance.
