I'm currently configuring a server pool on AWS. It is a simple setup with two database servers, a scalable server array, and two load balancers in front of it all. Every machine has a failover standing by, so it should all be pretty robust.
The load balancers should be able to fail over through round-robin DNS. In a happy-day scenario, both machines get hit and distribute the traffic over the array. When one of them is down, round-robin DNS combined with client browser retry should make browsers shift to the machine that is still up once they hit a timeout. This is not something I came up with, but it seems like a very good solution.
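To make the expected behaviour concrete, here is a minimal sketch of the retry I am relying on: try each A record in turn and move on when one times out. It is an illustration only, not how any particular browser implements it. The function name `connect_with_fallback`, the `connector` parameter, and the 3-second timeout are assumptions made for the example.

```python
import socket


def connect_with_fallback(addresses, port, timeout=3.0,
                          connector=socket.create_connection):
    """Try each address in order and return (address, connection) for the first that works.

    `connector` defaults to socket.create_connection and is only a parameter
    so the logic can be exercised without a network (an assumption of this sketch).
    """
    last_error = None
    for address in addresses:
        try:
            # A dead host costs one full timeout before we move on.
            return address, connector((address, port), timeout=timeout)
        except OSError as exc:  # covers timeouts and refused connections
            last_error = exc
    raise ConnectionError(f"no address reachable: {last_error}")
```

Called with the two load balancer addresses, this returns the second one when the first does not answer. Note that nothing here remembers which host failed, so every new call starts with the first address again.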
The problem I'm experiencing is as follows. The shift does happen, but not just once for the failed request: it happens again for each and every subsequent request from the same browser. So a simple page request takes 21 seconds to load, after which each image also takes 21 seconds to load. All following page requests take this long as well. So the failover works, but is at the same time completely useless.
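A rough way to put numbers on this is sketched below. It assumes requests are made one after another and that every new attempt tries the dead host first; real browsers fetch in parallel, so the wall-clock time will differ. The function name and the figure of 10 images are made up for the example, and only the 21-second timeout comes from what I observe.

```python
def seconds_lost_to_dead_host(requests, timeout, remembers_dead_host):
    """Time spent waiting on the dead host across a series of sequential requests."""
    if requests <= 0:
        return 0.0
    # A client that remembers the failure pays the timeout once;
    # one that forgets pays it again on every request.
    attempts_hitting_dead_host = 1 if remembers_dead_host else requests
    return attempts_hitting_dead_host * timeout


# One page plus 10 images, with the 21-second timeout I am seeing.
print(seconds_lost_to_dead_host(11, 21.0, remembers_dead_host=False))  # 231.0
print(seconds_lost_to_dead_host(11, 21.0, remembers_dead_host=True))   # 21.0
```

The first figure matches what I see; the second is what I expected from the failover.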
Output from dig:
; <<>> DiG 9.6.1-P2 <<>> example.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 45224
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 0
;; QUESTION SECTION:
;example.com. IN A
;; ANSWER SECTION:
www.example.com. 86400 IN A 1.2.3.4
www.example.com. 86400 IN A 1.2.3.4
;; Query time: 31 msec
;; SERVER: 172.16.0.23#53(172.16.0.23)
;; WHEN: Mon Dec 20 12:21:25 2010
;; MSG SIZE rcvd: 67
Thanks in advance!
Maarten Hoekstra
Kingsquare Information Services