I recently started reading about and playing around with AWS. I have particular interest in the different high availability architectures that can be acheived using the platform. Specifically, I am looking for a reliable poor man's solution that can be implemented using the least amount of servers.
So far, I am satisfied with solutions for the main HA concerns: load balancing, redundancy, auto recovery, scalability ...
The only sticking point I have is with failover solutions.
Using an ELB might seem great, however ELB actually uses DNS balancing under the hood. See Is AWS's Elastic Load Balancer a single point of failure?. Also from a Netflix blog post: Lessons Netflix Learned from the AWS Outage
This is because the ELB is a two tier load balancing scheme. The first tier consists of basic DNS based round robin load balancing. This gets a client to an ELB endpoint in the cloud that is in one of the zones that your ELB is configured to use.
Now, I have learned DNS failover is not an ideal solution, as others have pointed out, mainly because of unpredictable DNS caching. See for example: Why is DNS failover not recommended?.
Other than ELBs, it seems to me that most AWS HA architectures rely on DNS failover using route 53.
Finally, the floating IP/Elastic IP (EIP) strategy has popped up in a very small number of articles, such as Leveraging Multiple IP Addresses for Virtual IP Address Fail-over and I'm having a hard time figuring out if this is a viable solution for production systems. Also, all examples I came across implemented this using a set of active-passive instances. It seems like a waste to have a passive for every active to achieve this.
In light of this, I would like to ask you what is a faster and more reliable way to perform failover?
More specifically, please discuss how to perform failover without using DNS for the following 2 setups:
2 active-active EC2 instances in seperate AZs. Active-active, because this is a budget setup, were we can't afford to have an instance sitting around.
1 ELB with 2 EC2 instances in region A, 1 ELB with 2 EC2 instances in region B. Again, both regions are active and serving traffic. How do you handle the failover from 1 ELB to the other?