I'm considering using an Amazon RDS Multi-AZ deployment for a service in a production environment, and I've read the related documentation.
However, I have a question about failover. The Amazon RDS FAQ describes it as follows:
Q: What happens during Multi-AZ failover and how long does it take?
Failover is automatically handled by Amazon RDS so that you can resume database operations as quickly as possible without administrative intervention. When failing over, Amazon RDS simply flips the canonical name record (CNAME) for your DB Instance to point at the standby, which is in turn promoted to become the new primary. We encourage you to follow best practices and implement database connection retry at the application layer. Failover times are a function of the time it takes crash recovery to complete. Start-to-finish, failover typically completes within three minutes.
From the above description, I assume there must be a monitoring service that detects failure of the primary instance and performs the CNAME flip.
My question is: in which AZ is this monitoring service hosted? There are three possibilities:
1. The same AZ as the primary
2. The same AZ as the standby
3. Another AZ
Presumably 1 and 2 can't be the case, since neither could handle an entire AZ becoming unavailable. But if 3 is the case, what happens when the AZ hosting the monitoring service goes down? Is there another service that monitors this monitoring service? That seems like an endless chain of dominoes.
So, how does Amazon ensure the availability of RDS in a Multi-AZ deployment?
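For context, this is roughly what I understand by the "database connection retry at the application layer" that the FAQ recommends. It is only a minimal sketch: `connect_with_retry` and its parameters are names I made up, `connect` stands for whatever call opens a connection to the RDS endpoint, and the default of retrying on `OSError` is an assumption (a real database driver may raise its own exception types, which would have to be passed in `retry_on`).

```python
import time


def connect_with_retry(connect, attempts=5, base_delay=1.0, max_delay=30.0,
                       retry_on=(OSError,), sleep=time.sleep):
    """Call connect() until it succeeds or the attempts run out.

    connect  -- hypothetical zero-argument callable that opens a new connection
    retry_on -- exception types treated as transient (assumption: OSError)
    sleep    -- injectable so the waits can be replaced in tests
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    delay = base_delay
    for attempt in range(1, attempts + 1):
        try:
            # Open a fresh connection on every attempt instead of reusing
            # a handle that may still point at the failed primary.
            return connect()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            sleep(delay)
            # Exponential backoff, capped at max_delay seconds.
            delay = min(delay * 2, max_delay)
```

Since the FAQ says failover typically completes within three minutes, the total wait (attempts times delays) would need to cover roughly that window. This only addresses the client side, though; it does not answer where the component that triggers the failover lives.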