Single-region Spanner is advertised with a 99.99% availability SLA. In the US-based configuration, there will be exactly three replicas per node, all in Council Bluffs, Iowa. Can you share information that explains why the 99.99% figure (about 53 minutes of downtime per year) is believable, especially in the case of geographically local disasters? I assume that Google has done a thorough analysis, or else it would not advertise the SLA, but I cannot find a detailed paper.
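For reference, here is the arithmetic behind the downtime budget implied by a 99.99% target, as a rough sketch. The function name `downtime_minutes` is my own, and the 365-day year and 30-day month are assumptions; I do not know exactly which measurement window the SLA uses.

```python
def downtime_minutes(availability_pct: float, period_days: float = 365.0) -> float:
    """Maximum downtime, in minutes, allowed by an availability target over a period."""
    # Fraction of the period during which the service may be unavailable.
    unavailable_fraction = 1.0 - availability_pct / 100.0
    # Convert the period from days to minutes (24 hours * 60 minutes per day).
    return unavailable_fraction * period_days * 24 * 60


# Assumed windows: a 365-day year and a 30-day month.
yearly = downtime_minutes(99.99)
monthly = downtime_minutes(99.99, period_days=30)
print(f"{yearly:.1f} min per year, {monthly:.1f} min per 30 days")
```

Under these assumptions the budget is a little under an hour per year, which is why a single regional outage lasting several hours would seem to exceed it on its own.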
In the event of a regional failure, what recovery procedures will Google carry out, and what are the expected recovery time and data loss?
(I understand that multi-region configurations may be available, and I have seen some pricing data, but I will not discuss them here.)