The figure below is the message flow of basic Paxos, in the phase 2a, the Leader chooses value Vn for its proposal1 and sends Accept!(1,Vn) to every acceptor. My question is: what if two of theses three messages get lost? I mean only Acceptor 1 (not majority) receives Accept!(1,Vn). Will Acceptor 1 accept this request? And then broadcast to every learner? This value is chosen?
2 Answers
Paxos is tolerant to repeated messages so it is safe, and optimal, for the proposer to resend either prepare or accept messages if it does not get a majority response. Say there was a transient and short network issue such that only those messages were lost. If it times-out on the responses, and then resends the messages, then the messages will get through, and the round can succeed. That is more efficient than failing the round and starting a new round due to lost messages.
Edit I should note that as @Rakis points out in a comment I am actually speaking about practical applications of Paxos rather than the academic description of the algorithm which is designed to prove the correctness of the approach. If one is sitting an exam on the subject stick to the academic description. If you are actually writing an implementation then timing out and resending is the way to handle lost messages efficiently due to transient message loss.
Edit An answer "the round is failed" implies that there is a timeout to act on that fact by starting a new round. Timeouts are not actually mentioned in the paper Paxos Made Simple. So if one wants to stick to the academic description of the algorithm then the exact answer to message loss is "nothing happens as the world stops". Note that the paper says it makes no claim to things happening in a timely manner only that no incorrect outcome can occur. Clearly no reasonable implementation will just do nothing; it should timeout and do something. This highlights the fact that the academic description of the algorithm is only for a proof of correctness; it deliberately leaves out practical considerations of how to actually build an actual solution using the algorithm. The paper hints that you can add things like negative responses to help build a pragmatic implementation without effecting correctness. So pragmatic solutions like trex return negative acknowledgements to speed up failover recovery using the algorithm. Then not getting a response is not the same as getting negatives responses; so with messages loss the round is not failed it has an undetermined outcome. Further messages (resends) can be sent safely to determine the actual outcome.