Paxos phase 2a message loss

Question

The figure below is the message flow of basic Paxos, in the phase 2a, the Leader chooses value Vn for its proposal1 and sends Accept!(1,Vn) to every acceptor. My question is: what if two of theses three messages get lost? I mean only Acceptor 1 (not majority) receives Accept!(1,Vn). Will Acceptor 1 accept this request? And then broadcast to every learner? This value is chosen?

One should avoid accepting a first incorrect answer just because it was the answer you expected. — simbo1905
JensG's answer is correct. The basic Paxos algorithm does not have a concept of negative responses so the lack of a response is interpreted as a negative acknowledgement. Simbo1905's answer is also correct but references practical extensions beyond what the core algorithm requires. The advantage of the pure algorithm is that it avoids most of the complexity involved in real-world implementations. Efficiently detecting and handling these failure cases in practical implementations can be quite complex & challenging. So, it's probably best to ignore those details when first learning Paxos. — Rakis
@Rakis plus one to your well considered comment. i have edited my answer to make it clear that it is to assist in building real solutions with paxos rather than passing a cs exam on the subject. in my experiences of implementing paxos i find that the academic approach of proving the correctness using synthetic single round protocol which isn't practical is probably the main reason that people find it confusing and complex. so i will leave my practical help answer even if it won't help anyone pass a university written exam. — simbo1905

simbo1905 simbo1905 · Accepted Answer · 2015-12-15T20:04:01

Paxos is tolerant to repeated messages so it is safe, and optimal, for the proposer to resend either prepare or accept messages if it does not get a majority response. Say there was a transient and short network issue such that only those messages were lost. If it times-out on the responses, and then resends the messages, then the messages will get through, and the round can succeed. That is more efficient than failing the round and starting a new round due to lost messages.

Edit I should note that as @Rakis points out in a comment I am actually speaking about practical applications of Paxos rather than the academic description of the algorithm which is designed to prove the correctness of the approach. If one is sitting an exam on the subject stick to the academic description. If you are actually writing an implementation then timing out and resending is the way to handle lost messages efficiently due to transient message loss.

Edit An answer "the round is failed" implies that there is a timeout to act on that fact by starting a new round. Timeouts are not actually mentioned in the paper Paxos Made Simple. So if one wants to stick to the academic description of the algorithm then the exact answer to message loss is "nothing happens as the world stops". Note that the paper says it makes no claim to things happening in a timely manner only that no incorrect outcome can occur. Clearly no reasonable implementation will just do nothing; it should timeout and do something. This highlights the fact that the academic description of the algorithm is only for a proof of correctness; it deliberately leaves out practical considerations of how to actually build an actual solution using the algorithm. The paper hints that you can add things like negative responses to help build a pragmatic implementation without effecting correctness. So pragmatic solutions like trex return negative acknowledgements to speed up failover recovery using the algorithm. Then not getting a response is not the same as getting negatives responses; so with messages loss the round is not failed it has an undetermined outcome. Further messages (resends) can be sent safely to determine the actual outcome.

Paxos phase 2a message loss

2 Answers