3
votes

I am new to the Distributed System and Consensus Algorithm. I understand how it works but I am confused by some corner cases: when the acceptors received an ACCEPT for an instance but never heard back about what the final consensus or decision is, what will the acceptors react. For example, the proposer is robooting or failed during commit or right after it sends all the ACCEPT. What will happen in this case?

Thanks.

2

2 Answers

4
votes

There are two parts to this question: How do the acceptors react to new proposals? and How do acceptors react if they never learn the result?

In plain-old paxos, the acceptors never actually need to know the result. In fact it is perfectly reasonable that different acceptors have different values in their memory, never knowing if the value they have is the committed value.

The real point of paxos is to deal with the first question. And seeing that the acceptor never actually knows if it has the committed value, it has to assume that it could have the committed but be open to replacing its value if it doesn't have the committed value. How does it know? When receiving a message the proposer always compares the round number and if that is old then the acceptor signals to the proposer that it has to "catch up" first (a Nack). Otherwise, it trusts that the proposer knows what it is doing.


Now for a word about real systems. Some real paxos systems can get away with the acceptors not caring what the committed value is: Paxos is just there to choose what the value will be. But many real systems use Paxos & Friends to make redundant copies of the data for safekeeping.

Some paxos systems will continue paxos-ing until all the acceptors have the data. (Notice that without interference from other proposers, an extra paxos round copies the committed value everywhere.) Others systems are wary about interference from other proposers and will use a different Committed message that teach the acceptors (and other Learners) what the committed value is.

But what happens if the proposer crashes? A subsequent proposer can come along and propose a no-op value. If the subsequent proposer Prepares (Phase 1A) and can communicate with ANY of the acceptors that the prior proposer successfully sent Accepts to (Phase 2A) then it will know what the prior proposer was trying to do (via the response in Phase 1B: PrepareAck). Otherwise a harmless no-op value gets committed.

2
votes

when the acceptors received an ACCEPT for an instance but never heard back about what the final consensus or decision is, [how] will the acceptors react.

The node sending the value usually learns its value is fixed by counting positive responses to its ACCEPT messages until it sees a majority. If messages are dropped they can be resent until enough messages get through to determine a majority outcome. The acceptors don't have to do anything but accurately follow the algorithm when repeated messages are sent.

For example, the proposer is robooting or failed during commit or right after it sends all the ACCEPT. What will happen in this case?

Indeed this is an interesting case. A value might be accepted by a majority and so fixed but no-one knows as all scheduled messages have failed to arrive.

The responses to PREPARE messages have the information about the values already accepted. So any node can issue PREPARE messages and learn if a value has been fixed. That is actually the genius of Paxos. Once a value is accepted by a majority if is fixed because any node running the algorithm must keep choosing the same value under all message loss and crash scenarios.

Typically Paxos uses a stable leader who streams ACCEPT messages for successive rounds with successive values. If the leader crashes any node can timeout and attempt to lead by sending PREPARE messages. Multiple nodes issuing PREPARE messages trying to lead can interrupt each other giving live-lock. Yet they can never disagree about what value is fixed once it is fixed. They can only compete to get their own value fixed until enough messages get through to have a winner.

Once again acceptor nodes don’t have to do anything other than follow the algorithm when a new leader takes over from a crashed leader. The invariants of the algorithm mean that no leader will contradict any previous leader as to the fixed value. New leaders collaborate with old leaders and acceptors can simply trust that this is the case. Eventually enough messages will get through for all nodes to learn the result.