What is an instance?
The nomenclature in Paxos is a little unintuitive.
- An instance is the algorithm for choosing one value.
- A round refers to a proposer's single attempt of a Phase 1 + Phase 2. A node can have multiple rounds in an instance of Paxos. A round id is globaly unique per instance across all nodes. This is sometimes called proposal number.
- A node may take on several roles; most notably Proposer and Acceptor. In my answers, I'll assume each node takes on both roles.
- Phase 1 is also known as the Prepare phase.
- In Phase 1a, a Proposer sends a Prepare!(roundId) message to the Acceptors
- In Phase 1b, the Acceptors reply with either Promise!(roundId, value) or PrepareNack!()
- Phase 2 is also known as the Accept phase.
- In Phase 2a, a Proposer sends an Accept!(roundId, value) message to the Acceptors
- In Phase 2b, the Acceptors reply with either Accepted!(...) or AcceptNack!()
Assuming a cluster divided in 3 regions, each containing 3 nodes (total = 9 nodes). What happens if communication is broken between regions? There is no way any leader can reach quorum (which is 5).
Paxos requires you can get at least a quorum (5 nodes in your case). Go with your solution of three regions; having two network partitions between the three regions is very bad news. I also use a version of Paxos which can change node membership from one instance to the next. This is useful for partitions and node failure.
Isn't Paxos going to enter an infinite loop?
A naive implementation of Paxos is not guaranteed to terminate because multiple nodes can leap-frog Prepare phases. There are two ways of getting around this. One is to have a random backoff before starting new Prepare phases. The second is to route all requests to a designated leader, that acts as proposer (The leader is chosen by a Paxos instance. See also Multi-paxos)
In Phase 1b: 'If the proposal number N is larger than any previous proposal, then each >>Acceptor promises not to accept proposals less than N, and sends the value it last accepted for >>this instance to the Proposer'.
What is 'the last value it accepted'? Is it any previous proposal number from the proposer?
When a node receives an Accept!(roundId, value) message from a Proposer and it hasn't promised to not accept the value (due to a Prepare!(higherRoundId) message), it stores the value and the roundId (I'll call them acceptedValue
and acceptedRoundId
). It may write over these due to subsequent Accept!(...) messages.
When a node receives a Prepare!(roundId) message from a Proposer, it stores roundId as promiseRoundId = max(roundId, promiseRoundId)
. It then sends a Promise!(acceptedRoundId, acceptedValue)
back to the Proposer. NB: if a node hasn't received an Accept!(...) message, it replies with Promise!(null, null)
.
In Phase 1a: Does one include the value to agree on with the Prepare message or is this deferred to the Accept! message? Or it does matter?
There is no need to send it. I don't.
In Phase 2a: 'If any of the Acceptors have already accepted a value, the leader must Choose a value with the maximum proposal number N'.
What is value here? Is it the proposal number? I believe not, but this phrase is unclear.
The value is the actual data the algorithm is reaching consensus on. I'll rephrase this to
To start the Accept Phase, The Proposer must choose a value to be accepted depending on the results of the Prepare phase. If any Acceptor replied with Promise(roundId, value), the Proposer must use the value associated with the highest roundId. Otherwise, the Proposer received only Promise(null, null), and may choose any value to send to the acceptors.
NB: Proposal number here is the same thing as roundId.
In Phase 2a: 'Otherwise, the Proposer is free to choose any value'. What does this mean? A value for what? For the proposal number?
This is the value you want to have consensus on. This is typically a state change across the distributed system, perhaps triggered by a client request.
Paxos seems to rely on an increasing value of N (proposal number) to work? Is this correct?
The wikipedia entry does not discuss the initial values a node should set before starting to participate in Paxos. What are these?
Round ids (aka proposal numbers) should be increasing and must be unique per instance across all nodes. The Paxos paper assumes you can do this because it is trivial to achieve. Here's one scheme that produces the same results on all nodes:
- Say there are M nodes participating in an instance of Paxos.
- Sort all the nodes lexicographically. index[node] is the index of a node in this sorted list.
roundId = i*M + index[node]
where i is the ith round this node is starting (that is i is unique per node per paxos instance, and is monotonically increasing).
Or in pseudo-code (which is clearly lacking a few major optimizations):
define runPaxos( allNodesThisPaxosInstance, myValue ) {
allNodesThisPaxosInstance.sort()
offset = allNodesThisPaxosInstance.indexOf( thisNode )
for (i = 0; true; i++) {
roundId = offset + i * allNodesThisPaxosInstance.size()
prepareResult = doPreparePhase( roundId )
if (!prepareResult.shouldContinue?)
return
if (prepareResult.hasAnyValue?)
chosenValue = prepareResult.valueWithHighestRoundId
else
chosenValue = myValue
acceptResult = doAcceptPhase( roundId, chosenValue )
if (!acceptResult.shouldContinue?)
return
}
}