While implementing the Raft algorithm, I ran into a situation that I think may harm the cluster, though I am not sure.

It is reasonable to assume that some AppendEntriesRPC messages from the Leader are received out of order (because of network delay or other reasons). Suppose the Leader sends a heartbeat AppendEntriesRPC to peer A with prev_log_index = 1, then sends another AppendEntriesRPC carrying entry 2, and then crashes (I force this to happen immediately with a callback in my test). If the two RPCs are handled in the order in which they were sent, entry 2 is inserted successfully. However, if the heartbeat RPC is delayed, peer A first inserts entry 2 and responds to the Leader. When the delayed heartbeat then arrives, peer A erases entry 2, because that entry conflicts with the heartbeat's prev_log_index = 1. So peer A erases a log entry by mistake.
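The scenario above can be reproduced with a small simulation. This is only a sketch: `NaiveFollower` and the `AppendEntries` dataclass are hypothetical names, and the follower is assumed to drop everything after prev_log_index on every AppendEntries RPC, which is the behavior the description implies rather than anything taken from a real Raft library.

```python
from dataclasses import dataclass, field


@dataclass
class AppendEntries:
    prev_log_index: int
    entries: list = field(default_factory=list)  # empty list means heartbeat


class NaiveFollower:
    """Hypothetical follower; log[i] holds the entry with index i + 1."""

    def __init__(self, log):
        self.log = list(log)

    def handle(self, rpc):
        if rpc.prev_log_index > len(self.log):
            return False  # follower is missing the entry at prev_log_index
        # Assumption: truncate unconditionally after prev_log_index,
        # even when the RPC carries no entries at all.
        del self.log[rpc.prev_log_index:]
        self.log.extend(rpc.entries)
        return True


def run(order):
    follower = NaiveFollower(["entry1"])
    for rpc in order:
        follower.handle(rpc)
    return follower.log


heartbeat = AppendEntries(prev_log_index=1)
append = AppendEntries(prev_log_index=1, entries=["entry2"])

in_order = run([heartbeat, append])   # ['entry1', 'entry2']
reordered = run([append, heartbeat])  # ['entry1'], entry 2 is gone
print(in_order, reordered)
```

With the two RPCs swapped, the follower ends up with only entry 1 even though it already acknowledged entry 2.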

To dig a little deeper: if the Leader does not crash immediately, will it fix this? I think that if peer A responds to the delayed heartbeat correctly, the Leader will notice and repair the log in a later RPC.

However, what if peer A's response to entry 2 causes commit_index to advance? In that case peer A has voted to advance commit_index to 2 even though it no longer holds entry 2, so the advance may not actually be backed by enough votes. If the Leader crashes at this point, a node with a shorter log can be elected Leader. I did encounter this situation during my testing.
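The gap between what the Leader believes and what the peers actually store can be made concrete with a little arithmetic. The following is a simplified sketch: the three-node cluster, the `commit_index` helper, and the per-node numbers are assumptions for illustration, and the term check that Raft applies before committing is left out.

```python
def commit_index(match_index):
    """Highest log index that a majority holds, per the given bookkeeping."""
    ranked = sorted(match_index.values(), reverse=True)
    majority = len(ranked) // 2 + 1
    return ranked[majority - 1]


# Leader's view after peer A acknowledged entry 2 (hypothetical cluster).
leader_view = {"leader": 2, "A": 2, "B": 1}
# Last index each node really holds once A has erased entry 2.
actual = {"leader": 2, "A": 1, "B": 1}

believed = commit_index(leader_view)
real_copies = sum(1 for last in actual.values() if last >= believed)
print(believed, real_copies)  # 2 1
```

The Leader advances commit_index to 2, yet only one node really stores entry 2, so after a Leader crash the two remaining nodes can elect one of themselves without that entry.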

My questions are:

  1. Is my reasoning correct?
  2. If reordered RPCs are a real problem, how should I solve it? Is indexing and caching all RPCs, and forcing them to be handled one by one, a good solution? I found it hard to implement in gRPC.

1 Answer

Raft assumes an ordered stream protocol such as TCP. That is, if a message arrives out of order, it is buffered until its predecessor arrives. (This behavior is why TCP exists: each individual packet can take a separate route between servers, so out-of-order arrival is likely, and most applications prefer the peace of mind of a strict ordering.)
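If the transport does not give this guarantee, the same buffering can be done at the application level. Below is a minimal sketch, assuming the sender stamps each RPC with a consecutive sequence number; `ReorderBuffer` and its methods are hypothetical names, not part of gRPC or of any Raft library.

```python
class ReorderBuffer:
    """Hands messages to `deliver` strictly in sequence-number order."""

    def __init__(self, deliver, first_seq=0):
        self.deliver = deliver
        self.next_seq = first_seq
        self.pending = {}  # seq -> message that arrived too early

    def receive(self, seq, message):
        if seq < self.next_seq:
            return  # duplicate or already-delivered message, ignore it
        self.pending[seq] = message
        # Flush every message whose predecessors have all been delivered.
        while self.next_seq in self.pending:
            self.deliver(self.pending.pop(self.next_seq))
            self.next_seq += 1


delivered = []
buf = ReorderBuffer(delivered.append)
buf.receive(1, "append entry 2")  # arrives first, held back
buf.receive(0, "heartbeat")       # predecessor arrives, both are flushed
print(delivered)  # ['heartbeat', 'append entry 2']
```

One design caveat: a message that is lost for good stalls everything queued behind it, so a real implementation would also need a timeout or retransmission, which this sketch leaves out.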

Other protocols, such as plain old Paxos, can work with out-of-order messages, but are typically much slower than Raft.