2
votes

Raft stays consistent in the face of network partitions. But what happens if a single node loses contact with only the leader, becomes a candidate and calls for votes?

This is the setup (I adapted the example from http://thesecretlivesofdata.com/raft/ to fit my needs):

[Image: cluster of nodes A, B, C, D and E, with B as leader]

Node B is the current leader and sends out heartbeats (red) to the followers. The connection between B and C gets lost and after the election timeout C becomes a candidate, votes for itself and asks nodes A, D and E to vote for it (green).

What happens?

As far as I understand Raft, nodes A, D and E should vote for C which makes C the next leader (Term 2). We then have two leaders each sending out heartbeats, and hopefully nodes A, D and E will ignore those from B because of the lower term.

Is this correct or is there some better mechanism?


2 Answers

1
votes

After going through the Raft paper again, it seems that my understanding above was correct. From the paper:

Terms act as a logical clock in Raft, and they allow servers to detect obsolete information such as stale leaders. Each server stores a current term number, which increases monotonically over time. Current terms are exchanged whenever servers communicate; if one server’s current term is smaller than the other’s, then it updates its current term to the larger value. If a candidate or leader discovers that its term is out of date, it immediately reverts to follower state. If a server receives a request with a stale term number, it rejects the request.

The highlighted part is the one I was missing above. So the process is:

  • After node C becomes a candidate, it increments its term number to 2 and requests votes from the reachable nodes (A, D and E).
  • Those nodes immediately update their current_term variable to 2 and vote for C.
  • From then on, nodes A, D and E reject heartbeats from B and, in their replies, tell B that the current term is 2.
  • B reverts to follower state (and won't receive updates from the new leader C until the network connection between C and B is healed).
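The steps above can be sketched in a few lines of Python. This is only an illustration of the term rule quoted from the paper, not a real Raft implementation: the `Node` class and its method names are made up for this example, messages are plain function calls, and the check that a candidate's log is at least as up-to-date as the voter's is left out (it is assumed to pass for C).

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Node:
    """Hypothetical, heavily simplified Raft node (no log, no persistence)."""
    name: str
    current_term: int = 1
    state: str = "follower"          # "follower", "candidate" or "leader"
    voted_for: Optional[str] = None  # vote cast in current_term, if any

    def observe_term(self, term: int) -> None:
        # Term rule: adopt any larger term and fall back to follower.
        if term > self.current_term:
            self.current_term = term
            self.state = "follower"
            self.voted_for = None

    def start_election(self) -> int:
        # Election timeout fired: bump the term and vote for ourselves.
        self.current_term += 1
        self.state = "candidate"
        self.voted_for = self.name
        return self.current_term

    def handle_request_vote(self, term: int, candidate: str) -> bool:
        # Assumption: the log up-to-date check is omitted and always passes.
        if term < self.current_term:
            return False             # stale request is rejected
        self.observe_term(term)
        if self.voted_for in (None, candidate):
            self.voted_for = candidate
            return True
        return False

    def handle_heartbeat(self, term: int) -> Tuple[bool, int]:
        # Reply with our own term so a stale leader can learn about it.
        if term < self.current_term:
            return False, self.current_term
        self.observe_term(term)
        self.state = "follower"      # a valid leader exists for this term
        return True, self.current_term


# Scenario from the question: B leads term 1, C cannot reach B.
nodes = {n: Node(n) for n in "ABCDE"}
nodes["B"].state = "leader"

term = nodes["C"].start_election()   # C becomes a candidate in term 2
votes = 1 + sum(nodes[n].handle_request_vote(term, "C") for n in "ADE")
if votes > len(nodes) // 2:
    nodes["C"].state = "leader"

# B's next heartbeat (still term 1) reaches A and is rejected.
accepted, reply_term = nodes["A"].handle_heartbeat(nodes["B"].current_term)
nodes["B"].observe_term(reply_term)  # B sees term 2 and steps down

print(votes, nodes["C"].state, accepted, nodes["B"].state, nodes["B"].current_term)
```

In this sketch B never hears from C directly; it learns about term 2 only through the reply of a follower that rejected its heartbeat, which matches the third and fourth bullet points.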
0
votes

Since A, D and E keep receiving healthy heartbeats from the leader B (term 1), they would not respond to the vote request from C (term 2). C will time out, request votes again, time out again, and so on.

See Figure 4 of the Raft paper (https://raft.github.io/raft.pdf).