I'm learning Raft from the paper's extended version. In section 5.2 (Leader Election) of the paper, it says:

If a follower receives no communication over a period of time called the election timeout, then it assumes there is no viable leader and begins an election to choose a new leader.

At the same time, the paper says in some cases an RPC can be rejected, for example when it contains a smaller term number.

My question is: when should a follower recognize an RPC as a valid "communication" and record it to prevent itself from timing out?


Edit:

My current implementation is as follows:

  • RequestVote resets the timeout only when the server grants its vote
  • AppendEntries resets the timeout if its term is no smaller than the server's current term

This works fine in most cases, but sometimes causes a long election. Consider a Raft cluster with 2 servers, both followers. Server #1 has a more up-to-date log, but server #2 has a larger term.

In this setting, server #1 has to start two elections in a row to become leader, which (intuitively) happens with less than 50% probability. If server #2 starts an election and times out, its term increases and the next election by server #1 will fail again. In practice, this can make the whole election last several seconds even if there are only a few servers. I wonder if there are approaches that solve this problem (or if this is in fact not a problem).
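To make the scenario concrete, here is a minimal sketch of the two-server case, assuming the vote rules from Figure 2 of the paper. The `Server` class, `run_election`, and the specific terms and log positions are made up for illustration; timers and networking are left out, and the candidate object is passed in directly in place of a real RPC.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Server:
    sid: int
    term: int
    last_log_term: int
    last_log_index: int
    voted_for: Optional[int] = None

    def handle_request_vote(self, cand: "Server") -> bool:
        # Reject candidates whose term is behind ours.
        if cand.term < self.term:
            return False
        # A newer term makes us adopt it and forget any earlier vote.
        if cand.term > self.term:
            self.term, self.voted_for = cand.term, None
        # "Up-to-date" check: compare last log term first, then last index.
        log_ok = (cand.last_log_term, cand.last_log_index) >= (
            self.last_log_term, self.last_log_index)
        if log_ok and self.voted_for in (None, cand.sid):
            self.voted_for = cand.sid
            return True
        return False


def run_election(cand: Server, peer: Server) -> bool:
    """One election attempt in a 2-server cluster (both votes are needed)."""
    cand.term += 1
    cand.voted_for = cand.sid
    granted = peer.handle_request_vote(cand)
    if peer.term > cand.term:
        # The reply carries a larger term: step down and adopt it.
        cand.term, cand.voted_for = peer.term, None
    return granted


# Hypothetical starting state: #1 has the longer log, #2 the larger term.
s1 = Server(sid=1, term=1, last_log_term=1, last_log_index=3)
s2 = Server(sid=2, term=5, last_log_term=1, last_log_index=1)

first = run_election(s1, s2)   # rejected: #1 asks with term 2, #2 is at term 5
second = run_election(s1, s2)  # granted: #1 asks with term 6, log is up to date
```

In this simplified model the first attempt only serves to teach server #1 the larger term; the second attempt succeeds as long as server #2 does not bump its term in between.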

1 Answer

A Raft node that is serving as a Follower responds to two types of requests:

  • AppendEntries from the Leader
  • RequestVote from a Candidate

If a Follower receives an AppendEntries from the current Leader, it should run all the checks (i.e. the term in the request, log matching), and if they all pass, it should append the entries from the request. The Follower should also reset its election timeout when it receives AppendEntries from the current Leader, because AppendEntries also serves as a heartbeat: Leaders periodically send AppendEntries requests with no log entries to keep Followers from timing out and starting a new election.
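A rough sketch of the timer side of that rule might look like the following. The `Follower` class, the injected `clock`, and the `last_contact` field are assumptions made for illustration; the log-matching check and the actual append are omitted.

```python
import time
from typing import Callable


class Follower:
    def __init__(self, current_term: int = 0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.current_term = current_term
        self.clock = clock
        # Time of the last message that counted as contact from a leader.
        self.last_contact = clock()

    def on_append_entries(self, leader_term: int) -> bool:
        """Return True if the sender is accepted as the current leader."""
        if leader_term < self.current_term:
            # Stale leader: reject and leave the election timer running.
            return False
        self.current_term = leader_term
        # Heartbeat or real entries, either way this resets the timer.
        self.last_contact = self.clock()
        # Log-matching checks and appending entries would go here.
        return True
```

The election timer then fires when `clock() - last_contact` exceeds the (randomized) election timeout.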

If a Follower receives a RequestVote RPC and decides to grant its vote to that Candidate, it also resets its election timeout.
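The RequestVote side can be sketched the same way, assuming the vote conditions from Figure 2 of the paper. The class and parameter names here are hypothetical; the key point is that the timer is touched only on the path where the vote is granted.

```python
import time
from typing import Callable, Optional


class Follower:
    def __init__(self, current_term: int = 0, last_log_term: int = 0,
                 last_log_index: int = 0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.current_term = current_term
        self.last_log_term = last_log_term
        self.last_log_index = last_log_index
        self.voted_for: Optional[int] = None
        self.clock = clock
        self.last_contact = clock()

    def on_request_vote(self, cand_id: int, cand_term: int,
                        cand_last_log_term: int,
                        cand_last_log_index: int) -> bool:
        if cand_term < self.current_term:
            return False  # stale candidate: no vote, no timer reset
        if cand_term > self.current_term:
            # Adopt the newer term; this alone does not reset the timer.
            self.current_term = cand_term
            self.voted_for = None
        # Candidate's log must be at least as up to date as ours.
        log_ok = (cand_last_log_term, cand_last_log_index) >= (
            self.last_log_term, self.last_log_index)
        if log_ok and self.voted_for in (None, cand_id):
            self.voted_for = cand_id
            self.last_contact = self.clock()  # reset only when granting
            return True
        return False
```

Note that a rejected request with a larger term still updates `current_term`, so the follower's own next election starts from the newer term even though its timer kept running.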