I'm learning Raft from the paper's extended version. In section 5.2 (Leader Election) of the paper, it says:
"If a follower receives no communication over a period of time called the election timeout, then it assumes there is no viable leader and begins an election to choose a new leader."
At the same time, the paper says in some cases an RPC can be rejected, for example when it contains a smaller term number.
My question is: when should a follower recognize an RPC as a valid "communication" and record it to prevent itself from timing out?
Edit:
My current implementation is as follows:
- RequestVote resets the timeout only when the server grants the vote
- AppendEntries resets the timeout if its term is no smaller than the server's
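
To make the two rules concrete, here is a minimal sketch of how I handle them. It is written in Python for illustration only; the class, field, and method names are hypothetical, the election timer is replaced by a simple counter, and the AppendEntries log-consistency check is left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Follower:
    """Hypothetical minimal follower state; all names are illustrative."""
    current_term: int = 0
    voted_for: Optional[int] = None
    last_log_term: int = 0
    last_log_index: int = 0
    timer_resets: int = 0  # stands in for restarting a real election timer

    def reset_election_timer(self) -> None:
        self.timer_resets += 1

    def _observe_term(self, term: int) -> None:
        # A higher term updates our term and clears our vote,
        # but on its own it does not reset the election timer.
        if term > self.current_term:
            self.current_term = term
            self.voted_for = None

    def on_request_vote(self, term: int, candidate_id: int,
                        last_log_index: int, last_log_term: int) -> bool:
        if term < self.current_term:
            return False  # stale candidate: reject, no reset
        self._observe_term(term)
        # Candidate's log must be at least as up-to-date as ours:
        # compare last term first, then last index.
        log_ok = (last_log_term, last_log_index) >= (
            self.last_log_term, self.last_log_index)
        if self.voted_for in (None, candidate_id) and log_ok:
            self.voted_for = candidate_id
            self.reset_election_timer()  # reset only when the vote is granted
            return True
        return False

    def on_append_entries(self, term: int, leader_id: int) -> bool:
        if term < self.current_term:
            return False  # stale leader: reject, no reset
        self._observe_term(term)
        self.reset_election_timer()  # sender's term is no smaller than ours
        return True  # simplification: log matching is not checked here


f = Follower(current_term=3, last_log_term=2, last_log_index=5)
f.on_append_entries(term=2, leader_id=1)   # rejected, timer untouched
f.on_request_vote(term=4, candidate_id=2,
                  last_log_index=4, last_log_term=2)  # term adopted, vote denied
f.on_request_vote(term=4, candidate_id=1,
                  last_log_index=5, last_log_term=2)  # vote granted, timer reset
```

Note that a RequestVote with a higher term but a stale log still bumps the follower's term without resetting its timer, which is the situation the scenario below runs into.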
This works fine in most cases, but sometimes causes a long election. Consider a Raft cluster with 2 servers, both followers. Server #1 has a more up-to-date log, but server #2 has a larger term.
In this setting, server #1 has to start 2 consecutive elections to become the leader, which (intuitively) happens with <50% probability. If server #2 starts an election and times out, its term increases and the next election by server #1 will fail again. In practice this can make leader election last for several seconds even if there are only a few servers. I wonder if there are some approaches to solve this problem (or if this is in fact not a problem).