0
votes

I've been reading the paper "In Search of an Understandable Consensus Algorithm". I'm confused with how "term" works.

I have two thoughts.

  1. A term begins with an election, and ends with the next election. The next election may happen due to the crash of the current leader. As long as the current leader works perfectly, the term could be lasting for a very long time.

  2. A term's end is determined when it begins. For example, after a server wins the election, the term begins and plans to end in 30 minutes. Then after 30 minutes, the leader stops sending heartbeats to cause another election.

So which one is correct? I feel like the first thought makes more sense and it provides better performance.

1

1 Answers

0
votes

Either option would work, but your first option is preferable. If you stop sending heartbeats then you likely have to wait for quite some time (a few seconds perhaps) before the new master is elected. You can in theory avoid this wait and trigger an election immediately but elections are always slightly disruptive so one normally design systems to avoid them as much as possible.

The only time an election is really needed is if something has gone wrong: for instance a communication breakdown or some nodes have failed. In practice clusters may run for a very long time (weeks? years?) without a failure, so they do not need more frequent elections.

Also note that terms so not really have a well-defined (global) beginning and end because of the asynchronous nature of communication and the difficulty of pinning down a notion of time in a distributed system. A node may believe a term is still ongoing even though the other nodes all believe it either hasn't started or has finished.