How does the Raft algorithm guarantee consensus if there are multiple leaders?

12

votes

As the paper says:

Election Safety: at most one leader can be elected in a given term. §5.2

However, there may be more than one leader in the system at the same time. Raft can only promise that there is at most one leader in a given term. So if I have more than one client, couldn't they get different data? How does this allow Raft to be a consensus algorithm?

Is there something I don't understand here, that someone could explain?

algorithm distributed consensus raft

I'm curious why this was downvoted: downvoter, why did you downvote? This is a very good question, one that I had as I was reading the Raft paper myself. (And don't fear: the paper does explain.) – Thanatos

This is the question I asked three years ago. Right now, I can answer the question myself. – baotiao

The key point here is that even a read operation has to go through the Raft protocol. If the client sends its request to the old leader, the old leader sends AppendEntries to confirm whether it is still the current leader. Either it learns that it has been superseded or the AppendEntries call times out, and it then returns a timeout or an error to the client. – baotiao

@baotiao feel free to convert that into a self answer – ggorlen

4

votes

Only a candidate node which has received votes from a majority can lead. Only one majority can exist in a cluster for a given term: another node cannot hear from a majority without contacting at least one node which has already voted for another leader. The candidate who hears of the other leader will step down. Here is a nice animation which shows how it happens: http://thesecretlivesofdata.com/raft/#election
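To make the "only one majority" argument concrete, here is a minimal sketch in Python. The `Node` class and `run_election` helper are hypothetical names invented for illustration, and the log up-to-date check that real Raft performs before granting a vote is deliberately left out. It only shows that a node grants one vote per term, so two candidates cannot both reach a majority in the same term.

```python
class Node:
    """Hypothetical Raft node that tracks only what voting needs."""

    def __init__(self, name):
        self.name = name
        self.current_term = 0
        self.voted_for = None  # candidate this node voted for in current_term

    def request_vote(self, candidate, term):
        """Grant at most one vote per term (log up-to-date check omitted)."""
        if term < self.current_term:
            return False  # stale candidate
        if term > self.current_term:
            # A newer term resets the vote.
            self.current_term = term
            self.voted_for = None
        if self.voted_for in (None, candidate):
            self.voted_for = candidate
            return True
        return False  # already voted for someone else in this term


def run_election(candidate, term, reachable_nodes, cluster_size):
    """Return True if the candidate collects votes from a strict majority."""
    votes = sum(node.request_vote(candidate, term) for node in reachable_nodes)
    return votes > cluster_size // 2


nodes = [Node(n) for n in "ABCDE"]
a_wins = run_election("A", 1, nodes[:3], 5)  # A, B and C vote for A
d_wins = run_election("D", 1, nodes[2:], 5)  # C already voted in term 1
d_wins_later = run_election("D", 2, nodes[2:], 5)  # a new term resets votes
```

Any two majorities of five nodes overlap in at least one node (here C), and that node refuses the second candidate in term 1. D can only win by moving to term 2, which is exactly why two leaders never share a term.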

1

votes

Yes, you are right. There can be multiple leaders at the same time, but not in the same term, so the guarantee still holds. One possible situation: in a 3-server cluster (A, B, C), A is elected leader. Then a network partition splits the cluster into two partitions: {A} and {B, C}. In this case, A does not step down, because it never receives an RPC with a higher term, so it remains a leader. In the majority partition {B, C}, a new leader can still be elected. But notice that this new leader has a greater term than A.

What about requests from the client? There are two cases.
1. For a WRITE request, the leader cannot reply to the client until the log entry is committed, which is impossible for the outdated leader. So there is no problem: only the true leader is able to commit the entry by replicating it on a majority of servers.
2. For a READ-ONLY request, the leader could get away without consulting the log or committing an entry. You are right, and this is explicitly addressed in the paper at the end of section 8:

Read-only operations can be handled without writing anything into the log. However, with no additional measures, this would run the risk of returning stale data, since the leader responding to the request might have been superseded by a newer leader of which it is unaware. Linearizable reads must not return stale data, and Raft needs two extra precautions to guarantee this without using the log. First, a leader must have the latest information on which entries are committed. The Leader Completeness Property guarantees that a leader has all committed entries, but at the start of its term, it may not know which those are. To find out, it needs to commit an entry from its term. Raft handles this by having each leader commit a blank no-op entry into the log at the start of its term. Second, a leader must check whether it has been deposed before processing a read-only request (its information may be stale if a more recent leader has been elected). Raft handles this by having the leader exchange heartbeat messages with a majority of the cluster before responding to read-only requests.
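The second precaution (confirming leadership with a majority before answering) can be sketched roughly as follows. This is a toy Python model, not the paper's implementation: `Follower`, `Leader`, and `_has_quorum` are hypothetical names, the state machine lives only on the leader, log replication is reduced to a single acknowledgement round, and the no-op entry from the first precaution is not modeled.

```python
class Follower:
    """Hypothetical follower that only tracks its term and reachability."""

    def __init__(self, term):
        self.current_term = term
        self.reachable = True

    def append_entries(self, leader_term):
        """Accept a heartbeat only if the sender's term is not behind ours."""
        if not self.reachable:
            return None  # simulates a timed-out RPC
        if leader_term < self.current_term:
            return False  # sender is a stale leader
        self.current_term = leader_term
        return True


class Leader:
    def __init__(self, term, followers):
        self.term = term
        self.followers = followers
        self.state = {}  # toy state machine, kept on the leader only

    def _has_quorum(self):
        acks = 1  # the leader counts itself
        for follower in self.followers:
            if follower.append_entries(self.term):
                acks += 1
        cluster_size = len(self.followers) + 1
        return acks > cluster_size // 2

    def write(self, key, value):
        # A write is acknowledged only after a majority accepted it.
        if not self._has_quorum():
            raise TimeoutError("entry not committed: no majority")
        self.state[key] = value
        return "ok"

    def read(self, key):
        # Heartbeat round with a majority before serving the read.
        if not self._has_quorum():
            raise TimeoutError("cannot confirm leadership")
        return self.state.get(key)


b, c = Follower(term=1), Follower(term=1)
old_leader = Leader(term=1, followers=[b, c])
write_result = old_leader.write("x", 1)
value_before = old_leader.read("x")

# During a partition, B and C elect a new leader in term 2.
b.current_term = c.current_term = 2
try:
    old_leader.read("x")
    stale_read_served = True
except TimeoutError:
    stale_read_served = False
```

In this sketch the deposed leader cannot serve either kind of request once the majority has moved to a higher term, so the client sees an error instead of stale data.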

Hope this helps.

0

votes

Every machine in the cluster compares its current term against the term it receives with every request from the other machines. Whenever a stale "leader" tries to act as a leader, it will not be accepted by a majority of the cluster, since the majority of the machines have a greater term than the "leader". That guarantees that only the actual leader will be able to reply to client requests.
Additionally, according to Raft, this "leader" will become a follower immediately after it receives a rejection carrying a greater term.
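The term comparison and step-down rule described above could look roughly like this. It is a minimal Python sketch with hypothetical names (`Server`, `Reply`, `handle_append_entries`, `handle_reply`); log contents and consistency checks are omitted.

```python
from dataclasses import dataclass


@dataclass
class Reply:
    term: int      # responder's current term
    success: bool  # whether the request was accepted


class Server:
    def __init__(self, name, term, role="follower"):
        self.name = name
        self.current_term = term
        self.role = role

    def handle_append_entries(self, leader_term):
        """Reject requests from a sender whose term is behind ours."""
        if leader_term < self.current_term:
            return Reply(self.current_term, False)
        # A valid leader exists for this term: adopt it and follow.
        self.current_term = leader_term
        self.role = "follower"
        return Reply(self.current_term, True)

    def handle_reply(self, reply):
        """Step down as soon as any reply carries a greater term."""
        if reply.term > self.current_term:
            self.current_term = reply.term
            self.role = "follower"


stale = Server("A", term=1, role="leader")
peer = Server("B", term=2)
reply = peer.handle_append_entries(stale.current_term)
stale.handle_reply(reply)
```

After the exchange, `stale.role` is `"follower"` and `stale.current_term` is 2, so the old leader cannot keep acting on its outdated term.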

0

votes

This is the question I asked three years ago. Right now, I can answer the question myself.

The key point here is that even a read operation has to go through the Raft protocol. If the client sends its request to the old leader, the old leader sends AppendEntries to confirm whether it is still the current leader. Either it learns that it has been superseded or the AppendEntries call times out, and it then returns a timeout or an error to the client.
