1
votes

I have read the Raft algorithm paper's and got a question related to the sequence of operations Raft executes upon receiving a client request:

In order to overcome a single point of failure scenario, Raft relies on maintaining a replicated log on other machines, the algorithm also consults a consensus module for the complete logging management. The sequence of operations work as follow:

  1. Client request is received at the leader's state machine, leader appends command to its log.
  2. The leader sends AppendEntries RPCs to his followers to clone the command in their local logs', and waits for an acknowledgment from majority of the followers that the entry has been successfully appended to their local log file.
  3. Once an acknowledgment has been received that the request has been successfully logged in majority of the followers logs', then the request is committed to the leader's state machine causing a transition to happen, returning back the output of that transition to the client.
  4. Ultimately, the leader notifies followers of committed entries in subsequent AppendEntries RPCs.

If above understanding is correct, then I can claim that the client request is being held for a bit of time for the replication process to complete, also I may also claim that the success of a client request is heavily dependent on the success of the replication process (since the client command / request is not executed on the leader's machine until a majority acknowledgment has been received). The question is, how long it is expected to take on average for a client request to receive a response after the replication procedure is completed, also does that work efficiently for real-time systems?

2

2 Answers

2
votes

http://www.infoq.com/articles/cap-twelve-years-later-how-the-rules-have-changed suggests that systems such as Raft requesting the Consistency and Availability parts of the CAP theorem's trinity will suffer performance limits. You may also be interested in https://pdfs.semanticscholar.org/7c45/54d064128043897ea2226021f6fda4c64251.pdf (A review of experiences with reliable multicast, by Birman), which describes experience with reliable multicast groups in high assurance systems such as air traffic control.

My takeaway from this is that a real system may want to be very careful about what information it guards with Raft, Paxos, and friends, and what it can guard less tightly. The other point of view is to go for a very sophisticated implementation of Paxos, such as Google Spanner, so that programmers don't have to worry about the problems of non-ACID systems.

0
votes

If above understanding is correct, then I can claim that the client request is being held for a bit of time for the replication process to complete

Correct, the leader of the current term will acknowledge a client request only after the command has been replicated to majority of nodes in the cluster.

I may also claim that the success of a client request is heavily dependent on the success of the replication process

That's also correct. At least of majority of nodes in the cluster (including the leader) need to be available and responsive, in order for the command to be replicated successfully and the leader to acknowledge the request.

how long it is expected to take on average for a client request to receive a response after the replication procedure is completed

That depends on the topology of your network. The latency of the response to a client request will be composed of the following parts (assuming no leader crashes): * the latency required for the client request to be transmitted between the client and the leader. * the latency of an AppendEntries request from the leader to followers to replicate the entry (sent in parallel to all the followers. * the latency of an AppendEntries response from the followers to the leader. * the time required by the leader to apply the command to its state machine (i.e. a disk write in the best case) * the latency of the client response to be transmitted from the leader to the client

The latency of the various messages depends on the distance between nodes, but it will probably be in the order of tenths to hundreds of milliseconds.

also does that work efficiently for real-time systems?

It depends on what are your requirements for your specific case. But in general, real-time systems require latencies that are under a few milliseconds, so the answer is most likely no. Also, keep in mind that during periods of crashes and instability where new leader elections happen, latency can increase significantly.