I don't see a huge advantage of Raft in implementing distributed DBs. If clients can only write to the leader, then that leader still becomes the choke point -- or the single point of failure. Ideally, I would want multiple clients to be able to write to multiple DBs, which then synchronize among themselves. That would help things scale, because no single DB acts as a choke point. Is there a way to do that?
2 Answers
Apache Kafka has introduced KRaft mode, which removes the dependency on ZooKeeper.
Docker uses it too: when the Docker Engine runs in swarm mode, the manager nodes implement the Raft consensus algorithm to manage the global cluster state.
Many other systems use the Raft algorithm; HashiCorp products (Consul, for example) already rely on it.
Typically, if the cluster's leader crashes for some reason, or the cluster has no leader, a node becomes a candidate once its election timeout expires and starts an election.
Raft is a distributed consensus protocol: it lets multiple servers agree on, and decide, values. In simple terms, a majority of the servers must be available for a decision to be made, and the protocol operates by electing a leader in the cluster.
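To make the election step concrete, here is a minimal Go sketch (with assumed, simplified types such as Node, heartbeatCh and requestVote; not a real Raft implementation): a follower that hears no heartbeat before its randomized election timeout becomes a candidate, bumps its term, and needs votes from a majority of the servers to become leader.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// Simplified sketch of Raft's election timeout; not a full implementation.
type Node struct {
	id          int
	term        int
	peers       int           // total number of nodes in the cluster
	heartbeatCh chan struct{} // signals a heartbeat from the current leader
}

// runFollower waits for a leader heartbeat; if none arrives before a
// randomized election timeout, the node becomes a candidate.
func (n *Node) runFollower() {
	timeout := time.Duration(150+rand.Intn(150)) * time.Millisecond
	select {
	case <-n.heartbeatCh:
		// Leader is alive; stay a follower.
	case <-time.After(timeout):
		n.startElection()
	}
}

// startElection increments the term and asks peers for votes; the node
// wins only if a majority of servers grant their vote.
func (n *Node) startElection() {
	n.term++
	votes := 1 // votes for itself
	for i := 0; i < n.peers-1; i++ {
		if requestVote(i, n.term) { // hypothetical RequestVote RPC to a peer
			votes++
		}
	}
	if votes > n.peers/2 {
		fmt.Printf("node %d becomes leader for term %d\n", n.id, n.term)
	}
}

// requestVote stands in for the real RequestVote RPC; always granted here.
func requestVote(peer, term int) bool { return true }

func main() {
	n := &Node{id: 1, peers: 5, heartbeatCh: make(chan struct{})}
	n.runFollower()
}
```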
In the push model, the leader is responsible for driving and tracking the replication process across the cluster. You can also change the code to use a pull model, where each follower takes care of its own replication.
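A rough illustration of the difference, using made-up Leader/Follower types rather than any particular library's API: in the push variant the leader appends each entry to every follower itself, while in the pull variant a follower fetches whatever entries it is missing.

```go
package main

import "fmt"

// Toy types for illustrating push vs. pull replication; not a real Raft API.
type Leader struct {
	log       []string
	followers []*Follower
}

type Follower struct {
	id  int
	log []string
}

// Push model: the leader drives replication and updates every follower itself.
func (l *Leader) pushAppend(entry string) {
	l.log = append(l.log, entry)
	for _, f := range l.followers {
		f.log = append(f.log, entry) // leader sends the entry to each follower
	}
}

// Pull model: the follower asks the leader for the entries it is missing.
func (f *Follower) pullFrom(l *Leader) {
	for i := len(f.log); i < len(l.log); i++ {
		f.log = append(f.log, l.log[i])
	}
}

func main() {
	f1, f2 := &Follower{id: 1}, &Follower{id: 2}
	l := &Leader{followers: []*Follower{f1}}

	l.pushAppend("x=1") // pushed to f1 immediately
	l.pushAppend("y=2")

	f2.pullFrom(l) // f2 catches up on its own schedule
	fmt.Println(f1.log, f2.log)
}
```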
In many Raft-based systems, the cluster leader / controller is responsible for managing cluster health. Your question suggests that this introduces a bottleneck, but that is not necessarily true: there can be a separate "leader" for each data segment inside the data store, and clients work against that leader. Taking Kafka as an example, clients work against the leader of a topic's partition, which can be any broker in the cluster, and the data is replicated to the in-sync followers; this is unrelated to the leader/controller of the cluster.
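Here is a small sketch of that idea with hypothetical names (Cluster, partitionLeader, partitionFor): the client hashes the key to a partition and sends the write to that partition's leader, which can be any node in the cluster, so writes are spread across nodes instead of being funnelled through a single cluster-wide controller.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// Sketch of per-partition leadership: the write path depends on the
// partition's leader, not on one cluster-wide controller.
type Cluster struct {
	partitionLeader map[int]string // partition -> node currently leading it
	partitionCount  int
}

// partitionFor hashes the key to pick a partition, Kafka-style.
func (c *Cluster) partitionFor(key string) int {
	h := fnv.New32a()
	h.Write([]byte(key))
	return int(h.Sum32() % uint32(c.partitionCount))
}

// write sends the record to the leader of the key's partition; different
// keys can land on leaders hosted by different nodes, spreading the load.
func (c *Cluster) write(key, value string) {
	p := c.partitionFor(key)
	leader := c.partitionLeader[p]
	fmt.Printf("write %s=%s -> partition %d on %s\n", key, value, p, leader)
}

func main() {
	c := &Cluster{
		partitionCount: 3,
		partitionLeader: map[int]string{
			0: "broker-1",
			1: "broker-2",
			2: "broker-3",
		},
	}
	c.write("user-42", "hello")
	c.write("order-7", "world")
}
```

The cluster-wide controller still exists for metadata and health, but it is not on the data write path.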