How does Zookeeper manage node roles in other clusters?

Question

My understanding is that Zookeeper is often used to solve the problem of "keeping track of which node plays a particular role" in a distributed system (e.g. master node in a DB or in a MapReduce cluster, etc).

For simplicity, say we have a DB with one master and multiple replicas and the current master node in the DB goes down. In this scenario, one would, in principle, make one of the replica nodes a new master node. At this point my understanding is:

If we didn't have Zookeeper

The application servers may not know that we have a new master node, so they would not know where to send writes unless we have some custom logic on the app server itself to detect / correct this problem.

If we have Zookeeper

Zookeeper would somehow detect this failure, and update the value for the corresponding master key. Moreover, application servers can (optionally?) register hooks in Zookeeper, so Zookeeper can notify them of this failure, so that the app servers can update (e.g. in memory), which DB node is the new master.

My questions are:

How does Zookeper know what node to make master? Is Zookeper responsible for this choice?
How is this information propagated to nodes that need to interact with Zookeeper? E.g. If one of the Zookeeper nodes go down, how would the application servers know which Zookeeper node to hit in this scenario? Does Zookeeper manage this differently from competing solutions like e.g. etcd?

pebox11 pebox11 · Accepted Answer · 2020-07-04T19:55:42

The answer to both 1. and 2. is called leader election process and briefly works in the following way:

When a process starts in a cluster managed by ZK, the cluster enters an election state. If there is a leader then there exists an established hierarcy and the existing leader is just verified. If there is no leader (say master is down), ZK forces the znodes to use sequence flags to look for a new leader. Each node talks to its peers and sends a message containing the node's identifier (sid) and the most recent transaction it executed (zxid). These messages are called votes. When a node receives a vote it can either neglect it or keep it depending the zxid. If zxid is newer it keeps the vote if older than what it has it discards it. If there is a tie in zxids then the vote with the highest sid wins! So there will come a time when all nodes will have the same vote which will define the new leader by the sid. So this is how ZK elects a new leader node!

How does Zookeeper manage node roles in other clusters?

If we didn't have Zookeeper

If we have Zookeeper

1 Answers