2
votes

In the Consistency Guarantees section of ZooKeeper Programmer's Guide, it states that ZooKeeper will give "Single System Image" guarantees:

A client will see the same view of the service regardless of the server that it connects to.

According to the ZAB protocol, only if more than half of the followers acknowledge a proposal, the leader could commit the transaction. So it's likely that not all the followers are in the same status.

If the followers are not in the same status, how could ZooKeeper guarantees "Single System Status"?


References:

3

3 Answers

3
votes

Leader only waits for responses from a quorum of the followers to acknowledge to commit a transaction. That doesn't mean that some of the followers need not acknowledge the transaction or can "say no".

Eventually as the rest of the followers process the commit message from leader or as part of the synchronization, will have the same state as the master (with some delay). (not to be confused with Eventual consistency)

How delayed can the follower's state be depends on the configuration items syncLimit & tickTime (https://zookeeper.apache.org/doc/current/zookeeperAdmin.html)

A follower can at most be behind by syncLimit * tickTime time units before it gets dropped.

2
votes

The document is a little misleading, I have made a pr.

see https://github.com/apache/zookeeper/pull/931.

In fact, zookeeper client keeps a zxid, so it will not connect to older follower if it has read some data from a newer server.

1
votes

All reads and writes go to a majority of the nodes before being considered successful, so there's no way for a read following a write to not know about that previous write. At least one node knows about it. (Otherwise n/2+1 + n/2+1 > n, which is false.) It doesn't matter if many (at most all but one) has an outdated view of the world since at least one of them knows it all.

If enough nodes crash or the network becomes partitioned so that no group of nodes that are able to talk to each other are in a majority, Zab stops handling requests. If your acknowledged update gets accepted by a set of nodes that disappear and never come back online, your cluster will lose some data (but only when you ask it to move on, and leave its dead nodes behind).

Handling more than two requests is done by handling them two at a time, until there's only one state left.