4 votes

I have been reading the ZooKeeper source code and found something in the cluster write path that looks strange. As we all know, when a write goes to a ZooKeeper cluster, the steps are:

  1. The leader sends a proposal request to all followers and to itself.
  2. When a follower receives the proposal, it ACKs it.
  3. When the leader has received ACKs from a quorum (a majority), it sends a commit request (see the sketch after this list).
  4. The followers and the leader commit the transaction.
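
For intuition, here is a minimal, hypothetical sketch of the leader-side bookkeeping behind steps 1 and 3. It is not the real ZooKeeper code (that lives in org.apache.zookeeper.server.quorum.Leader and is far more involved); the class and method names are made up for illustration:

 import java.util.HashMap;
 import java.util.HashSet;
 import java.util.Map;
 import java.util.Set;

 // Hypothetical quorum counter: a proposal may be committed once a majority
 // of the ensemble (leader included) has acknowledged it.
 class QuorumTracker {
     private final int ensembleSize;
     private final Map<Long, Set<Long>> acksByZxid = new HashMap<>();

     QuorumTracker(int ensembleSize) {
         this.ensembleSize = ensembleSize;
     }

     // Step 1: the leader proposes and immediately counts its own ACK.
     void propose(long zxid, long leaderId) {
         acksByZxid.put(zxid, new HashSet<>());
         ack(zxid, leaderId);
     }

     // Step 3: count each ACK; returns true once a majority has acknowledged,
     // at which point the leader would send the COMMIT.
     boolean ack(long zxid, long serverId) {
         Set<Long> acks = acksByZxid.get(zxid);
         if (acks == null) {
             return false;                      // unknown proposal
         }
         acks.add(serverId);
         return acks.size() > ensembleSize / 2; // quorum reached
     }
 }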

My question is about step 2. When a follower receives the proposal, the request is synced to the ZooKeeper transaction log (see the code below), while the later commit is only applied in memory. Suppose all the nodes are restarted after the transaction has been synced to disk but before the ACK is sent: does that uncommitted request end up as the newest transaction, i.e. does it survive?

 // FollowerZooKeeperServer#logRequest: on receiving a proposal, the follower
 // wraps it in a Request and forwards it to the SyncRequestProcessor
 public void logRequest(TxnHeader hdr, Record txn) {
     Request request = new Request(null, hdr.getClientId(), hdr.getCxid(),
             hdr.getType(), null, null);
     request.hdr = hdr;
     request.txn = txn;
     request.zxid = hdr.getZxid();
     // skip transactions whose counter (low 32 bits of the zxid) is 0,
     // i.e. the per-epoch marker rather than a normal write
     if ((request.zxid & 0xffffffffL) != 0) {
         pendingTxns.add(request);
     }
     syncProcessor.processRequest(request);
 }
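
As a side note, the check (request.zxid & 0xffffffffL) != 0 relies on the zxid layout: the high 32 bits hold the epoch and the low 32 bits hold a per-epoch counter, so a zxid whose low half is zero marks the start of an epoch rather than a normal transaction. A tiny sketch of that decomposition (the helper names are mine, not ZooKeeper's):

 // Illustrative helpers for the zxid layout: epoch in the high 32 bits,
 // per-epoch counter in the low 32 bits.
 final class ZxidLayout {
     static long epoch(long zxid)   { return zxid >>> 32; }
     static long counter(long zxid) { return zxid & 0xffffffffL; }

     public static void main(String[] args) {
         long zxid = (5L << 32) | 42L;           // epoch 5, counter 42
         System.out.println(epoch(zxid));         // 5
         System.out.println(counter(zxid));       // 42
         System.out.println(counter(5L << 32));   // 0 -> skipped by the check above
     }
 }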



 // SyncRequestProcessor#flush: only after the tx log has been committed to disk
 // is the request handed to the next processor, which sends the ACK. Is that OK?
private void flush(LinkedList<Request> toFlush)
        throws IOException, RequestProcessorException
{
    if (toFlush.isEmpty())
        return;

    // fsync the transaction log before anything downstream sees these requests
    zks.getZKDatabase().commit();
    while (!toFlush.isEmpty()) {
        Request i = toFlush.remove();
        if (nextProcessor != null) {
            nextProcessor.processRequest(i);
        }
    }
    if (nextProcessor != null && nextProcessor instanceof Flushable) {
        ((Flushable)nextProcessor).flush();
    }
}
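
For completeness: on a follower, the nextProcessor above is (if I remember the pipeline correctly) a SendAckRequestProcessor, so the ACK of step 2 is only produced for requests that flush() has already fsynced. Below is a simplified sketch of that relationship with made-up types, not the real processor:

 // Simplified stand-in for the processor that runs after flush(): by the time
 // processRequest is called, the transaction is already durable on disk.
 interface LeaderChannel {
     void sendAck(long zxid);   // hypothetical transport method
 }

 class AckAfterSyncProcessor {
     private final LeaderChannel leader;

     AckAfterSyncProcessor(LeaderChannel leader) {
         this.leader = leader;
     }

     // An ACK sent here never refers to a transaction a crash could lose,
     // because the tx log was committed to disk before this point.
     void processRequest(long zxid) {
         leader.sendAck(zxid);
     }
 }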

1 Answer

0 votes

The answer is yes.

Once the transaction has been synced to the tx log on disk, say with zxid N, the write is considered accepted by that server. If all the nodes are restarted at that point, every node recovers N from its log as its latest transaction. After the leader election succeeds, N is part of the new leader's history and is therefore treated as committed.
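
Roughly speaking, leader election prefers the server with the highest last-logged zxid, and whatever sits in the new leader's log is committed to the ensemble during synchronization. Here is a toy sketch of that preference (not the real FastLeaderElection, which also compares epochs and server ids more carefully):

 import java.util.List;

 // Toy illustration of why a persisted-but-unacknowledged transaction N survives:
 // the server with the highest last-logged zxid wins, and its log (containing N)
 // becomes the committed history for the whole ensemble.
 class ToyElection {
     record Candidate(long serverId, long lastLoggedZxid) {}

     static Candidate pickLeader(List<Candidate> candidates) {
         Candidate best = candidates.get(0);
         for (Candidate c : candidates) {
             if (c.lastLoggedZxid() > best.lastLoggedZxid()
                     || (c.lastLoggedZxid() == best.lastLoggedZxid()
                         && c.serverId() > best.serverId())) {
                 best = c;
             }
         }
         return best;
     }

     public static void main(String[] args) {
         long n = (5L << 32) | 7L;   // the fsynced but never ACKed transaction N
         Candidate leader = pickLeader(List.of(
                 new Candidate(1, n), new Candidate(2, n), new Candidate(3, n)));
         System.out.println("leader " + leader.serverId()
                 + " has N in its log: " + (leader.lastLoggedZxid() == n));
     }
 }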

I think what bothers you is that the client never received a response for the write, yet the write ends up committed. That is acceptable: the state of the ZooKeeper cluster stays consistent, the client just observes a timeout, and a timeout means the outcome of the request is unknown; it may or may not have been committed.
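
From the client's point of view, here is a minimal sketch of handling that ambiguity with the standard ZooKeeper client API (the connect string and znode path are placeholders): on a connection loss the client cannot tell whether the write was applied, so it has to check by reading once the session is usable again.

 import org.apache.zookeeper.KeeperException;
 import org.apache.zookeeper.ZooKeeper;

 public class UnknownOutcomeWrite {
     public static void main(String[] args) throws Exception {
         // placeholder connect string, session timeout and path
         ZooKeeper zk = new ZooKeeper("localhost:2181", 30000, event -> {});
         try {
             zk.setData("/my-config", "v2".getBytes(), -1);   // -1 = any version
         } catch (KeeperException.ConnectionLossException e) {
             // The write may or may not have been committed; the only safe
             // reaction is to retry/read after the session reconnects.
             byte[] current = zk.getData("/my-config", false, null);
             System.out.println("current value: " + new String(current));
         } finally {
             zk.close();
         }
     }
 }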