- I saw, it was similar question at Zoopekeeper instances in Kafka. But the question remained unanswered.
So my extended version of question (with more details)
- Environment: There are 3 nodes of business application. Each application contains its own 1 zookeeper and 1 kafka embedded nodes aboard.
Preventing the occurrence of bewildered questions I must clarify. My business application built on top of elasticsearch with 3 nodes with minimumMasterNodes=2, so fault tolerance of my applicaton in cluster is 1. So I assume, that in the same way I can put to each application its own instance of zookeeper node and kafka node. General goal is to build on top of this stack the inter-datacenter data replication for business app using kafka mirrormaker with fault tolerance=1.
In my experiments I didn't use full stack of my business app, but only zookeeper+kafka inside each app node. Each app outputs its log into console, so I could determine, which one has started zookeeper in LEADER mode.
My zookeeper ansemble configuration is:
server.1=localhost:2668:3668
server.2=localhost:2669:3669
server.3=localhost:2670:3670
syncLimit=5
initLimit=10
clientPort=* #here each node has its own value of port number: 2182,2183,2184 for servers 1,2,3 accordingly
dataDir=D:\rtest\3-nodes\data\*\zoo # * is 1, 2, 3 accordingly to servers 1,2,3
dataLogDir=D:\rtest\3-nodes\data\*\zoo\log # * is 1, 2, 3 accordingly to servers 1,2,3
- My fault scenario is: 2.1. Start all three app nodes. Start consumer (console output). Start application for producing the sequence of messages. Make sure that consumer is receiving messages via kafka cluster. 2.2. Kill app whose instance of zookeeper is leader (in my case it was server #3). 2.3. Ensure that consumer does not output any new message from kafka topic.
From my point of view, the problem lies in the zookeeper. Here are the excerpts of logs that were produced by alive nodes 1, 2: It looks like live zookeeper servers are continue trying to reach dropped server instead of get agreement about quorum among themselves... By the way. In such circumstances I even cannot connect to zookeeper by console clisent (be more clear, i can connect to it, but at first command, shall we say "ls /" console client falls down with exception )
Server 1:
15459 [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2182] WARN org.apache.zookeeper.server.quorum.Learner - Exception when following the leader
java.net.SocketException: Connection reset
at java.net.SocketInputStream.read(Unknown Source)
at java.net.SocketInputStream.read(Unknown Source)
at java.io.BufferedInputStream.fill(Unknown Source)
at java.io.BufferedInputStream.read(Unknown Source)
at java.io.DataInputStream.readInt(Unknown Source)
at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:103)
at org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:153)
at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:85)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:786)
15460 [Thread-3-SendThread(127.0.0.1:2184)] WARN org.apache.zookeeper.ClientCnxn - Session 0x354b9dbe0b90001 for server 127.0.0.1/127.0.0.1:2184, unexpected error, closing socket connection and attempting reconnect
java.io.IOException: An existing connection was forcibly closed by the remote host
at sun.nio.ch.SocketDispatcher.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(Unknown Source)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(Unknown Source)
at sun.nio.ch.IOUtil.read(Unknown Source)
at sun.nio.ch.SocketChannelImpl.read(Unknown Source)
at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:68)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:366)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
15459 [Thread-3-SendThread(0:0:0:0:0:0:0:1:2184)] WARN org.apache.zookeeper.ClientCnxn - Session 0x354b9dbe0b90000 for server 0:0:0:0:0:0:0:1/0:0:0:0:0:0:0:1:2184, unexpected error, closing socket connection and attempting reconnect
java.io.IOException: An existing connection was forcibly closed by the remote host
at sun.nio.ch.SocketDispatcher.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(Unknown Source)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(Unknown Source)
at sun.nio.ch.IOUtil.read(Unknown Source)
at sun.nio.ch.SocketChannelImpl.read(Unknown Source)
at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:68)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:366)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
15459 [RecvWorker:3] WARN org.apache.zookeeper.server.quorum.QuorumCnxManager - Connection broken for id 3, my id = 1, error = java.net.SocketException: Connection reset
at java.net.SocketInputStream.read(Unknown Source)
at java.net.SocketInputStream.read(Unknown Source)
at java.net.SocketInputStream.read(Unknown Source)
at java.io.DataInputStream.readInt(Unknown Source)
at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:765)
15462 [RecvWorker:3] WARN org.apache.zookeeper.server.quorum.QuorumCnxManager - Interrupting SendWorker
15462 [SendWorker:3] WARN org.apache.zookeeper.server.quorum.QuorumCnxManager - Interrupted while waiting for message on queue java.lang.InterruptedException
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(Unknown Source)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(Unknown Source)
at java.util.concurrent.ArrayBlockingQueue.poll(Unknown Source)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.pollSendQueue(QuorumCnxManager.java:849)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.access$500(QuorumCnxManager.java:64)
at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:685)
15462 [SendWorker:3] WARN org.apache.zookeeper.server.quorum.QuorumCnxManager - Send worker leaving thread
15766 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2182] WARN org.apache.zookeeper.server.NIOServerCnxn - Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running
16481 [WorkerSender[myid=1]] WARN org.apache.zookeeper.server.quorum.QuorumCnxManager - Cannot open channel to 3 at election address localhost/127.0.0.1:3670
java.net.ConnectException: Connection refused: connect
at java.net.DualStackPlainSocketImpl.waitForConnect(Native Method)
at java.net.DualStackPlainSocketImpl.socketConnect(Unknown Source)
at java.net.AbstractPlainSocketImpl.doConnect(Unknown Source)
at java.net.AbstractPlainSocketImpl.connectToAddress(Unknown Source)
at java.net.AbstractPlainSocketImpl.connect(Unknown Source)
at java.net.PlainSocketImpl.connect(Unknown Source)
at java.net.SocksSocketImpl.connect(Unknown Source)
at java.net.Socket.connect(Unknown Source)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:368)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:341)
at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:449)
at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:430)
at java.lang.Thread.run(Unknown Source)
16596 [Thread-3-SendThread(127.0.0.1:2184)] WARN org.apache.zookeeper.ClientCnxn - Session 0x354b9dbe0b90000 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused: no further information
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(Unknown Source)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
...
Server2:
...
5118 [RecvWorker:3] WARN org.apache.zookeeper.server.quorum.QuorumCnxManager - Connection broken for id 3, my id = 2, error =
java.net.SocketException: Connection reset
at java.net.SocketInputStream.read(Unknown Source)
at java.net.SocketInputStream.read(Unknown Source)
at java.net.SocketInputStream.read(Unknown Source)
at java.io.DataInputStream.readInt(Unknown Source)
at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:765)
5121 [RecvWorker:3] WARN org.apache.zookeeper.server.quorum.QuorumCnxManager - Interrupting SendWorker
5120 [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2183] WARN org.apache.zookeeper.server.quorum.Learner - Exception when following the leader
java.net.SocketException: Connection reset
at java.net.SocketInputStream.read(Unknown Source)
at java.net.SocketInputStream.read(Unknown Source)
at java.io.BufferedInputStream.fill(Unknown Source)
at java.io.BufferedInputStream.read(Unknown Source)
at java.io.DataInputStream.readInt(Unknown Source)
at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:103)
at org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:153)
at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:85)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:786)
5119 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2183] WARN org.apache.zookeeper.server.NIOServerCnxn - Exception causing close of session 0x254b9dbe0b20000 due to java.io.IOException: An existing connect
ion was forcibly closed by the remote host
5122 [SendWorker:3] WARN org.apache.zookeeper.server.quorum.QuorumCnxManager - Interrupted while waiting for message on queue
java.lang.InterruptedException
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(Unknown Source)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(Unknown Source)
at java.util.concurrent.ArrayBlockingQueue.poll(Unknown Source)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.pollSendQueue(QuorumCnxManager.java:849)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.access$500(QuorumCnxManager.java:64)
at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:685)
5123 [SendWorker:3] WARN org.apache.zookeeper.server.quorum.QuorumCnxManager - Send worker leaving thread
5536 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2183] WARN org.apache.zookeeper.server.NIOServerCnxn - Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running
6143 [WorkerSender[myid=2]] WARN org.apache.zookeeper.server.quorum.QuorumCnxManager - Cannot open channel to 3 at election address localhost/127.0.0.1:3670
java.net.ConnectException: Connection refused: connect
at java.net.DualStackPlainSocketImpl.waitForConnect(Native Method)
at java.net.DualStackPlainSocketImpl.socketConnect(Unknown Source)
at java.net.AbstractPlainSocketImpl.doConnect(Unknown Source)
at java.net.AbstractPlainSocketImpl.connectToAddress(Unknown Source)
at java.net.AbstractPlainSocketImpl.connect(Unknown Source)
at java.net.PlainSocketImpl.connect(Unknown Source)
at java.net.SocksSocketImpl.connect(Unknown Source)
at java.net.Socket.connect(Unknown Source)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:368)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:341)
at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:449)
at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:430)
at java.lang.Thread.run(Unknown Source)
....
By the way. The ansamble of 4 such nodes works perfect due to my requirements. So can everybody answer, if zookeeper cluster of 3 nodes can survive after dying one node? Or am I doing something wrong?