
I have Kafka and Zookeeper co-located on the same servers, with multiple nodes.

In Kafka's server.properties, I have a line like

zookeeper.connect=server1:2181,server2:2181...

The problem is that Kafka will not start until all of the Zookeeper nodes are available. Otherwise, I get errors like "fatal error during Kafka startup" and "Timed out waiting for connection while in state: CONNECTING", even though the other Zookeeper nodes are up.

This makes it challenging to script startup of each node independently, since the startup scripts on one node are dependent on the state of other nodes.

First: is this expected behavior, or am I doing something wrong? Suppose I have 3 nodes in the Zookeeper cluster: do all 3 have to be up for Kafka to start? That seems counterintuitive, since a larger cluster would actually increase the chance of failure at startup rather than provide more resiliency.

Second: what's a good solution for this? Is the only approach to make Kafka on each node wait until Zookeeper is fully up on all nodes?

And how many Zookeeper servers are there? How are they configured? One recommendation would be not to colocate them. – OneCricketeer
I have a similar problem with 5 ZK nodes. If the single ZK node that a Kafka instance is connected to goes down, the Kafka node does not switch to one of the other 4 ZK nodes and continue operating. It just keeps flapping, trying to reconnect to the node that is down (even though all 5 are listed in its config). Is there some situation where Kafka requires one particular node of the five to be up? Maybe a replication issue, where the znodes it needs don't exist elsewhere? – xref

2 Answers


As far as I know, this is a prerequisite for Kafka to start up correctly, and I don't think it's too much of a burden. If the Zookeeper cluster itself is already having problems at startup time, Kafka might run into problems too, so ensuring that the Zookeeper cluster is healthy is a good initial check, IMHO.
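One way to script that initial check, as a minimal sketch: the Python below polls each server listed in a zookeeper.connect string with ZooKeeper's `srvr` four-letter command and exits 0 once enough of them report a serving mode (`leader`, `follower`, or `standalone`), so each node's startup script can run it before starting Kafka. It assumes `srvr` is permitted (it is enabled in 3.4; in 3.5+ it is governed by `4lw.commands.whitelist`). The function names, the 5-second poll interval, and the 300-second deadline are illustrative choices, not part of Kafka or ZooKeeper. By default it waits for a majority; `required` can be raised to the total server count to mirror the "wait for all nodes" approach from the question.

```python
import socket
import sys
import time


def parse_connect_string(connect):
    """Split a Kafka zookeeper.connect value into (host, port) pairs.
    An optional chroot suffix such as "/kafka" is dropped; port defaults to 2181."""
    hosts_part = connect.split("/", 1)[0]
    servers = []
    for entry in hosts_part.split(","):
        entry = entry.strip()
        if not entry:
            continue
        host, _, port = entry.partition(":")
        servers.append((host, int(port) if port else 2181))
    return servers


def server_mode(host, port, timeout=2.0):
    """Send the 'srvr' four-letter command and return the reported Mode
    (e.g. 'leader', 'follower', 'standalone'), or None if unreachable."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"srvr")
            chunks = []
            while True:  # the server closes the connection after replying
                data = sock.recv(4096)
                if not data:
                    break
                chunks.append(data)
    except OSError:  # refused, timed out, DNS failure, ...
        return None
    for line in b"".join(chunks).decode(errors="replace").splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return None  # e.g. 'srvr' not whitelisted, so no Mode line


def quorum_ready(modes, required=None):
    """True if at least `required` servers are serving (default: a strict majority)."""
    serving = sum(1 for m in modes if m in ("leader", "follower", "standalone"))
    if required is None:
        required = len(modes) // 2 + 1
    return serving >= required


def wait_for_quorum(connect, required=None, deadline_s=300, poll_s=5):
    """Poll until quorum_ready() holds or the deadline passes."""
    servers = parse_connect_string(connect)
    end = time.monotonic() + deadline_s
    while time.monotonic() < end:
        modes = [server_mode(h, p) for h, p in servers]
        if quorum_ready(modes, required):
            return True
        time.sleep(poll_s)
    return False


# Usage: wait_for_zk.py <zookeeper.connect value>
if __name__ == "__main__" and len(sys.argv) == 2:
    sys.exit(0 if wait_for_quorum(sys.argv[1]) else 1)
```

For example, assuming the script is saved as the hypothetical `wait_for_zk.py`: `python3 wait_for_zk.py server1:2181,server2:2181,server3:2181 && bin/kafka-server-start.sh config/server.properties`.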

A way to get around this limitation is to configure a single-node Zookeeper cluster and point Kafka at it. Afterwards, you can grow the Zookeeper cluster to 3 or more nodes while Kafka is already up and running. More details can be found in the question "Adding new ZooKeeper node in Kafka cluster?".

For the record, once Kafka is up and running, it copes fine if the Zookeeper cluster goes down. It just won't be able to accept new producer/consumer connections or create topics, but the connections already active on the cluster continue to work just fine.


We hit the same problem in our production environment. It turned out to be a bug (ZOOKEEPER-2184) in the Zookeeper client library that Kafka uses to talk to Zookeeper.

Our Kafka version is 1.1.1, which uses zookeeper-3.4.10.jar.

After we replaced it with zookeeper-3.4.13.jar, Kafka restarted successfully.
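To check which Zookeeper client jar a broker actually ships with, here is a minimal Python sketch. It assumes the standard Kafka distribution layout, where bundled jars live in a `libs/` directory whose path you pass as the argument. The script lists `zookeeper-X.Y.Z.jar` files and flags versions older than 3.4.13. That threshold comes only from the versions reported in this answer (3.4.10 failed, 3.4.13 worked). The script does not know which releases on other lines include the ZOOKEEPER-2184 fix, and the helper names are hypothetical.

```python
import re
import sys
from pathlib import Path

# Matches e.g. "zookeeper-3.4.10.jar" but not "zookeeper-jute-3.5.7.jar".
JAR_NAME = re.compile(r"^zookeeper-(\d+)\.(\d+)\.(\d+)\.jar$")

# Threshold taken only from this answer: 3.4.10 failed, 3.4.13 worked.
KNOWN_GOOD = (3, 4, 13)


def jar_version(filename):
    """Return (major, minor, patch) for a zookeeper-X.Y.Z.jar name, else None."""
    m = JAR_NAME.match(filename)
    return tuple(int(g) for g in m.groups()) if m else None


def zookeeper_jars(libs_dir):
    """List (version, filename) for every Zookeeper client jar in libs_dir, sorted."""
    found = []
    for path in Path(libs_dir).iterdir():
        version = jar_version(path.name)
        if version is not None:
            found.append((version, path.name))
    return sorted(found)


def older_than_known_good(version, known_good=KNOWN_GOOD):
    """Plain tuple comparison against the version reported to work; it does
    not know which releases on other lines (e.g. 3.5.x) include the fix."""
    return version < known_good


# Usage: check_zk_jar.py <path to Kafka's libs directory>
if __name__ == "__main__" and len(sys.argv) == 2:
    for version, name in zookeeper_jars(sys.argv[1]):
        status = "older than 3.4.13" if older_than_known_good(version) else "not older than 3.4.13"
        print(f"{name}: {status}")
```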