I have Kafka and Zookeeper co-located on the same servers, with multiple nodes.
In Kafka's server.properties, I have a line like
zookeeper.connect=server1:2181,server2:2181...
the problem is, Kafka will not start until all of the Zookeeper nodes are available. Otherwise, I will get an error like "fatal error during Kafka startup" and "Timed out waiting for connection while in state: CONNECTING" even though the other Zookeeper nodes are up.
This makes it challenging to script startup of each node independently, since the startup scripts on one node are dependent on the state of other nodes.
First: is this expected behavior or am I doing something wrong? Suppose I have 3 nodes in Zookeeper cluster; all 3 nodes have to be up for Kafka to start? That seems counterintuitive, since a larger cluster would actually increase the chance of failure on startup rather than provide more resiliency.
Second: What's a good solution for this? Is the only approach to make Kafka on each node wait until Zookeeper is fully up on all nodes?