0
votes

I have a five-node Kafka cluster (Confluent 5.5 Community Edition) with 3 ZooKeeper nodes, each on a different AWS instance. While doing failover testing, I noticed that the Kafka cluster works fine even if all ZooKeeper nodes are down. I was able to produce, consume, and also create new consumers.

  1. Why does the Kafka cluster not stop if it cannot connect to any ZooKeeper node?
  2. What issues could arise if we are unaware of such a failure in production and the Kafka cluster keeps running without ZooKeeper connectivity?
  3. How do we handle such a scenario?
1
Maybe this article, confluent.io/blog/removing-zookeeper-dependency-in-kafka, will help you understand ZooKeeper's role in Kafka. - Mihal By
@Mihal That removal process isn't complete yet. - OneCricketeer

1 Answer

0
votes

Controller and partition leader election, topic creation, and simple ACLs (if you use them) still depend on ZooKeeper. Other basic functions that rely only on the Kafka bootstrap protocol, such as producing and consuming, might still work, sure. There should definitely be broker logs indicating that the connection was lost.

Ideally you'd have basic process health checking and incident management software in place, so that you don't miss critical services going down in prod.
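As a rough idea of what such a health check could look like, here is a minimal Python sketch that sends ZooKeeper's `ruok` four-letter command to each node and expects `imok` back. The function names and the hostnames in the usage comment are made up for illustration. It also assumes `ruok` is enabled on your nodes; newer ZooKeeper releases (3.5 and later) require it to be listed in `4lw.commands.whitelist`, so treat this as a starting point, not a drop-in monitor.

```python
import socket


def zk_is_ok(host, port=2181, timeout=2.0):
    """Return True if the node answers the 'ruok' command with 'imok'.

    Any connection error or timeout is treated as "not healthy".
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"ruok")
            chunks = []
            while True:
                data = sock.recv(64)
                if not data:  # server closed the connection after replying
                    break
                chunks.append(data)
            return b"".join(chunks) == b"imok"
    except OSError:
        return False


def down_nodes(nodes, check=zk_is_ok):
    """Return the (host, port) pairs that fail the health check."""
    return [(host, port) for host, port in nodes if not check(host, port)]


# Example usage (hypothetical hostnames):
#   failed = down_nodes([("zk1.example.internal", 2181),
#                        ("zk2.example.internal", 2181),
#                        ("zk3.example.internal", 2181)])
#   if failed:
#       print("ZooKeeper nodes not responding:", failed)
```

You would run something like this from cron or your monitoring agent and alert when the returned list is non-empty, or when it contains enough nodes to break quorum.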

How to handle it? Restart ZooKeeper...