During server maintenance, one of our 20 brokers was shut down gracefully, but the entire Kafka Connect (sink) cluster died with the NPE shown below. Every topic had a replication factor greater than 2; there were 50 topics and 200 partitions. Judging from the error and the Kafka client library source code, the exception seems to occur while the Connect client caches the metadata received from the broker, which includes the set of broker node IDs and the set of partition info.
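A minimal sketch of the suspected failure mode, assuming the cached partition info can still name a leader broker that is no longer in the cached broker set. The class `MetadataCacheSketch` and all of its methods are hypothetical stand-ins written for illustration; they are not taken from the Kafka source, and it is not confirmed that this is the actual cause in 2.3.1.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical illustration only: NOT the Kafka client's real metadata classes.
final class MetadataCacheSketch {
    // Broker id -> host, standing in for the broker node set of a metadata response.
    private final Map<Integer, String> brokersById = new HashMap<>();
    // Partition name -> leader broker id, standing in for the partition info set.
    private final Map<String, Integer> leaderByPartition = new HashMap<>();

    void addBroker(int id, String host) { brokersById.put(id, host); }
    void removeBroker(int id) { brokersById.remove(id); }
    void setLeader(String partition, int brokerId) { leaderByPartition.put(partition, brokerId); }

    // Unsafe lookup: assumes every leader id is still present in the broker set.
    int unsafeLeaderHostLength(String partition) {
        String host = brokersById.get(leaderByPartition.get(partition));
        return host.length(); // throws NullPointerException when the broker is gone
    }

    // Defensive lookup: a missing leader or missing broker yields an empty Optional.
    Optional<String> safeLeaderHost(String partition) {
        Integer leaderId = leaderByPartition.get(partition);
        if (leaderId == null) {
            return Optional.empty();
        }
        return Optional.ofNullable(brokersById.get(leaderId));
    }

    public static void main(String[] args) {
        MetadataCacheSketch cache = new MetadataCacheSketch();
        cache.addBroker(3, "broker-3.example");
        cache.setLeader("topic-a-0", 3);

        // Simulate the broker leaving while the partition info still points at it.
        cache.removeBroker(3);

        System.out.println("safe lookup present: " + cache.safeLeaderHost("topic-a-0").isPresent());
        try {
            cache.unsafeLeaderHostLength("topic-a-0");
        } catch (NullPointerException e) {
            System.out.println("unsafe lookup threw NullPointerException");
        }
    }
}
```

The point of the sketch is only that the broker set and the partition info can disagree for a moment during a shutdown, and that an unguarded lookup across the two turns that disagreement into an NPE.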

How can this happen, and how can we prevent it in the future? (Both the brokers and the clients are on version 2.3.1.)

[Two screenshots of the stack trace; images not available]

1 Answer

This is a bug. The Connect cluster should not be negatively impacted by a broker shutting down and it should not throw an NPE.

Please open a ticket at https://issues.apache.org/jira/projects/KAFKA/issues/. It's also best if you paste the stack trace as text instead of an image.