3
votes

I have some Kafka consumers and producers (Spring Boot). When the Kafka node they are connected to goes down (because of a failure, for example), they log this:

2019-03-15 11:02:53.278 WARN 1 --- [tainer#1-23-C-1] org.apache.kafka.clients.NetworkClient : [Consumer clientId=consumer-29, groupId=OperationsConsumer] Error connecting to node kafka-0.kafka-headless.test.svc.cluster.local:9092 (id: 1001 rack: null)

java.io.IOException: Can't resolve address: kafka-0.kafka-headless.test.svc.cluster.local:9092

But then they do not try to reconnect to a valid Kafka node, even if I explicitly list the nodes in the bootstrap.servers property.

How can I make my consumers reconnect to a valid Kafka node after the node they were connected to fails?

2 Answers

1
votes

Check your reconnect properties:

  • reconnect.backoff.ms
  • reconnect.backoff.max.ms

…as mentioned in the Kafka docs

Did you wait for these backoff intervals to elapse before judging whether the reconnection succeeded?

bootstrap.servers must list at least one node other than the failed one for new connections to have a chance. Also check that all of your nodes use the same ZooKeeper ensemble and that the Kafka setup itself is in good order.
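
As a rough illustration of the two backoff properties and the bootstrap.servers advice above, here is a minimal sketch that builds the client properties with plain string keys. Only kafka-0 and the group id come from the log in the question; the kafka-1 and kafka-2 host names and the backoff values are assumptions for illustration, not recommendations.

```java
import java.util.Properties;

public class ReconnectConfigSketch {

    // Builds consumer properties that list several brokers and set the reconnect backoff.
    public static Properties consumerProps() {
        Properties props = new Properties();

        // kafka-0 is the host from the question's log; kafka-1 and kafka-2 are
        // hypothetical siblings, listed so that bootstrapping does not depend on one node.
        props.setProperty("bootstrap.servers", String.join(",",
                "kafka-0.kafka-headless.test.svc.cluster.local:9092",
                "kafka-1.kafka-headless.test.svc.cluster.local:9092",
                "kafka-2.kafka-headless.test.svc.cluster.local:9092"));

        // Group id as it appears in the question's log line.
        props.setProperty("group.id", "OperationsConsumer");

        // Initial wait before retrying a failed connection (illustrative value).
        props.setProperty("reconnect.backoff.ms", "1000");
        // Upper bound for the backoff, which grows on repeated failures (illustrative value).
        props.setProperty("reconnect.backoff.max.ms", "10000");

        return props;
    }

    public static void main(String[] args) {
        // Print the resulting configuration; iteration order is not guaranteed.
        consumerProps().forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

In a Spring Boot application the same keys would normally be set through configuration rather than code, for example under spring.kafka.properties.*, assuming the standard Spring Kafka auto-configuration is in use.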

1
votes

Based on the host name, it looks like you are using Kubernetes.

This is very complicated in Kubernetes.

First, try telnet hostname 9092. If it connects, it is a Kafka configuration issue; otherwise, it is a Kubernetes setup issue.
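
If telnet is not available inside the container, roughly the same check can be sketched in Java. This is only an approximation of the telnet test, with the class and method names made up for illustration. It separates name resolution from the TCP connect, because the exception in the question ("Can't resolve address") points at resolution rather than at the port.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.UnknownHostException;

public class BrokerReachabilityCheck {

    public enum Result { DNS_FAILURE, CONNECT_FAILURE, REACHABLE }

    // Resolves the host first, then attempts a TCP connection with a timeout.
    public static Result check(String host, int port, int timeoutMs) {
        InetAddress address;
        try {
            address = InetAddress.getByName(host);
        } catch (UnknownHostException e) {
            // The name does not resolve, as in the question's "Can't resolve address" error.
            return Result.DNS_FAILURE;
        }
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(address, port), timeoutMs);
            return Result.REACHABLE;
        } catch (IOException e) {
            // The name resolves, but nothing accepted the connection within the timeout.
            return Result.CONNECT_FAILURE;
        }
    }

    public static void main(String[] args) {
        // Defaults are the host and port from the question's log; override via arguments.
        String host = args.length > 0 ? args[0] : "kafka-0.kafka-headless.test.svc.cluster.local";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 9092;
        System.out.println(host + ":" + port + " -> " + check(host, port, 3000));
    }
}
```

Following the reasoning above, DNS_FAILURE or CONNECT_FAILURE would point toward the Kubernetes setup, while REACHABLE would point toward the Kafka configuration.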