
I am using kafka_image=wurstmeister/kafka

zookeeper_version=3.4.14 kafka_version=2.12-2.4.0

C# client: Confluent kafka v1.2.0

We are using 3 brokers and 1 ZooKeeper cluster. As part of deployment we stop all the brokers, ZooKeeper, producers, and consumers, and delete the Kafka log files. We then start the consumers first, and start the brokers and ZooKeeper afterwards. During this process a consumer sometimes gets stuck: it does not pick up any messages even though it is alive. If I restart the consumer, it starts picking up messages again.

Thanks for the answer. I didn't find any ERROR in the logs. - Rafi
It is not clear why you would start consumers before any server is available to process the requests. - OneCricketeer

1 Answer


A rebalance can be the reason for this behaviour. When a rebalance starts in a consumer group, partitions are revoked from all consumers in the group. Until the rebalance finishes and partitions are reassigned, the consumers cannot commit offsets or poll data.


Some important notes to consider:

  • The rebalance timeout is equal to max.poll.interval.ms. If max.poll.interval.ms is set very high to allow for long-running processing, a rebalance can take a correspondingly long time.

Reasons for a rebalance:

  • A new consumer joins the consumer group
  • A consumer shuts down cleanly
  • New partition(s) are added to a topic the consumer group subscribes to
  • The group coordinator considers a consumer dead
    • session.timeout.ms expires without a heartbeat being sent
    • poll is not called within max.poll.interval.ms
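
The two "considered dead" conditions above can be sketched as a small check. This is only a toy model in plain Python, not the coordinator's or any client's actual code; `ConsumerState` and `eviction_reason` are hypothetical names, and the timeout values in the usage lines are illustrative rather than your configured ones.

```python
from dataclasses import dataclass


@dataclass
class ConsumerState:
    """Hypothetical snapshot of one group member (illustration only)."""
    last_heartbeat_ms: int  # when the last heartbeat was sent
    last_poll_ms: int       # when the application last called poll


def eviction_reason(state, now_ms, session_timeout_ms, max_poll_interval_ms):
    """Return why the member would drop out of the group, or None if it is live."""
    # No heartbeat within session.timeout.ms
    if now_ms - state.last_heartbeat_ms > session_timeout_ms:
        return "session.timeout.ms expired"
    # Heartbeats may still be flowing, but poll was not called in time
    if now_ms - state.last_poll_ms > max_poll_interval_ms:
        return "max.poll.interval.ms exceeded"
    return None


# Illustrative values: 10 s session timeout, 5 min max poll interval.
member = ConsumerState(last_heartbeat_ms=299_000, last_poll_ms=0)
print(eviction_reason(member, 300_001, 10_000, 300_000))
```

In this example the member is still heartbeating, yet it drops out because poll has not been called for more than the allowed interval. Either condition leads to a rebalance for the rest of the group.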

After a restart, rebalances can be caused by the JoinGroup requests that consumers send to the group coordinator when they call poll. Each request can potentially trigger a rebalance, so you may end up with many rebalances in a row. To mitigate this, you can increase group.initial.rebalance.delay.ms, a broker-side setting that defaults to 3 seconds.

group.initial.rebalance.delay.ms: The amount of time the group coordinator will wait for more consumers to join a new group before performing the first rebalance. A longer delay means potentially fewer rebalances, but increases the time until processing begins.
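
To see why a longer delay can mean fewer rebalances, here is a rough sketch in plain Python. It is a deliberately simplified model, not the broker's algorithm: it assumes a fixed window that opens at the first join, whereas the real coordinator may behave differently (for example, by extending the wait as more members arrive). `count_rebalances` and the join times are hypothetical.

```python
def count_rebalances(join_times_ms, initial_delay_ms):
    """Count rebalances for consumers joining a new, empty group.

    Simplified assumption: every join that arrives within
    `initial_delay_ms` of the first join shares the first rebalance,
    and every later join triggers one more rebalance.
    """
    if not join_times_ms:
        return 0
    joins = sorted(join_times_ms)
    window_end = joins[0] + initial_delay_ms
    late_joins = [t for t in joins if t > window_end]
    return 1 + len(late_joins)


# Four consumers reconnecting at different times after the brokers come up.
joins = [0, 1_000, 4_000, 8_000]
print(count_rebalances(joins, 3_000))   # 3
print(count_rebalances(joins, 10_000))  # 1
```

With the 3 second window, two of the four consumers arrive too late and each causes another rebalance; with a 10 second window all four are covered by the first one, at the cost of waiting longer before processing begins.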