
I am using Kafka 2.4.1 (recently upgraded from 2.2.0) and noticed a strange problem.

Even though the application (Kafka Streams) is down and no instance of it is running, the consumer group command still reports the group state as rebalancing. Our application runs as a Kubernetes pod.

root@bastion-0:# ./kafka-consumer-groups --describe --group groupname --bootstrap-server kafka-0.local:9094 

Warning: Consumer group 'groupname' is rebalancing.

I waited about 30 minutes and the command still reports 'rebalancing', even though the application is down.

If I try to delete the group, I get the following error:

root@bastion-0:/app/kafka_2.12-2.4.1/bin# ./kafka-consumer-groups.sh --delete --group group1  --bootstrap-server kafka.local:9094 

Error: Deletion of some consumer groups failed:
* Group 'group1' could not be deleted due to: java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.GroupNotEmptyException: The group is not empty.
root@bastion-0:/app/kafka_2.12-2.4.1/bin# ./kafka-consumer-groups.sh --delete --group group2  --bootstrap-server kafka.local:9094 

Error: Deletion of some consumer groups failed:
* Group 'group2' could not be deleted due to: java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.GroupNotEmptyException: The group is not empty.

When I look at the group members, members are still listed even though the application is NOT running. Is this caused by the new rebalance protocol (cooperative rebalancing)?

Where does ./kafka-consumer-groups read the group membership information from? Does it keep the member information while the application is down?

Update:

I brought up the application with a different group name and it came up fine. I can also describe the new group. Even so, the old group is still in the 'rebalancing' state.

New Update:

I also found that the group coordinator for all the groups was one of the nodes in the Kafka cluster, and when I rebooted that node the problem went away.
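For anyone trying to confirm the same symptom, the group state, its coordinator, and the lingering members can be inspected with the same tool. This is only a sketch under assumptions: the function names `group_state` and `group_members` and the `KAFKA_BIN` and `BOOTSTRAP` variables are made up for illustration, the bootstrap address is copied from the commands above, and the exact output columns may differ between Kafka versions.

```shell
# Directory containing kafka-consumer-groups.sh (hypothetical variable, adjust to your install).
KAFKA_BIN="${KAFKA_BIN:-.}"
# Bootstrap address taken from the commands in the question (hypothetical variable).
BOOTSTRAP="${BOOTSTRAP:-kafka.local:9094}"

# Show the group's state summary, which should include its coordinator broker.
group_state() {
  "$KAFKA_BIN/kafka-consumer-groups.sh" --describe --group "$1" --state \
    --bootstrap-server "$BOOTSTRAP"
}

# List the members the coordinator still has registered for the group.
group_members() {
  "$KAFKA_BIN/kafka-consumer-groups.sh" --describe --group "$1" --members --verbose \
    --bootstrap-server "$BOOTSTRAP"
}
```

Calling `group_state group1` and `group_members group1` while the pod is down should show whether the coordinator still believes members exist, and which broker that coordinator is.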

Question:

Where is the group metadata stored? Could the problem be related to a corrupted ZooKeeper?

Is the pod still active? Clearly Kafka thinks some consumer is running. - OneCricketeer
The pod is down, which is really strange to me. Other apps (other groups) are fine. I am bringing this application up for the first time on this new Kafka cluster. Even when I run with --members, it shows a list of client IDs, but again the pod is down. - SunilS
I am having the same issue in AWS MSK, where unfortunately I can't restart broker nodes. Perhaps this needs to be raised as a bug with the Kafka team. - PMah
Update: it has been raised as a bug! issues.apache.org/jira/browse/KAFKA-9935 - PMah

1 Answer


This was raised as a bug here: issues.apache.org/jira/browse/KAFKA-9935, with duplicate https://issues.apache.org/jira/browse/KAFKA-9752

This appears to have been fixed since March in versions 2.2.3, 2.3.2, 2.4.2, and 2.5 and above, so make sure you are running an up-to-date version.
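As a rough aid, the version rule above can be written as a small check. This is a sketch, not an official tool: the function name `has_fix` is hypothetical, the thresholds are simply the fix versions listed in this answer, and it assumes a plain `major.minor.patch` version string with no suffix.

```shell
# Returns 0 if the given Kafka version is at or above the fix versions
# listed above (2.2.3, 2.3.2, 2.4.2, 2.5 and later), otherwise returns 1.
has_fix() {
  major=$(echo "$1" | cut -d. -f1)
  minor=$(echo "$1" | cut -d. -f2)
  patch=$(echo "$1" | cut -d. -f3)
  patch=${patch:-0}   # treat "2.5" as "2.5.0"

  if [ "$major" -gt 2 ]; then return 0; fi
  if [ "$major" -lt 2 ]; then return 1; fi

  case "$minor" in
    2)   [ "$patch" -ge 3 ] ;;   # 2.2.x needs 2.2.3 or later
    3|4) [ "$patch" -ge 2 ] ;;   # 2.3.x and 2.4.x need patch 2 or later
    *)   [ "$minor" -ge 5 ] ;;   # 2.5+ has the fix; 2.0 and 2.1 have no listed fix release
  esac
}

# Example with the version from the question.
if has_fix "2.4.1"; then
  echo "fix included"
else
  echo "upgrade needed"
fi
```

For the 2.4.1 install from the question this takes the "upgrade needed" branch, which matches the behaviour reported there.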