
I'm using Kafka consumer group management to process my messages.

The processing time varies from one message to another, so I have set max.poll.interval.ms to 20 minutes and max.poll.records to 20. I'm using 5 partitions and 5 consumer instances, with default config values apart from those two.
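
For reference, the two overrides described above would look roughly like the following. This is a minimal sketch that only builds the properties; the class name `ConsumerSettingsSketch` and the broker address are placeholders assumed for illustration, and the group id is taken from the log line further down.

```java
import java.util.Properties;

public class ConsumerSettingsSketch {

    // Builds only the settings mentioned in the question; everything else
    // is left at the client defaults.
    public static Properties describedSettings() {
        Properties props = new Properties();
        // Assumption: placeholder broker address, not part of the question.
        props.setProperty("bootstrap.servers", "localhost:9092");
        // Group id as it appears in the logged error message.
        props.setProperty("group.id", "amc_dashboard_analytics");
        // 20 minutes expressed in milliseconds.
        props.setProperty("max.poll.interval.ms", Long.toString(20L * 60 * 1000));
        // At most 20 records returned per poll() call.
        props.setProperty("max.poll.records", "20");
        return props;
    }

    public static void main(String[] args) {
        describedSettings().forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

Note that neither of these two settings changes session.timeout.ms or heartbeat.interval.ms, which stay at their defaults in this setup.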

But still I'm getting the following error intermittently:

[Consumer clientId=consumer-3, groupId=amc_dashboard_analytics] Attempt to heartbeat failed since group is rebalancing

My understanding, based on the consumer config docs, is that a rebalance is triggered only if poll() is not called before the max poll interval elapses. But in my case rebalancing occurs well before the 20 minutes are up.

Also, after a few hours of running, all the assigned consumers leave the group with "Attempt to heartbeat failed since group is rebalancing" and do not rejoin (ideally they should).

Am I missing something here? Any leads would be helpful.

Heartbeats are the basic mechanism to check if all consumers are still up and running. If you get a heartbeat failure because the group is rebalancing, it indicates that your consumer instance took too long to send the next heartbeat and was considered dead, and thus a rebalance got triggered. – DeV
@DeV But why is it taking so long to send the heartbeat? From what I read, the consumer keeps sending heartbeats in a background thread, separate from the processing thread that is bounded by the max poll interval. – Nobita
A rebalance will also occur when consumers join or leave the group. – Gary Russell
@Nobita I added some possible reasons why a heartbeat cannot be sent to the broker. You can check my answer. – H.Ç.T

1 Answer


Another cause of rebalancing is session.timeout.ms expiring without the broker receiving a heartbeat. You can consider increasing this consumer config.

From Kafka docs:

heartbeat.interval.ms: The expected time between heartbeats to the consumer coordinator when using Kafka's group management facilities. Heartbeats are used to ensure that the consumer's session stays active and to facilitate rebalancing when new consumers join or leave the group. The value must be set lower than session.timeout.ms, but typically should be set no higher than 1/3 of that value. It can be adjusted even lower to control the expected time for normal rebalances. (default: 3000)


session.timeout.ms: The timeout used to detect client failures when using Kafka's group management facility. The client sends periodic heartbeats to indicate its liveness to the broker. If no heartbeats are received by the broker before the expiration of this session timeout, then the broker will remove this client from the group and initiate a rebalance. Note that the value must be in the allowable range as configured in the broker configuration by group.min.session.timeout.ms and group.max.session.timeout.ms. (default: 10000)
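
To illustrate how the two settings relate, here is a minimal sketch that raises session.timeout.ms, derives heartbeat.interval.ms as one third of it, and rejects values outside the broker's allowed range. The class and method names are hypothetical, and the broker limits are passed in as parameters because the actual group.min.session.timeout.ms and group.max.session.timeout.ms depend on your broker configuration.

```java
import java.util.Properties;

public class SessionTimeoutSketch {

    // Returns consumer properties with the given session timeout and a
    // heartbeat interval of one third of that value, per the docs' guidance.
    // brokerMinMs / brokerMaxMs stand in for the broker's
    // group.min.session.timeout.ms and group.max.session.timeout.ms.
    public static Properties withSessionTimeout(int sessionTimeoutMs,
                                                int brokerMinMs,
                                                int brokerMaxMs) {
        if (sessionTimeoutMs < brokerMinMs || sessionTimeoutMs > brokerMaxMs) {
            throw new IllegalArgumentException(
                "session.timeout.ms " + sessionTimeoutMs
                + " is outside the broker range [" + brokerMinMs + ", " + brokerMaxMs + "]");
        }
        Properties props = new Properties();
        props.setProperty("session.timeout.ms", Integer.toString(sessionTimeoutMs));
        props.setProperty("heartbeat.interval.ms", Integer.toString(sessionTimeoutMs / 3));
        return props;
    }

    public static void main(String[] args) {
        // Assumption: example values only; check the limits on your own brokers.
        Properties props = withSessionTimeout(30000, 6000, 300000);
        props.forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

A longer session timeout tolerates longer pauses between heartbeats, at the cost of the broker taking longer to notice a consumer that has really died.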

You can check this link for more information.

Even though heartbeats are sent at fixed intervals from a separate thread, in some cases a heartbeat cannot reach the broker within session.timeout.ms. Some possible reasons for this are:

  • Network problems
  • Stop-the-world garbage collection pauses on the consumer or broker side