Can a rolling deployment of a Kafka consumer group cause the group to freeze?

Consider this scenario:

  1. we start a rolling deployment
  2. one consumer leaves the group
  3. Kafka notices this and triggers a rebalance (hence consumption stops)
  4. rebalance happens but soon a new consumer wants to join
  5. also another consumer leaves
  6. again a new rebalance happens
  7. (loop till deployment is complete)

If you have a large enough cluster and the deployment takes some time to complete on each machine (which is usually the case), will this lead to a complete freeze in consumption?

If yes, what are the strategies for updating a consumer group in production?

1 Answer

Yes, that's definitely possible. There have been a number of recent improvements to mitigate the downtime during events like this. I'd recommend enabling one or both of the following features:

Static membership was added in 2.3 and can prevent a rebalance from occurring when a known member of the group is bounced. This requires both the client and the broker to be on version 2.3+.
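As a rough sketch of what enabling static membership looks like on the consumer side: the setting involved is `group.instance.id`, which needs to be unique per member and stable across restarts. The class name, group name, instance id, and the 60-second session timeout below are illustrative assumptions, not values from a real deployment; the timeout only needs to be longer than one instance takes to restart.

```java
import java.util.Properties;

public class StaticMembershipConfig {

    // Builds consumer properties with static membership enabled.
    // instanceId is a hypothetical per-instance identifier (for example a host or pod name)
    // that must stay the same when that instance is redeployed.
    static Properties build(String groupId, String instanceId) {
        Properties props = new Properties();
        props.setProperty("group.id", groupId);
        // Setting group.instance.id is what makes this consumer a static member.
        props.setProperty("group.instance.id", instanceId);
        // Assumed value: the member has this long to come back before the group rebalances.
        props.setProperty("session.timeout.ms", "60000");
        return props;
    }

    public static void main(String[] args) {
        Properties props = build("orders", "orders-consumer-0");
        props.forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

These properties would be merged with the usual bootstrap servers and deserializer settings before constructing the consumer.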

Incremental cooperative rebalancing gives the group faster rebalances and allows individual members to continue consuming throughout the rebalance. You'll still see rebalances during a rolling deployment, but they won't result in a complete freeze in consumption for the duration. This is entirely client-side, so it will work with any brokers, but your clients should be on version 2.5.1+.
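For the Java consumer, one way to opt in to cooperative rebalancing is to set `partition.assignment.strategy` to the `CooperativeStickyAssignor` that ships with the client. The sketch below only builds the properties; the class name and group name are illustrative assumptions.

```java
import java.util.Properties;

public class CooperativeRebalanceConfig {

    // Fully qualified name of the cooperative assignor bundled with the Java client.
    static final String COOPERATIVE_ASSIGNOR =
            "org.apache.kafka.clients.consumer.CooperativeStickyAssignor";

    // Builds consumer properties that select the cooperative sticky assignor.
    static Properties build(String groupId) {
        Properties props = new Properties();
        props.setProperty("group.id", groupId);
        props.setProperty("partition.assignment.strategy", COOPERATIVE_ASSIGNOR);
        return props;
    }

    public static void main(String[] args) {
        Properties props = build("orders");
        props.forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

If the group is currently running an eager assignor, check the Kafka upgrade notes before switching: changing assignors on a live group is described there as a two-step rolling bounce rather than a single config change.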