3 votes

Given the following setup:

  • Kafka v0.11.0.0
  • 3 brokers
  • 2 topics, each with 2 partitions, replication factor of 3
  • 2 consumer groups, one for each topic
  • 3 servers that contain consumers

Each server contains two consumers, one for each topic, such that:

  • Server A
    • consumer-A1 in group topic-1-group consuming topic-1
    • consumer-A2 in group topic-2-group consuming topic-2
  • Server B
    • consumer-B1 in group topic-1-group consuming topic-1
    • consumer-B2 in group topic-2-group consuming topic-2
  • Server C
    • consumer-C1 in group topic-1-group consuming topic-1
    • consumer-C2 in group topic-2-group consuming topic-2

In this scenario, when we examine the output of kafka-consumer-groups.bat for group topic-1-group, we see the following:

  • consumer-B1 is assigned to topic-1 partition-1
  • consumer-C1 is assigned to topic-1 partition-0
  • consumer-A1 is assigned to no partition

This appears to be what we would expect: since the partition count is 2, only two consumers are active and the third is simply idle. We are able to consume messages from the topic just fine.
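For reference, the idle consumer can also be confirmed from inside the application. Below is only a sketch; the consumer instance is assumed to be already configured with our group.id and bootstrap servers. After poll() has joined the group, assignment() shows which partitions, if any, the coordinator handed to this instance, and the idle consumer just sees an empty set.

    import java.util.Collections;

    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.TopicPartition;

    public class AssignmentCheck {
        // Prints what the coordinator handed to this instance; an idle consumer
        // in the group simply ends up with an empty assignment.
        static void printAssignment(KafkaConsumer<String, String> consumer, String topic) {
            consumer.subscribe(Collections.singletonList(topic));
            // assignment() is only populated once the consumer has joined the
            // group, which happens inside poll().
            consumer.poll(1000);
            for (TopicPartition tp : consumer.assignment()) {
                System.out.println("Assigned: " + tp);
            }
        }
    }

With 2 partitions and 3 consumers, two instances print one partition each and the third prints nothing.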

Next, we shut down Server B (whose consumer is actively assigned to a partition). We would expect topic-1-group to rebalance and consumer-A1 to take the place of consumer-B1, so that the following would be true:

  • consumer-A1 is assigned to topic-1 partition-1
  • consumer-C1 is assigned to topic-1 partition-0
  • consumer-B1 is assigned to nothing since it is no longer active

What we are seeing instead is that the consumer group topic-1-group enters a rebalancing state that never seems to finish. Heartbeats also appear to fail while the group is rebalancing.
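To see where the group gets stuck, each consumer can subscribe with a ConsumerRebalanceListener and log the revoke/assign cycle. This is only a sketch of the idea; the consumer is assumed to be configured as described above and the log lines are illustrative:

    import java.util.Collection;
    import java.util.Collections;

    import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.TopicPartition;

    public class RebalanceWatcher {
        // Subscribes the given consumer and logs every rebalance cycle.
        static void watch(KafkaConsumer<String, String> consumer, String topic) {
            consumer.subscribe(Collections.singletonList(topic), new ConsumerRebalanceListener() {
                @Override
                public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
                    // Called at the start of every rebalance; in a healthy group this
                    // is followed promptly by onPartitionsAssigned.
                    System.out.println("Revoked: " + partitions);
                }

                @Override
                public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
                    System.out.println("Assigned: " + partitions);
                }
            });

            // If the group never settles, poll() keeps returning while the
            // assignment never stabilises.
            while (true) {
                consumer.poll(1000);
            }
        }
    }

In the failing case we would expect repeated "Revoked" messages without a matching stable "Assigned" set.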

The only way to recover from this is to shut down another server so that there is only one consumer left for topic-1-group. With a single consumer, we are able to successfully receive messages for the topic. If we then start the other two servers back up, we continue to receive messages successfully.

Questions

  • Is this a valid usage scenario?
  • What is expected in this sort of scenario?
  • Could there be an issue with the consumers? (In terms of configuration, we are using the defaults for everything, with the exception of setting the basics like topic, consumer group, etc. We are using KafkaConsumer.subscribe(Collection) and not manually assigning partitions; a minimal sketch of our setup follows this list.)
  • Could there be an issue with the brokers/Zookeeper?
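For reference, our consumer setup boils down to something like the sketch below. The broker addresses are placeholders, and the commented timeout values are just the documented 0.11 defaults written out for visibility, since those are the settings that govern heartbeating and group membership:

    import java.util.Arrays;
    import java.util.Properties;

    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class Topic1Consumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            // The basics we set explicitly (addresses and names are placeholders):
            props.put("bootstrap.servers", "broker1:9092,broker2:9092,broker3:9092");
            props.put("group.id", "topic-1-group");
            props.put("key.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");

            // Everything else is left at its default. The ones relevant to group
            // membership (documented 0.11 defaults):
            //   session.timeout.ms    = 10000   (consumer declared dead without heartbeats)
            //   heartbeat.interval.ms = 3000    (how often heartbeats are sent)
            //   max.poll.interval.ms  = 300000  (max allowed time between poll() calls)

            KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
            consumer.subscribe(Arrays.asList("topic-1")); // no manual assign()

            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(1000);
                for (ConsumerRecord<String, String> record : records) {
                    System.out.println(record.partition() + ":" + record.offset()
                            + " " + record.value());
                }
            }
        }
    }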
Hi, interesting. I would initially suspect that the 2 remaining nodes can't reach a "consensus" because there are only 2 of them; this could be caused by the rebalancing algorithm of the Kafka clients. I think since v0.9.x they started using their own implementation and are no longer relying on ZooKeeper's ZAB. – groo

2 Answers

0 votes

(I'll post as an answer since I'm not cool enough to comment. And this may be 'the answer', albeit an unsatisfying one: more consumers than partitions is not a supported configuration).

According to the Kafka documentation (https://kafka.apache.org/documentation.html#introduction):

  "By having a notion of parallelism—the partition—within the topics, Kafka is able to provide both ordering guarantees and load balancing over a pool of consumer processes. This is achieved by assigning the partitions in the topic to the consumers in the consumer group so that each partition is consumed by exactly one consumer in the group. By doing this we ensure that the consumer is the only reader of that partition and consumes the data in order. Since there are many partitions this still balances the load over many consumer instances. Note however that there cannot be more consumer instances in a consumer group than partitions."
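One practical takeaway from that passage is to create the topic with at least as many partitions as the largest consumer group you ever plan to point at it, so extra consumers have something to pick up. A hedged sketch using the AdminClient that ships with 0.11; the broker address and the partition count of 3 are assumptions (the replication factor matches the question):

    import java.util.Collections;
    import java.util.Properties;

    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.NewTopic;

    public class CreateTopicWithHeadroom {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // assumption

            // If 3 consumers will ever join the group, give the topic at least
            // 3 partitions instead of 2; replication factor 3 matches the setup
            // described in the question.
            NewTopic topic = new NewTopic("topic-1", 3, (short) 3);

            try (AdminClient admin = AdminClient.create(props)) {
                admin.createTopics(Collections.singletonList(topic)).all().get();
            }
        }
    }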

In practice, the extra consumer stays idle until an active consumer goes away, but the group sometimes seems to get into a state where it is perpetually rebalancing.

This Stack Overflow thread (In Apache Kafka why can't there be more consumer instances than partitions?) discusses the issue and explains why you'd want fewer consumers than partitions, but it doesn't say what happens when you have more. One of the interesting comments gives a reason why you might want to configure more consumers (for failover), but there were no replies: "now we additionaly want to make sure that even if some of consumer instances fails we still have one partition per consumer instance. Logical way of doing this would be to add more consumers to the group; while everything is OK they wouldn't do anything, but when some consumer fails one of them would receive that partition. Why is this not allowed?"

0 votes

As per the Apache Kafka / Confluent documentation: if you add more consumers to a group than there are partitions, some of the consumers remain idle, so ideally you should not do that.
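If you want to guard against that shape programmatically, the consumer itself can report the topic's partition count before you decide how many instances to run. A small sketch; the method name and the notion of a "planned" consumer count are mine, not part of any Kafka API:

    import java.util.List;

    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.PartitionInfo;

    public class GroupSizeCheck {
        // Returns true when the planned number of consumers would leave some idle.
        static boolean tooManyConsumers(KafkaConsumer<String, String> consumer,
                                        String topic, int plannedConsumers) {
            List<PartitionInfo> partitions = consumer.partitionsFor(topic);
            return plannedConsumers > partitions.size();
        }
    }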