0
votes

Could anyone please explain, or point me to a link or resource about, how Kafka consumers work in the scenarios below?

  1. One consumer group with 5 consumers and a topic with 3 partitions (how does Kafka decide the assignment?)

  2. One consumer group with 5 consumers and a topic with 10 partitions (how does Kafka share the load?)

  3. Two consumer groups with 1 consumer each and a Kafka cluster of 2 servers, where one topic is partitioned between node 1 and node 2. How can duplication be avoided when consumers from different groups are subscribed to the same partition?

The above may not be best practice when configuring Kafka, but I need to know how it is handled.

Thanks in advance.

2 Answers

4
votes

Partitions are not assigned by Kafka itself, but by one of the consumers. The first consumer to join a consumer group is elected as a sort of "leader" and assigns partitions to the other consumers. Every time a new consumer joins the group, the broker acting as group coordinator (not the cluster controller) triggers a rebalance, and the leader consumer re-assigns the partitions. The same happens when a consumer leaves the group.

Evidence that the consumer is involved: the partition assignment strategy is specified by the partition.assignment.strategy property in the consumer configuration. The default value is RangeAssignor; the alternatives are RoundRobinAssignor and StickyAssignor. You can read more about how they work here:

https://kafka.apache.org/21/javadoc/org/apache/kafka/clients/consumer/RangeAssignor.html
https://kafka.apache.org/21/javadoc/org/apache/kafka/clients/consumer/RoundRobinAssignor.html
https://kafka.apache.org/21/javadoc/org/apache/kafka/clients/consumer/StickyAssignor.html
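
As a minimal sketch of where that property goes, the snippet below builds consumer settings with plain java.util.Properties and selects RoundRobinAssignor instead of the default. The class name ConsumerConfigSketch, the group id, and the broker address are placeholders for illustration, not values from the question. The resulting Properties object would normally be passed to a KafkaConsumer constructor.

```java
import java.util.Properties;

// Hypothetical helper: builds consumer settings using plain string keys,
// so it compiles without the kafka-clients library on the classpath.
class ConsumerConfigSketch {
    static Properties consumerProps(String groupId, String assignorClass) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder address
        props.put("group.id", groupId);
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        // The assignment strategy is a consumer-side setting.
        props.put("partition.assignment.strategy", assignorClass);
        return props;
    }

    public static void main(String[] args) {
        Properties props = consumerProps("my-group",
                "org.apache.kafka.clients.consumer.RoundRobinAssignor");
        System.out.println(props.getProperty("partition.assignment.strategy"));
    }
}
```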

That said, what happens specifically in your scenarios?

  1. 3 consumers will get one partition each. The other 2 will be idle.
  2. Each consumer will get 2 partitions.
  3. Using different consumer groups means pure pub/sub, where every consumer group gets the same messages. In your case, with 1 topic and 2 partitions (on 2 brokers), the two consumers, each in a different consumer group, will both get the same messages from both partitions. If consumers are in different consumer groups you cannot avoid this duplication; it is how Kafka works.
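
The pub/sub behaviour in scenario 3 can be illustrated with a toy in-memory model. This is not the Kafka client API: a partition is modelled as an append-only list, and each consumer group keeps its own read offset. All names here (GroupOffsetsSketch, poll, the group ids) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of one partition read by several consumer groups.
// This is NOT the Kafka API; it only mimics per-group offset tracking.
class GroupOffsetsSketch {
    private final List<String> log = new ArrayList<>();           // the partition's records
    private final Map<String, Integer> offsets = new HashMap<>(); // next offset per group id

    void append(String record) {
        log.add(record);
    }

    // Returns every record the given group has not read yet and advances
    // that group's offset only; other groups are unaffected.
    List<String> poll(String groupId) {
        int from = offsets.getOrDefault(groupId, 0);
        List<String> batch = new ArrayList<>(log.subList(from, log.size()));
        offsets.put(groupId, log.size());
        return batch;
    }

    public static void main(String[] args) {
        GroupOffsetsSketch partition = new GroupOffsetsSketch();
        partition.append("m1");
        partition.append("m2");
        System.out.println("group-a: " + partition.poll("group-a"));
        System.out.println("group-b: " + partition.poll("group-b"));
        System.out.println("group-a again: " + partition.poll("group-a"));
    }
}
```

In this model both groups receive m1 and m2, while a second poll by group-a returns nothing new. Reading by one group never consumes a record on behalf of another group.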
1
votes

It depends on the partition.assignment.strategy property, which is set to the class org.apache.kafka.clients.consumer.RangeAssignor by default. From the Javadoc:

The range assignor works on a per-topic basis. For each topic, we lay out the available partitions in numeric order and the consumers in lexicographic order. We then divide the number of partitions by the total number of consumers to determine the number of partitions to assign to each consumer. If it does not evenly divide, then the first few consumers will have one extra partition. For example, suppose there are two consumers C0 and C1, two topics t0 and t1, and each topic has 3 partitions, resulting in partitions t0p0, t0p1, t0p2, t1p0, t1p1, and t1p2. The assignment will be:

C0: [t0p0, t0p1, t1p0, t1p1]
C1: [t0p2, t1p2]
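
The arithmetic in that description can be sketched for a single topic as follows. This is a simplified re-implementation for illustration, not the real RangeAssignor class; the name RangeAssignmentSketch and the consumer ids are assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Simplified sketch of the range strategy for one topic.
// Not the real RangeAssignor; partitions are plain integers 0..n-1.
class RangeAssignmentSketch {
    static Map<String, List<Integer>> assign(List<String> consumers, int numPartitions) {
        Map<String, List<Integer>> result = new LinkedHashMap<>();
        if (consumers.isEmpty()) {
            return result; // nobody to assign to
        }
        // Consumers are laid out in lexicographic order.
        List<String> sorted = new ArrayList<>(new TreeSet<>(consumers));
        int perConsumer = numPartitions / sorted.size();
        int withExtra = numPartitions % sorted.size(); // the first few get one more

        int next = 0; // next partition number to hand out
        for (int i = 0; i < sorted.size(); i++) {
            int count = perConsumer + (i < withExtra ? 1 : 0);
            List<Integer> partitions = new ArrayList<>();
            for (int j = 0; j < count; j++) {
                partitions.add(next++);
            }
            result.put(sorted.get(i), partitions);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> five = Arrays.asList("C0", "C1", "C2", "C3", "C4");
        System.out.println(assign(five, 3));  // scenario 1 from the question
        System.out.println(assign(five, 10)); // scenario 2 from the question
        System.out.println(assign(Arrays.asList("C0", "C1"), 3)); // one topic of the Javadoc example
    }
}
```

With 5 consumers and 3 partitions, the first three consumers get one partition each and the last two get none. With 10 partitions each consumer gets two. With two consumers and 3 partitions, C0 gets partitions 0 and 1 and C1 gets partition 2, matching each topic in the Javadoc example.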

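You can provide your own strategy by implementing org.apache.kafka.clients.consumer.internals.PartitionAssignor (in Kafka 2.4 and later, the public interface for this is org.apache.kafka.clients.consumer.ConsumerPartitionAssignor). There is a good article on Medium about it.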