0 votes

I am new to Kafka and think I am missing something about how partitions get balanced across consumers on a topic.

We have 5 partitions and 2 consumers on a topic. The topic's records have a null key, so I assume Kafka assigns each new record to a partition in a round-robin fashion.
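Roughly, the producer side looks like this sketch (the topic name and broker address are placeholders):

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class NullKeyProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // placeholder broker
            props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // Key is null, so the producer's partitioner picks the partition itself
                producer.send(new ProducerRecord<>("my-topic", null, "some value"));
            }
        }
    }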

This would mean one consumer would be reading from 3 partitions and the other from 2. If my assumption is right (that the records get evenly distributed across partitions), the consumer with 3 partitions would be doing more work (1.5x more). This could lead to one consumer doing nothing while the other keeps working hard.

I think you should have a number of partitions that is evenly divisible by the number of consumers.

Am I missing something?


4 Answers

2 votes

The unit of parallelism in consuming Kafka messages is the partition. The routine scenario for consuming Kafka messages is a distributed stream processing engine such as Apache Flink, Spark, or Storm, all of which distribute the processing across CPU cores. The rule is that the maximum level of parallelism for a consumer group is the number of partitions. Each consumer instance of a consumer group (say, a CPU core) can consume one or more partitions, while each partition can be consumed by only one consumer instance per consumer group.

  • If you have more CPU cores than partitions, some of them will be idle.
  • If you have fewer CPU cores than partitions, some of them will consume more than one partition.
  • The optimal case is when the number of CPU cores equals the number of partitions.

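A minimal sketch of one consumer instance, assuming the plain Java client with placeholder topic, broker, and group names; start several copies with the same group.id and Kafka will split the partitions among them:

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class GroupConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // placeholder broker
            props.put("group.id", "my-group");                // same group.id => partitions are shared
            props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("my-topic"));
                while (true) {
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                    for (ConsumerRecord<String, String> r : records) {
                        // Each instance only ever sees records from the partitions assigned to it
                        System.out.printf("partition=%d offset=%d value=%s%n",
                                r.partition(), r.offset(), r.value());
                    }
                }
            }
        }
    }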

0 votes

If my assumption is right (that the records get evenly distributed across partitions), the consumer with 3 partitions would be doing more work (1.5x more). This could lead to one consumer doing nothing while the other keeps working hard.

Why would one consumer do nothing? It would still process records from its 2 partitions [assuming, of course, that both consumers are in the same group].

I think you should have a number of partitions that is evenly divisible by the number of consumers.

Yes, that's right. For maximum parallelism, you can have as many consumers as there are partitions; in your case, 5 consumers would give you maximum parallelism.
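If you are unsure how many partitions the topic has, the AdminClient in kafka-clients can tell you; a minimal sketch with placeholder names:

    import java.util.Collections;
    import java.util.Properties;
    import java.util.concurrent.ExecutionException;
    import org.apache.kafka.clients.admin.AdminClient;

    public class PartitionCount {
        public static void main(String[] args) throws ExecutionException, InterruptedException {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // placeholder broker
            try (AdminClient admin = AdminClient.create(props)) {
                // Look up the topic's partition count; running more consumers than this is wasted
                int partitions = admin.describeTopics(Collections.singletonList("my-topic"))
                        .values().get("my-topic").get().partitions().size();
                System.out.println("Run up to " + partitions + " consumers in the group for max parallelism");
            }
        }
    }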

0 votes

Your understanding is correct. Maybe there is data skew. You can check how many records are in each partition with an offset checker or a similar tool; a sketch of one way to do it follows.
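A minimal sketch of that check using the Java client, approximating each partition's record count as end offset minus beginning offset (topic and broker names are placeholders; counts are approximate if retention or compaction has removed records):

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Map;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.PartitionInfo;
    import org.apache.kafka.common.TopicPartition;

    public class PartitionSkewCheck {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // placeholder broker
            props.put("key.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");
            props.put("value.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");

            try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
                List<TopicPartition> tps = new ArrayList<>();
                for (PartitionInfo p : consumer.partitionsFor("my-topic")) {
                    tps.add(new TopicPartition(p.topic(), p.partition()));
                }
                Map<TopicPartition, Long> begin = consumer.beginningOffsets(tps);
                Map<TopicPartition, Long> end = consumer.endOffsets(tps);
                for (TopicPartition tp : tps) {
                    // Rough record count per partition; large differences indicate skew
                    System.out.printf("%s: ~%d records%n", tp, end.get(tp) - begin.get(tp));
                }
            }
        }
    }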

0 votes

There is an assumption built into your understanding that each partition has exactly the same throughput. For most applications, that is rarely exactly true. If you set up your keying/partitioning right, the partitions should be close to equal, especially with a large and diverse keyspace averaged over a long period of time. In practice, though, you will probably have some skew at any given moment, and your stream processing setup needs to tolerate that. So having one more partition assigned to a particular consumer is probably not going to make a big difference.
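For reference, with non-null keys the Java client's default partitioner hashes the serialized key with murmur2 and takes the result modulo the partition count, so partition balance depends directly on your keyspace. A small sketch of that mapping (the keys are hypothetical):

    import java.nio.charset.StandardCharsets;
    import org.apache.kafka.common.utils.Utils;

    public class KeyToPartition {
        public static void main(String[] args) {
            int numPartitions = 5;
            String[] keys = {"user-1", "user-2", "user-3"}; // hypothetical keys
            for (String key : keys) {
                byte[] bytes = key.getBytes(StandardCharsets.UTF_8);
                // Same hash-mod-N scheme the default partitioner uses for non-null keys
                int partition = Utils.toPositive(Utils.murmur2(bytes)) % numPartitions;
                System.out.println(key + " -> partition " + partition);
            }
        }
    }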