7 votes

I have a topic with 10 partitions, one consumer group with 4 consumers, and a worker size of 3.

I can see an uneven distribution of messages across the partitions: one partition holds a huge amount of data while another is empty.

How can I make my producer distribute the load evenly across all the partitions, so that every partition is utilized properly?

I need to clarify some things. Are you using a custom partitioning strategy or the default one? How do you know there is an uneven distribution of messages? – Indraneel Bende
@IndraneelBende When I describe my topic, it shows the lag, through which I can confirm that some partitions have a lag of more than 100,000 (1 lac) while others have a lag of 0, meaning there is no data in those partitions. I'm not sure about the strategy, but this is what I can see in the code: this.partitionerClass = props.getString("partitioner.class", "kafka.producer.DefaultPartitioner"); – Pacifist
If you are using the default partitioner, then messages are produced in a round-robin fashion across the different partitions. How are you calculating this lag? – Indraneel Bende
Lag = log end offset − current offset. Yes, that's what the Kafka documentation says, but I don't get why one partition is overloaded while another is free. – Pacifist

7 Answers

12 votes

According to the JavaDoc comment in the DefaultPartitioner class itself, the default partitioning strategy is:

  • If a partition is specified in the record, use it.
  • If no partition is specified but a key is present choose a partition based on a hash of the key.
  • If no partition or key is present choose a partition in a round-robin fashion.

https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/producer/internals/DefaultPartitioner.java

So here are two possible reasons that may be causing the uneven distribution, depending on whether or not you are specifying a key when producing the messages:

  • If you are specifying a key and getting an uneven distribution with the DefaultPartitioner, the most apparent explanation is that you are specifying the same key many times.

  • If you are not specifying a key and using the DefaultPartitioner, a non-obvious behavior could be happening. According to the above you would expect a round-robin distribution of messages, but this is not necessarily the case: an optimization introduced in 0.8.0 can cause the same partition to be reused for a while. Check this link for a more detailed explanation: https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-Whyisdatanotevenlydistributedamongpartitionswhenapartitioningkeyisnotspecified?
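
The first case can be sketched without a broker: the partition is a deterministic function of the key, so a repeated key concentrates all of its records on one partition. A minimal sketch, using String.hashCode as an illustrative stand-in for the murmur2 hash the DefaultPartitioner actually applies to the serialized key bytes:

```java
import java.util.HashMap;
import java.util.Map;

public class HotKeySketch {
    // Stand-in for the DefaultPartitioner's key branch; Kafka really
    // uses murmur2 over the serialized key bytes, not String.hashCode.
    public static int partitionFor(String key, int numPartitions) {
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        int numPartitions = 10;
        Map<Integer, Integer> counts = new HashMap<>();
        for (int i = 0; i < 100; i++) {
            // Same key every time, so every record maps to the same partition.
            counts.merge(partitionFor("hot-key", numPartitions), 1, Integer::sum);
        }
        System.out.println(counts); // a single entry holding all 100 records
    }
}
```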

1 vote

It seems like your problem is uneven consumption of messages rather than uneven production of messages to the Kafka topic. In other words, the number of consumer threads you have doesn't match the number of partitions (they don't need to match 1:1, but each consumer thread should end up reading from the same number of partitions).

See this short explanation for more details.

1 vote

Instead of relying on the default partitioner class, you can have the producer specify a partition number directly, so that the message goes straight to that partition:

 ProducerRecord<String, String> record = new ProducerRecord<>(topicName, partitionNumber, key, value);
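
With explicit partitions, the producer side can keep a simple counter and cycle through the partitions itself. A minimal sketch (the partition count of 10 is an assumption taken from the question):

```java
import java.util.concurrent.atomic.AtomicLong;

public class RoundRobinAssigner {
    private final AtomicLong counter = new AtomicLong();
    private final int numPartitions;

    public RoundRobinAssigner(int numPartitions) {
        this.numPartitions = numPartitions;
    }

    // Returns 0, 1, ..., numPartitions-1 and then wraps around;
    // safe to call from multiple producer threads.
    public int nextPartition() {
        return (int) (counter.getAndIncrement() % numPartitions);
    }
}
```

You would then pass nextPartition() as the partition argument of each ProducerRecord.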
0 votes

You can make use of the key parameter of the producer record. The point is that for a specific key, the data always goes to the same partition. I don't know the structure of your producer record, but since you said you have 10 partitions, you can simply keep a counter n and use n % 10 as the producer record key. For record 0 the key will be 0; Kafka will hash it and place it in some partition, say partition 0. For record 1 the key will be 1, and it will land in another partition, and so on. This way you can approximate round-robin over your producer records, and the key stays independent of the fields in your record.

Or you can specify the partition directly in your producer record. So either use the key or the partition field of the producer record.
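
One caveat worth checking with the key approach: hashing ten distinct keys does not guarantee that all ten partitions are hit, since two keys can hash to the same partition. A quick sketch for verifying coverage of a key scheme (String.hashCode stands in here for Kafka's murmur2, so the real distribution may differ):

```java
import java.util.HashSet;
import java.util.Set;

public class KeyCoverageCheck {
    // Illustrative stand-in for Kafka's murmur2-based key hashing.
    public static int partitionFor(String key, int numPartitions) {
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }

    // How many distinct partitions do the keys "0".."numKeys-1" actually hit?
    public static int distinctPartitionsHit(int numKeys, int numPartitions) {
        Set<Integer> used = new HashSet<>();
        for (int n = 0; n < numKeys; n++) {
            used.add(partitionFor(String.valueOf(n), numPartitions));
        }
        return used.size();
    }

    public static void main(String[] args) {
        // If this prints less than 10, some partitions receive no data at all.
        System.out.println(distinctPartitionsHit(10, 10));
    }
}
```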

0 votes

Suppose you have defined the partitioner based on the record; say the Kafka key is a String and the value is a Student POJO.

In the Student POJO, suppose I want records to go to a specific partition based on the student's country field. Imagine there are 10 partitions in the topic and, for example, the value's country is "India", and based on "India" we compute partition number 5.

Whenever the country is "India", Kafka will allocate partition number 5, and that record will always go to partition 5 (as long as the partition count has not changed).

Now suppose that lots of the records coming through your pipeline have the country "India"; all of those records will go to partition number 5, and you will see an uneven distribution across the Kafka partitions.
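
This effect can be simulated without a broker: a deterministic country-to-partition mapping turns value skew directly into partition skew. A sketch under the same assumptions (10 partitions; String.hashCode standing in for whatever hash the custom partitioner uses):

```java
import java.util.HashMap;
import java.util.Map;

public class CountrySkewSketch {
    // Deterministic mapping: the same country always goes to the same
    // partition. With this stand-in hash, "India" happens to map to 5.
    public static int partitionFor(String country, int numPartitions) {
        return (country.hashCode() & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        int numPartitions = 10;
        // Hypothetical skewed stream: most records share one country value.
        String[] countries = {"India", "India", "India", "India", "India",
                              "India", "India", "India", "Brazil", "Japan"};
        Map<Integer, Integer> counts = new HashMap<>();
        for (String c : countries) {
            counts.merge(partitionFor(c, numPartitions), 1, Integer::sum);
        }
        // One partition receives 8 of the 10 records, mirroring the skew.
        System.out.println(counts);
    }
}
```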

0 votes

In my case, I used the default partitioner but still had far more records in one partition than in the others. The problem was that I unexpectedly had many records with the same key. Check your keys!

0 votes

Since I was unable to resolve this with Faust, the approach I am using is to implement the round-robin distribution myself.

I iterate over the records to be produced and do, for example:

for index, message in enumerate(messages):
    # Faust's Topic.send is a coroutine with keyword-only arguments
    await topic.send(value=message, partition=index % num_partitions)

That is, I bound the index to within the range of partitions I have.

There could still be unevenness: suppose you run this repeatedly but your number of records is less than num_partitions; then the first partitions will keep getting the major share of messages. You can avoid this by adding a random starting offset:

import random

# Pick a random starting partition so short batches don't always
# favour the low-numbered partitions.
initial_partition = random.randrange(0, num_partitions)
for index, message in enumerate(messages):
    await topic.send(value=message, partition=(initial_partition + index) % num_partitions)