0
votes

I'm trying to understand how Kafka works. I've read that by default Kafka will distribute the messages from a producer in a round-robin fashion among the partitions.

But why are messages always put in the same partition when they have the same key? (No partitioning strategy is configured.)

For instance, using the code below the messages are always put in the same partition:

KafkaProducer<String, String> producer = new KafkaProducer<>(properties);
String key = properties.getProperty("dev.id");
producer.send(new ProducerRecord<String, String>(properties.getProperty("kafka.topic"), key, value), new EventGeneratorCallback(key));

With a different key for each message, the messages are distributed across the partitions in a round-robin fashion:

KafkaProducer<String, String> producer = new KafkaProducer<>(properties);
String key = properties.getProperty("dev.id") + UUID.randomUUID().toString();
producer.send(new ProducerRecord<String, String>(properties.getProperty("kafka.topic"), key, value), new EventGeneratorCallback(key));

2 Answers

3
votes

This is exactly how a Kafka producer works. The behaviour is defined by the DefaultPartitioner class, which you can find in the official repo. If no key is specified, the producer sends messages round-robin across all partitions of the topic (clients from version 2.4 onward instead fill a batch for one partition before moving to the next); if a key is specified, the partitioner computes a hash of the key modulo the number of partitions, so messages with the same key always go to the same partition. Because Kafka guarantees ordering at the partition level (not the topic level), this is also a way to keep all messages with the same key in one partition, so that the consumer receives them in the order they were sent. Finally, the producer can also specify the destination partition explicitly: in that case neither the key nor round-robin is used, and the message is sent to exactly the partition the producer specified.
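To make the three cases concrete, here is a minimal sketch in plain Java with no Kafka dependency. It is not the real DefaultPartitioner: the class name PartitionChoiceSketch and the method choosePartition are hypothetical, and Arrays.hashCode stands in for the murmur2 hash that Kafka applies to the serialized key. Only the decision logic (explicit partition, then key hash modulo partition count, then round-robin) is meant to mirror what is described above.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration of how a partition could be chosen for a record.
class PartitionChoiceSketch {
    private final int numPartitions;
    // Counter used only for keyless records (round-robin case).
    private final AtomicInteger counter = new AtomicInteger(0);

    public PartitionChoiceSketch(int numPartitions) {
        this.numPartitions = numPartitions;
    }

    public int choosePartition(Integer explicitPartition, String key) {
        // Case 1: the producer named a partition, so it is used as-is.
        if (explicitPartition != null) {
            return explicitPartition;
        }
        // Case 2: a key is present, so hash it and take it modulo the partition count.
        // Assumption: Arrays.hashCode is a stand-in; Kafka uses murmur2 here.
        if (key != null) {
            byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
            int hash = Arrays.hashCode(keyBytes);
            // Clear the sign bit so the result of the modulo is never negative.
            return (hash & 0x7fffffff) % numPartitions;
        }
        // Case 3: no key and no partition, so cycle through the partitions.
        return (counter.getAndIncrement() & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        PartitionChoiceSketch sketch = new PartitionChoiceSketch(3); // 3 partitions, chosen arbitrarily
        // The same key maps to the same partition on every call.
        System.out.println("dev-1 -> " + sketch.choosePartition(null, "dev-1"));
        System.out.println("dev-1 -> " + sketch.choosePartition(null, "dev-1"));
        // Keyless records rotate through the partitions.
        for (int i = 0; i < 4; i++) {
            System.out.println("no key -> " + sketch.choosePartition(null, null));
        }
        // An explicit partition overrides both the key and the rotation.
        System.out.println("explicit -> " + sketch.choosePartition(2, "dev-1"));
    }
}
```

In the question's second snippet every message gets a fresh random key, so the hash lands on a different partition almost every time; that spreads the messages out, but it is hashing rather than true round-robin.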

1
votes

This is the expected behavior. All messages with the same key are put in the same partition. If you want round-robin assignment for all messages, you should not provide a key. In Kafka, the purpose of a key is to determine how data is distributed across partitions, and putting identical keys in different partitions would break this contract.