3
votes

While going through Kafka: The Definitive Guide, I came across this passage:

When the key is null and the default partitioner is used, the record will be sent to one of the available partitions of the topic at random. A round-robin algorithm will be used to balance the messages among the partitions.

Does this apply only when the default partitioner is used?


2 Answers

2
votes
  • If a valid partition number is specified, that partition will be used when sending the record.

  • If no partition is specified but a key is present, a partition will be chosen using a hash of the key (DefaultPartitioner - see below for more details).

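  • If neither key nor partition is present, a partition will be assigned in a round-robin fashion. (Since Kafka 2.4 the default partitioner instead sticks to one partition until the current batch is full and then switches to another, which is what the stickyPartitionCache call in the code below does.)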
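The order in which these three rules apply can be sketched in plain Java. This is only an illustration under stated assumptions: PartitionChoiceSketch and choose are hypothetical names and not part of the Kafka client, Arrays.hashCode stands in for Kafka's murmur2 hash, rejecting an out-of-range partition with IllegalArgumentException is this sketch's own choice, and the null-key branch models the plain round-robin rule exactly as worded above.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical helper (not a Kafka class) mirroring the three rules above. */
public class PartitionChoiceSketch {
    // Counter that drives the round-robin choice for records without a key.
    private final AtomicInteger counter = new AtomicInteger(0);

    /**
     * @param explicitPartition partition set on the record, or null if none
     * @param keyBytes          serialized key, or null if the record has no key
     * @param numPartitions     number of partitions of the topic
     */
    public int choose(Integer explicitPartition, byte[] keyBytes, int numPartitions) {
        // Rule 1: an explicitly specified, valid partition always wins.
        if (explicitPartition != null) {
            if (explicitPartition < 0 || explicitPartition >= numPartitions) {
                throw new IllegalArgumentException("Invalid partition: " + explicitPartition);
            }
            return explicitPartition;
        }
        // Rule 2: hash the key. Arrays.hashCode is a stand-in, NOT Kafka's murmur2.
        if (keyBytes != null) {
            return Math.floorMod(Arrays.hashCode(keyBytes), numPartitions);
        }
        // Rule 3: no key and no partition, so cycle through the partitions.
        return Math.floorMod(counter.getAndIncrement(), numPartitions);
    }

    public static void main(String[] args) {
        PartitionChoiceSketch sketch = new PartitionChoiceSketch();
        System.out.println(sketch.choose(2, null, 3));    // explicit partition
        System.out.println(sketch.choose(null, null, 3)); // round-robin
        System.out.println(sketch.choose(null, null, 3)); // round-robin, next partition
    }
}
```

Because the keyed branch depends only on the key bytes and the partition count, records with the same key keep landing on the same partition as long as the number of partitions does not change.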


Kafka makes use of the DefaultPartitioner (org.apache.kafka.clients.producer.internals.DefaultPartitioner) in order to distribute messages across topic partitions:

/**
 * Compute the partition for the given record.
 *
 * @param topic The topic name
 * @param key The key to partition on (or null if no key)
 * @param keyBytes serialized key to partition on (or null if no key)
 * @param value The value to partition on or null
 * @param valueBytes serialized value to partition on or null
 * @param cluster The current cluster metadata
 */
public int partition(String topic, Object key, byte[] keyBytes, Object value, byte[] valueBytes, Cluster cluster) {
    if (keyBytes == null) {
        return stickyPartitionCache.partition(topic, cluster);
    } 
    List<PartitionInfo> partitions = cluster.partitionsForTopic(topic);
    int numPartitions = partitions.size();
    // hash the keyBytes to choose a partition
    return Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions;
}

Essentially, the DefaultPartitioner makes use of MurmurHash (the murmur2 variant), a non-cryptographic hash function which is commonly used for hash-based lookups. The hash is first made non-negative (Utils.toPositive) and then used in a modulo operation (% numPartitions), which ensures that the returned partition is within the range [0, N - 1], where N is the number of partitions of the topic.
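To see why the hash is made non-negative before the modulo, here is a small sketch. PositiveModuloSketch and partitionFor are hypothetical names used only for illustration, and the hash argument stands in for the result of Utils.murmur2(keyBytes); the bit mask in toPositive is the one Kafka's Utils.toPositive applies.

```java
public class PositiveModuloSketch {
    /** Clears the sign bit, the same mask Kafka's Utils.toPositive applies. */
    public static int toPositive(int number) {
        return number & 0x7fffffff;
    }

    /** The hash parameter is a stand-in for Utils.murmur2(keyBytes). */
    public static int partitionFor(int hash, int numPartitions) {
        return toPositive(hash) % numPartitions;
    }

    public static void main(String[] args) {
        int numPartitions = 3;
        int negativeHash = -7;
        // In Java, % keeps the sign of the dividend, so a raw negative hash
        // would produce an invalid (negative) partition number.
        System.out.println(negativeHash % numPartitions);              // prints -1
        // After masking, the result always falls in 0 .. numPartitions - 1.
        System.out.println(partitionFor(negativeHash, numPartitions)); // prints 1
    }
}
```

Note that the mask is not the same as taking the absolute value: it simply drops the sign bit, which also avoids the overflow that Math.abs has for Integer.MIN_VALUE.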

0
votes

Well, no: you can implement a custom partitioner to handle null keys. But without message keys a custom partitioner has nothing to base its decision on, so it should behave like the default partitioner (even a simple random algorithm would do). Otherwise, how could it determine the correct partition for the message?

As a rule of thumb, if no key is provided, stick to the default partitioner.

A good doc about Kafka custom partitioner.