
We use a Kafka topic with 6 partitions. The incoming messages from producers have 4 keys (key1, key2, key3, key4) and their corresponding values. I see that the messages are distributed across only 3 partitions, and the remaining partitions stay empty.

  1. Is the distribution of the messages based on the hash values of the keys?
  2. Let's say the hash value of key1 is XXXX. Which of the 6 partitions does it go to?
  3. I am using the Kafka Connect HDFS connector to write the data to HDFS, and I know that it uses the hash values of the keys to distribute the messages to partitions. Is that the same way Kafka distributes the messages?

1 Answer

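  • Yes, the partition for a keyed message is determined by the hash of the message key modulo the total partition count of the topic. E.g. if you send a message m with key k to a topic mytopic that has p partitions, then m goes to partition hash(k) % p in mytopic. Note that with the default partitioner of the Java client, hash(k) is a murmur2 hash of the serialized key bytes (made non-negative), not Java's k.hashCode(). I think that answers your second question too. In your case there are only 4 distinct keys, so at most 4 of the 6 partitions can ever receive data, and two of your keys happen to map to the same partition, which leaves only 3 partitions in use.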

  • If my memory serves me correctly, the Kafka Connect HDFS connector simply consumes from the Kafka topic and writes the data to HDFS. You don't need to worry about the Kafka partitions there; that is abstracted away.
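
The hash-modulo rule from the first bullet can be sketched as follows. This is a rough Python port, assuming the Java client's default partitioner (murmur2 over the serialized key bytes, made non-negative, then modulo the partition count) and assuming the keys are serialized as UTF-8 strings. The function names murmur2 and partition_for are made up for this illustration, and a producer configured with a custom partitioner will behave differently.

```python
def murmur2(data: bytes) -> int:
    """32-bit murmur2 hash; a port of the routine assumed to be used by Kafka's Java client."""
    seed = 0x9747B28C
    m = 0x5BD1E995
    r = 24
    mask = 0xFFFFFFFF  # emulate 32-bit integer overflow

    length = len(data)
    h = (seed ^ length) & mask

    # Mix the input four bytes at a time (little-endian).
    body_len = length - length % 4
    for i in range(0, body_len, 4):
        k = int.from_bytes(data[i:i + 4], "little")
        k = (k * m) & mask
        k ^= k >> r
        k = (k * m) & mask
        h = (h * m) & mask
        h ^= k

    # Fold in the remaining 1 to 3 bytes, if any.
    tail = data[body_len:]
    if len(tail) >= 3:
        h ^= tail[2] << 16
    if len(tail) >= 2:
        h ^= tail[1] << 8
    if len(tail) >= 1:
        h ^= tail[0]
        h = (h * m) & mask

    h ^= h >> 13
    h = (h * m) & mask
    h ^= h >> 15
    return h


def partition_for(key: str, num_partitions: int) -> int:
    """Hypothetical helper: partition index for a UTF-8 string key."""
    positive_hash = murmur2(key.encode("utf-8")) & 0x7FFFFFFF
    return positive_hash % num_partitions


# The four keys and six partitions from the question.
keys = ["key1", "key2", "key3", "key4"]
assignments = {key: partition_for(key, 6) for key in keys}

for key, partition in assignments.items():
    print(f"{key} -> partition {partition}")

used = set(assignments.values())
print(f"{len(used)} of 6 partitions receive data")
```

Whatever the exact hash values are, four distinct keys can occupy at most four partitions, so at least two of the six always stay empty. Every message with the same key lands in the same partition as long as the partition count does not change.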