What's the best way to design a message key in Kafka?

Question

I have a partitioned topic, which has X partitions.

As of now, when producing messages, I create Kafka's ProducerRecord specifying only the topic and the value. I do not define a key. As far as I understand, my messages are going to be distributed evenly among the partitions by the default built-in partitioner. On the other side, I have a thread pool of Kafka consumers. Each Kafka consumer runs in its own dedicated thread, consuming messages from the topic. All of those consumers are given the same group.id. This allows messages to be consumed in parallel, with every consumer assigned its fair share of partitions to read from.

I want my messages to be consumed in order. I know that Kafka guarantees the order of messages within a partition. So, as long as I come up with a proper key structure, messages that share a key will end up in the same partition. In effect, the message key groups related messages and routes them to one partition.

Does it make sense?

Q: Is there a chance that a badly designed key will give me uneven partitions, where one partition receives far more records than the others? Can that hurt the performance of my Kafka cluster? What are the best practices for message key design?

Daniel Daniel · Accepted Answer · 2017-08-25T22:15:42

Your understanding of the default partitioner is correct.

When you have no requirement to consume messages in the same order as they were produced, not specifying a key is the best option. If you do have such a requirement, the requirement itself tells you what your key must be. For instance, if you want to preserve the order of produced messages for a given user, then user_id is a natural candidate for your message key.
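To make the user_id example concrete, here is a minimal, dependency-free sketch of why a key pins messages to one partition. It is not Kafka's actual code: the default partitioner hashes the serialized key with murmur2, while this sketch substitutes `Arrays.hashCode` purely so it runs without the client library. The class name, the user ids, and the partition count of 6 are all made up for illustration. With the real client you would pass the key as the second argument of `new ProducerRecord<>(topic, key, value)`.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class KeyToPartitionSketch {

    // Maps a key to a partition index in [0, numPartitions).
    // ASSUMPTION: Arrays.hashCode stands in for the murmur2 hash that
    // Kafka's default partitioner applies to the serialized key bytes.
    static int partitionFor(String key, int numPartitions) {
        byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
        int hash = Arrays.hashCode(keyBytes);
        // Clear the sign bit so the modulo result is never negative.
        return (hash & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        int numPartitions = 6; // hypothetical partition count
        // "user-1" appears twice and lands on the same partition both times,
        // which is what preserves per-user ordering.
        for (String userId : new String[] {"user-1", "user-2", "user-1"}) {
            System.out.println(userId + " -> partition " + partitionFor(userId, numPartitions));
        }
    }
}
```

Note that the mapping depends on the number of partitions, so adding partitions to the topic later can move a given key to a different partition.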

To achieve a particular message order, you also need to think about how your producers are configured. If a producer retries sending a message after a failure and max.in.flight.requests.per.connection is greater than 1, messages can be written out of order: an earlier batch that failed may be retried after a later batch has already succeeded.
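As a rough illustration of the producer settings involved, the sketch below builds a plain `java.util.Properties` object. The property names are standard Kafka producer configuration keys; the values, the class name, and the bootstrap address are assumptions chosen for the example, not recommendations.

```java
import java.util.Properties;

final class OrderedProducerConfigSketch {

    // Builds producer settings that keep retries but allow only one
    // in-flight request per connection, so a retried batch cannot be
    // overtaken by a later one.
    static Properties orderedProducerProps(String bootstrapServers) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.setProperty("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.setProperty("retries", "3"); // hypothetical retry count
        props.setProperty("max.in.flight.requests.per.connection", "1");
        return props;
    }

    public static void main(String[] args) {
        // "localhost:9092" is a placeholder broker address.
        Properties props = orderedProducerProps("localhost:9092");
        props.list(System.out);
        // With kafka-clients on the classpath, these properties would be
        // passed to new KafkaProducer<>(props).
    }
}
```

Limiting in-flight requests to 1 trades throughput for ordering. Newer client versions also offer `enable.idempotence` as another way to keep ordering with retries enabled; check the documentation for the client version you run.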

You can get uneven partitions by choosing a bad key. For example, if 90% of your users are from New York and 10% are from other cities, and you choose the city as the key, then one of your partitions will be huge and the consumer reading it will be overloaded (assuming every user produces the same number of messages).
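The New York example can be simulated in a few lines. This is only a sketch under stated assumptions: `String.hashCode` stands in for Kafka's murmur2 hash, and the city names, the 1000 messages, and the 4 partitions are invented for the demonstration. It counts how many messages each partition would receive when keyed by city versus keyed by a per-user id.

```java
import java.util.Arrays;

final class KeySkewSketch {

    // ASSUMPTION: String.hashCode replaces Kafka's murmur2 key hash.
    static int partitionFor(String key, int numPartitions) {
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }

    // Counts how many of the given keys fall into each partition.
    static long[] countPerPartition(String[] keys, int numPartitions) {
        long[] counts = new long[numPartitions];
        for (String key : keys) {
            counts[partitionFor(key, numPartitions)]++;
        }
        return counts;
    }

    // Produces keys where 90% are "New York" and 10% are other cities.
    static String[] cityKeys(int total) {
        String[] others = {"Boston", "Chicago", "Denver", "Seattle", "Austin"};
        String[] keys = new String[total];
        for (int i = 0; i < total; i++) {
            keys[i] = (i % 10 == 0) ? others[(i / 10) % others.length] : "New York";
        }
        return keys;
    }

    // Produces one distinct key per message, e.g. "user-17".
    static String[] userKeys(int total) {
        String[] keys = new String[total];
        for (int i = 0; i < total; i++) {
            keys[i] = "user-" + i;
        }
        return keys;
    }

    public static void main(String[] args) {
        int numPartitions = 4; // hypothetical
        int total = 1000;      // hypothetical
        System.out.println("keyed by city: "
                + Arrays.toString(countPerPartition(cityKeys(total), numPartitions)));
        System.out.println("keyed by user: "
                + Arrays.toString(countPerPartition(userKeys(total), numPartitions)));
    }
}
```

Because every "New York" message hashes to the same partition, that partition receives at least 90% of the traffic no matter how many partitions the topic has. A high-cardinality key such as a user id gives the hash many more distinct values to spread.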
