I understand that Kafka guarantees ordering within a partition. But how is ordering affected when there are multiple partitions, the producer specifies no key, and there is only 1 consumer? (Why 1 consumer? One is enough for the current data load; the multiple partitions are there for future use.)

20 partitions
1 consumer
No key specified when producing.

1) Would the ordering be affected?

2) Would the consumer read data from partitions 0, 1, ..., 19 one after the other, in order?

3) Even if we specify the partition key, are we assured of ordering? (Except for the case of a hash collision.)

1 Answer

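If you do not define a key on the producer side, Kafka spreads the messages across the partitions in a circular (round-robin) way [code here], so consecutive messages land in different partitions. Note that since Kafka 2.4 the default partitioner fills a batch for one partition before moving on to the next, but keyless messages are still spread over all partitions.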

For example, with 2 partitions:

msg_1 -> partition: 0
msg_2 -> partition: 1
msg_3 -> partition: 0
msg_4 -> partition: 1
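
To make the circular assignment concrete, here is a minimal sketch in plain Python. It does not use a Kafka client; `assign_round_robin` is a made-up helper that only mimics the round-robin behaviour described above.

```python
from itertools import cycle

def assign_round_robin(messages, num_partitions):
    """Pair each message with a partition, cycling 0, 1, ..., n-1, 0, ..."""
    # Hypothetical helper: a simulation of round-robin, not Kafka's partitioner.
    partitions = cycle(range(num_partitions))
    return [(msg, next(partitions)) for msg in messages]

assignments = assign_round_robin(["msg_1", "msg_2", "msg_3", "msg_4"], 2)
for msg, partition in assignments:
    print(f"{msg} -> partition: {partition}")
```

With 2 partitions this reproduces the mapping listed above.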

The problem with this is that you cannot guarantee ordering on the consumer side, because messages from different partitions can be consumed at different times. Imagine the first message is in partition 0 at offset 1 and the second message is in partition 1 at offset 1. The Kafka consumer can start consuming from partition 1 before it gets to partition 0.
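
A small simulation of that effect, again without a real Kafka client. The `partitions` dict and the `consume` function are assumptions made up for illustration; a real consumer fetches batches from its assigned partitions in an order you do not control.

```python
# Each partition is an ordered log; msg_1..msg_4 were produced in that order.
partitions = {
    0: ["msg_1", "msg_3"],
    1: ["msg_2", "msg_4"],
}

def consume(partitions, fetch_order):
    """Read every partition fully, in the given (arbitrary) fetch order."""
    consumed = []
    for partition in fetch_order:
        consumed.extend(partitions[partition])
    return consumed

# Assume the consumer happens to fetch partition 1 before partition 0.
consumed = consume(partitions, fetch_order=[1, 0])
print(consumed)
```

Within each partition the order is intact (msg_2 before msg_4, msg_1 before msg_3), but the overall order no longer matches the order in which the messages were produced.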

To avoid this problem, always use the same key for the messages whose order matters. With the current keyless setup, the only way to solve it is to create a state store and check the state of your document every time you need to read it.

If you set the key, every message with that key is always sent to the same partition, so the order for that key is preserved. The only way to end up with a different order is on the producer side, and that would be a race condition: the order can only break if two producers write the same key at the same time. You can check the logic here.
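
A rough sketch of why a key pins messages to one partition. Here `zlib.crc32` is only a stand-in hash so the example stays self-contained (Kafka's Java client hashes the key bytes with murmur2), and `partition_for` and the sample events are hypothetical.

```python
import zlib

NUM_PARTITIONS = 20

def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition: hash(key) modulo the partition count."""
    # Stand-in hash for illustration; Kafka's default partitioner uses murmur2.
    return zlib.crc32(key.encode("utf-8")) % num_partitions

# Hypothetical events: (key, value), listed in the order they are produced.
events = [
    ("order-42", "created"),
    ("order-7", "created"),
    ("order-42", "paid"),
    ("order-42", "shipped"),
]

logs = {p: [] for p in range(NUM_PARTITIONS)}
for key, value in events:
    logs[partition_for(key, NUM_PARTITIONS)].append((key, value))

# All "order-42" events sit in one partition, in production order.
target = partition_for("order-42", NUM_PARTITIONS)
order_42 = [value for key, value in logs[target] if key == "order-42"]
print(target, order_42)
```

Two different keys can hash to the same partition (the "hash collision" from the question). That does not break the ordering per key, it only means the partition holds messages for more than one key.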