
Source: https://kafka.apache.org/intro

"By having a notion of parallelism—the partition—within the topics, Kafka is able to provide both ordering guarantees and load balancing over a pool of consumer processes. This is achieved by assigning the partitions in the topic to the consumers in the consumer group so that each partition is consumed by exactly one consumer in the group. By doing this we ensure that the consumer is the only reader of that partition and consumes the data in order. "

This only means that each consumer will process its own messages in order; across consumers in the same consumer group, processing may still be out of order. E.g., with 3 partitions, a producer using round robin sends M1 to P1, M2 to P2, M3 to P3, then M4 to P1, M5 to P2, and M6 to P3.

Now we have:

P1: M1 and M4
P2: M2 and M5
P3: M3 and M6

If each consumer is tied to a single partition, then C1 will process M1 and M4 in that order, C2 will process M2 and M5, etc. How can we guarantee that M2 is processed (by C2) BEFORE M4 is processed (by C1)?

Or am I misunderstanding something?


1 Answer


How can we guarantee that M2 is processed (by C2) BEFORE M4 is processed (by C1)?

Generally you can't.

If each consumer is tied to a single partition, then C1 will process M1 and M4 in that order, C2 will process M2 and M5, etc.

Even if you had a single consumer that consumed all the partitions for the topic, the partitions would be consumed in a non-deterministic order and your total ordering across all partitions would not be guaranteed.

Or am I misunderstanding something?

Nope, you are understanding correctly. Ordering is only guaranteed within a single partition.
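To make that concrete, here is a small simulation of the example from the question. It does not use a Kafka client at all: `round_robin` and `interleaved_consume` are hypothetical helpers written for this illustration, and the random interleaving only stands in for the fact that nothing coordinates reads across partitions.

```python
import random


def round_robin(messages, num_partitions):
    """Distribute messages over partitions in round-robin order."""
    partitions = [[] for _ in range(num_partitions)]
    for i, msg in enumerate(messages):
        partitions[i % num_partitions].append(msg)
    return partitions


def interleaved_consume(partitions, rng):
    """Read each partition front to back, picking the next partition at random."""
    cursors = [0] * len(partitions)
    consumed = []
    while True:
        # Partitions that still have unread messages.
        ready = [p for p, c in enumerate(cursors) if c < len(partitions[p])]
        if not ready:
            return consumed
        p = rng.choice(ready)
        consumed.append(partitions[p][cursors[p]])
        cursors[p] += 1


messages = ["M1", "M2", "M3", "M4", "M5", "M6"]
partitions = round_robin(messages, 3)
consumed = interleaved_consume(partitions, random.Random(7))

print(partitions)  # [['M1', 'M4'], ['M2', 'M5'], ['M3', 'M6']]
print(consumed)    # some interleaving; M1 always precedes M4, but M2 may follow M4
```

Whatever interleaving comes out, each partition's messages keep their relative order, while the order across partitions is arbitrary.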

As Vishal John writes:

For example, assume that your messages are partitioned based on user_id and consider 4 messages having user_ids 1, 2, 3 and 4. Assume that you have a "users" topic with 4 partitions.

Since partitioning is based on user_id, assume that the message having user_id 1 will go to partition 1, the message having user_id 2 will go to partition 2, and so on.

Also assume that you have 4 consumers for the topic. Since you have 4 consumers, Kafka will assign each consumer to one partition. So in this case as soon as 4 messages are pushed, they are immediately consumed by the consumers.
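The practical consequence of keyed partitioning is that all messages with the same key land in the same partition, so they stay ordered relative to each other. A minimal sketch of the idea follows; `partition_for` is a hypothetical helper, and `zlib.crc32` is only a stand-in for the hash a real Kafka partitioner would apply, so the partition numbers it yields will not match the quoted example or a real cluster.

```python
import zlib


def partition_for(key, num_partitions):
    """Map a key to a partition: same key, same partition.

    crc32 is an illustrative stand-in, not the hash Kafka itself uses.
    """
    return zlib.crc32(str(key).encode("utf-8")) % num_partitions


# (user_id, action) pairs in the order the producer sends them.
events = [
    (1, "login"), (2, "login"), (1, "view"),
    (3, "login"), (1, "logout"), (2, "logout"),
]

user_partitions = {p: [] for p in range(4)}
for user_id, action in events:
    user_partitions[partition_for(user_id, 4)].append((user_id, action))

# Every event for user 1 sits in one partition, in send order.
p1 = partition_for(1, 4)
print([action for uid, action in user_partitions[p1] if uid == 1])
# ['login', 'view', 'logout']
```

So if the ordering you care about is per user (or per any other key), keying by that value is usually enough, and no cross-partition ordering is needed.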

You can implement consumer logic that buffers and re-orders, but how that logic works depends on your specific use-case.
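As one possible shape for such logic, here is a sketch of a resequencing buffer. It rests on an assumption that is not part of Kafka: the producer stamps every message with a gapless, globally increasing sequence number starting at 1. `Resequencer` is a hypothetical class for illustration, not part of any Kafka client.

```python
import heapq
import itertools


class Resequencer:
    """Buffer out-of-order messages and release them in sequence order.

    Assumes producer-assigned, gapless sequence numbers (not a Kafka feature).
    """

    def __init__(self, first_seq=1):
        self._next_seq = first_seq
        self._heap = []
        self._tie = itertools.count()  # avoids comparing payloads on equal seq

    def push(self, seq, payload):
        """Add one message; return the payloads that are now ready, in order."""
        if seq < self._next_seq:
            return []  # already released, treat as a duplicate
        heapq.heappush(self._heap, (seq, next(self._tie), payload))
        ready = []
        while self._heap and self._heap[0][0] <= self._next_seq:
            s, _, p = heapq.heappop(self._heap)
            if s == self._next_seq:
                ready.append(p)
                self._next_seq += 1
            # otherwise it is a buffered duplicate of a released message; drop it
        return ready


# Arrival order as seen after merging reads from several partitions.
arrivals = [(2, "M2"), (1, "M1"), (4, "M4"), (3, "M3"), (6, "M6"), (5, "M5")]

buf = Resequencer()
released = []
for seq, payload in arrivals:
    released.extend(buf.push(seq, payload))

print(released)  # ['M1', 'M2', 'M3', 'M4', 'M5', 'M6']
```

Note the trade-offs: the buffer grows without bound while it waits for a missing sequence number, and a message that never arrives stalls everything behind it, so a real consumer would need some timeout or gap-handling policy suited to the use-case.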

See also: https://stackoverflow.com/a/39593834/741970.