0
votes

Imagine a scenario where we have 3 partitions belonging to 3 different topics on a machine which runs a kafka process/broker. This broker will receive messages for all three partitions. It will store them on different log subdirectories. My question is how does the kafka broker schedule these writes? How does it decide which partition/topic will be written next?

1

1 Answers

0
votes

For ordering over requests, the image below shows roughly, how the broker internally handles produce requests: enter image description here

There is a number of network threads that pull bytes of the network layer and convert these to internal requests. These requests are then stuck in a fifo request queue, from where the io threads pull them and append the contained messages to the relevant partitions. So in short messages are processed in the order they are received in.

Looking through the code I am unsure, whether there may be potential for a race condition here, where a smaller request could "overtake" a large request that was sent immediately before it. However even if this were possible it is an extremely unlikely fringe case that I can't see ever occurring for a single producer. Maybe someone with a better understanding of the code can weigh in here?

As for ordering of batched messages in one request, the request stores messages internally in a HashMap, which uses TopicPartition as a key, since as far as I am aware a Scala HashMap does not preserve ordering of the inserted elements, I don't think that there are any guarantees around the order in which multiple partitions in one request get processed - which is fine, as ordering is only guaranteed to be preserved within the partition.

Within each partition, messages are processed in the order they were given to the producer before sending.