2
votes

In the Kafka producer API documentation I found this description of the send() method:

“The send is asynchronous and this method will return immediately once the record has been stored in the buffer of records waiting to be sent. This allows sending many records in parallel without blocking to wait for the response after each one.”

I’m wondering how the records are sent in parallel. If I have 3 brokers, each holding 3 partitions of the same topic, will the Kafka producer send records to the 9 partitions in parallel, or only to the 3 brokers in parallel? How does the producer work in a parallel way?


2 Answers

1
votes

The Kafka client uses an org.apache.kafka.common.requests.ProduceRequest, which can carry payloads for multiple partitions at once (see http://kafka.apache.org/protocol.html#The_Messages_Produce).

So it sends three requests in parallel (using org.apache.kafka.clients.NetworkClient), one to each of the three brokers, i.e.:

- sends records for topic-partition0, topic-partition1, topic-partition2 to broker 1
- sends records for topic-partition3, topic-partition4, topic-partition5 to broker 2
- sends records for topic-partition6, topic-partition7, topic-partition8 to broker 3
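
The grouping above can be sketched in plain Java. This is only an illustrative model, not Kafka's internal code: the class name `ProduceRequestGrouping`, the method `groupByLeader`, and the partition-to-broker layout are all hypothetical, chosen to mirror the list above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative model only; this is not how the Kafka client is implemented.
class ProduceRequestGrouping {

    // Groups partition ids by the id of the broker that leads them.
    // Each entry of the result stands for one produce request to one broker.
    static Map<Integer, List<Integer>> groupByLeader(Map<Integer, Integer> leaderOfPartition) {
        Map<Integer, List<Integer>> requests = new TreeMap<>();
        // Iterate in partition order so each request lists partitions ascending.
        for (Map.Entry<Integer, Integer> e : new TreeMap<>(leaderOfPartition).entrySet()) {
            requests.computeIfAbsent(e.getValue(), broker -> new ArrayList<>()).add(e.getKey());
        }
        return requests;
    }

    public static void main(String[] args) {
        // Assumed layout: partitions 0-8, three per broker, brokers numbered 1-3.
        Map<Integer, Integer> leaders = new TreeMap<>();
        for (int partition = 0; partition < 9; partition++) {
            leaders.put(partition, partition / 3 + 1);
        }
        // prints {1=[0, 1, 2], 2=[3, 4, 5], 3=[6, 7, 8]}
        System.out.println(groupByLeader(leaders));
    }
}
```

Nine partitions collapse into three requests here, so in this model the parallelism is per broker, with each request carrying data for several partitions.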

You can control how much batching is done through the producer configuration.
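
As a rough sketch of what that configuration might look like, the snippet below builds the producer properties using only `java.util.Properties`. The keys `batch.size` and `linger.ms` are real producer settings; the helper name `batchingProperties`, the address `broker1:9092`, and the chosen values are assumptions for illustration.

```java
import java.util.Properties;

class ProducerBatchingConfig {

    // Builds producer properties with the two settings that most affect batching.
    // The argument values used in main() are illustrative, not recommendations.
    static Properties batchingProperties(String bootstrapServers, int batchSizeBytes, long lingerMs) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        // Upper bound, in bytes, on one per-partition batch.
        props.put("batch.size", Integer.toString(batchSizeBytes));
        // How long the producer may wait for more records before sending a batch.
        props.put("linger.ms", Long.toString(lingerMs));
        return props;
    }

    public static void main(String[] args) {
        // "broker1:9092" is a placeholder address.
        Properties props = batchingProperties("broker1:9092", 32768, 5L);
        System.out.println(props.getProperty("batch.size") + " / " + props.getProperty("linger.ms"));
    }
}
```

With the kafka-clients library on the classpath, these properties would be passed to the `KafkaProducer` constructor.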

(Note that I answered assuming 9 unique partitions. If you meant replicated partitions, the producer sends only to the leader of each partition, and replication then handles the propagation.)

1
votes

Yes, the producer batches the messages destined for each partition leader, and those batches are sent in parallel. From the API docs:

The send() method is asynchronous. When called it adds the record to a buffer of pending record sends and immediately returns. This allows the producer to batch together individual records for efficiency.

and

The producer maintains buffers of unsent records for each partition. These buffers are of a size specified by the batch.size config. Making this larger can result in more batching, but requires more memory (since we will generally have one of these buffers for each active partition).
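
To make the per-partition buffering concrete, here is a toy model in plain Java. It is not Kafka's real accumulator, and it is deliberately simplified: a batch is marked ready as soon as its bytes reach the limit. The class `ToyRecordBuffer` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of per-partition buffering; not the Kafka client's implementation.
class ToyRecordBuffer {
    private final int batchSizeBytes;
    private final Map<Integer, List<byte[]>> open = new HashMap<>();   // one open batch per active partition
    private final Map<Integer, Integer> openBytes = new HashMap<>();   // bytes held in each open batch
    private final List<List<byte[]>> ready = new ArrayList<>();        // full batches waiting to be sent

    ToyRecordBuffer(int batchSizeBytes) {
        this.batchSizeBytes = batchSizeBytes;
    }

    // Adds a record to its partition's buffer and returns at once, without any network I/O.
    void append(int partition, byte[] record) {
        List<byte[]> batch = open.computeIfAbsent(partition, p -> new ArrayList<>());
        batch.add(record);
        int bytes = openBytes.merge(partition, record.length, Integer::sum);
        if (bytes >= batchSizeBytes) {
            // Simplification: the batch is closed once it reaches the size limit.
            ready.add(batch);
            open.remove(partition);
            openBytes.remove(partition);
        }
    }

    int readyBatchCount() { return ready.size(); }

    int openBatchCount() { return open.size(); }
}
```

For example, with a limit of 10 bytes, appending 4 bytes to partition 0, 4 bytes to partition 1, and then 6 more bytes to partition 0 leaves one ready batch (partition 0) and one open batch (partition 1). Memory use grows with the number of active partitions, which is the trade-off the quoted passage describes.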
