11
votes

I'm trying to figure out the difference between the settings batch.size and buffer.memory in Kafka Producer.

As I understand batch.size: It's the max size of the batch that can be sent.

The documentation describes buffer.memory as: the bytes of memory the Producer can use to buffer records waiting to be sent.

I don't understand the difference between these two. Can someone explain?

Thanks

3

3 Answers

16
votes

In my opinion,

batch.size: The maximum amount of data that can be sent in a single request. If batch.size is (32*1024) that means 32 KB can be sent out in a single request.

buffer.memory: if Kafka Producer is not able to send messages(batches) to Kafka broker (Say broker is down). It starts accumulating the message batches in the buffer memory (default 32 MB). Once the buffer is full, It will wait for "max.block.ms" (default 60,000ms) so that buffer can be cleared out. Then it's throw exception.

1
votes

Both of these producer configurations are described on the Confluent documentation page as following:

  • batch.size

Kafka producers attempt to collect sent messages into batches to improve throughput. With the Java client, you can use batch.size to control the maximum size in bytes of each message batch.

  • buffer.memory

Use buffer.memory to limit the total memory that is available to the Java client for collecting unsent messages. When this limit is hit, the producer will block on additional sends for as long as max.block.ms before raising an exception.

0
votes

Kafka Producer and Kafka Consumer have many configuration that helps in performance tuning like gaining low latency and High throughput. buffer.memory and batch.size is also one of those and these are specific for Kafka Producer. Let see more details on these configuration.

  1. buffer.memory This sets the amount of memory the producer will use to buffer messages waiting to be sent to broker. If messages are sent by the application faster than they can be delivered to server, producer may run out of space and additional send() call will either be block or throw exception based on the max.block.ms configuration which allow blocking for a certain time and then throw exception. Another case may be if all broker server are down due to any reason and kafka producer will not able to send messages to broker and producer have to keep these messages in the memory allocated based on buffer.memory configuration but this will be filled up soon if broker not back normal state then as mentioned above mx.block.ms time will be considered to free up the space. Default value for max.block.ms is 60,000 ms Default value for buffer.memory is 32 MB (33554432)

  2. batch.size When multiple records are sent to the same partition, the producer will put them in batch. This configuration controls the amount of memory in bytes (not messages) that will be used for each batch. When the batch is full, all the messages in the batch will be sent. However this does not means that the producer will wait for batch to become full. The producer will send half full batches and even the batch with just a single message in them. Therefore setting the batch size too large will not cause delays in sending the messages. it will just use memory for the batches. Setting the batch size too small will add extra overhead because producer will need to send messages more frequently. Default batch size is 16384.

batch.size is also work based on the linger.ms which controls the amount of time to wait for additional messages before sending current batch. As we know that Kafka producer sends a batch of messages either when rge current batch is full or when the linger.ms time is reached. By default prodcuer will send messages as soon as there is a sender thread available to send them even if there is just message in the bacth.