1
votes

if i send just one record at producer side and wait, when will producer sends the record to broker? In kafka docs, i found the config called "linger.ms", and it says:

once we get batch.size worth of records for a partition it will be sent immediately regardless of this setting, however if we have fewer than this many bytes accumulated for this partition we will 'linger' for the specified time waiting for more records to show up.

According above docs, i have two questions.

  1. if producer receives datas which size reaches batch.size, it will immediately trigger to send a request which only contains one batch to broker? But as we know, one request can contain many batches, so how does it happen?

  2. does it mean that even the received datas are not enough of batch.size, it will also trigger to send a request to broker after waiting linger.ms ?

1

1 Answers

0
votes

In Kafka, the lowest unit of sending is a record (a KV pair).

Kafka producer attempts to send records in batches in-order to optimize data transmission. So a single push from producer to the cluster -- to the broker leader to be precise -- could contain multiple records.

Moreover, batching always applies only to a given partition. Records produced to different partitions cannot be batched together, though they could form multiple batches.

There are a few parameters which influence the batching behaviour, as described in the documentation:

buffer.memory -

The total bytes of memory the producer can use to buffer records waiting to be sent to the server. If records are sent faster than they can be delivered to the server the producer will block for max.block.ms after which it will throw an exception.

batch.size -

The producer will attempt to batch records together into fewer requests whenever multiple records are being sent to the same partition. This helps performance on both the client and the server. This configuration controls the default batch size in bytes. No attempt will be made to batch records larger than this size.

Requests sent to brokers will contain multiple batches, one for each partition with data available to be sent.

linger.ms -

The producer groups together any records that arrive in between request transmissions into a single batched request. Normally this occurs only under load when records arrive faster than they can be sent out. However in some circumstances the client may want to reduce the number of requests even under moderate load. This setting accomplishes this by adding a small amount of artificial delay—that is, rather than immediately sending out a record the producer will wait for up to the given delay to allow other records to be sent so that the sends can be batched together. This can be thought of as analogous to Nagle's algorithm in TCP. This setting gives the upper bound on the delay for batching: once we get batch.size worth of records for a partition it will be sent immediately regardless of this setting, however if we have fewer than this many bytes accumulated for this partition we will 'linger' for the specified time waiting for more records to show up. This setting defaults to 0 (i.e. no delay). Setting linger.ms=5, for example, would have the effect of reducing the number of requests sent but would add up to 5ms of latency to records sent in the absence of load.

So from above documentation, you could understand - linger.ms is an artificial delay to wait if there are not enough bytes to transmit, but if producer accumulates enough bytes before linger.ms is elapsed, then the request is sent anyway.

On top of that, batching is also influenced by max.request.size

max.request.size -

The maximum size of a request in bytes. This setting will limit the number of record batches the producer will send in a single request to avoid sending huge requests. This is also effectively a cap on the maximum record batch size. Note that the server has its own cap on record batch size which may be different from this.