2
votes

I'm using a Kafka producer. My application sends individual ProducerRecords, all with the same key, to a single partition. These records are batched (controlled by the batch.size and linger.ms parameters) before being sent to the brokers. I have enable.idempotence=true and acks=all.

If one record in the middle of a batch fails to be written (for example, because a host crashes, a network or disk failure occurs, or the record is not acked by the minimum number of replicas), does Kafka guarantee that all subsequent records will also not be written? Or is there a possibility that a record in the middle of a batch could be missing?

Do you want to guarantee not missing any records, or the ordering of the records? - ndogac
Are you calling beginTransaction() in the code? - OneCricketeer
@ndogac I want to guarantee that if I read the last record, then I can safely assume that all previous records will also be there. - k314159
@OneCricketeer No, I'm not. I was using transactions previously, but the throughput was unacceptable, and I found that using batches made the throughput much higher. - k314159

2 Answers

1 vote

If one record in the middle of a batch fails to be written, for example if a host crashes or a network failure or disk failure occurs or the record failed to be acked by the minimum replicas, does Kafka guarantee that all subsequent records will also not be written?

Yes. If any message within a batch fails, then all messages in the same batch fail, so none of the messages in that batch are written to the broker's disk.

Or is there a possibility that a record in the middle of a batch could be missing?

No. Either all of the messages in a batch are written to the broker, or none of them are.

This is possible because the producer separates the application thread that calls send() from a local buffer that queues and batches the records. A background sender thread then ships each batch to the broker as a single unit.
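To make the all-or-nothing behaviour concrete, here is a toy model of a partition log that accepts a batch as a single unit. It is only a sketch and not Kafka code: `PartitionLog`, `appendBatch` and the `writeSucceeds` flag are hypothetical names invented for illustration, with the flag standing in for whatever happens on the broker side (disk write, replication to enough replicas).

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only, NOT Kafka code. All names here are hypothetical.
class PartitionLog {
    private final List<String> records = new ArrayList<>();

    // Appends the whole batch or nothing at all.
    // writeSucceeds stands in for the broker-side outcome of the write.
    boolean appendBatch(List<String> batch, boolean writeSucceeds) {
        if (!writeSucceeds) {
            return false; // the log is left untouched, no partial batch
        }
        records.addAll(batch); // every record of the batch lands together
        return true;
    }

    // Returns a copy so callers cannot modify the log directly.
    List<String> records() {
        return new ArrayList<>(records);
    }
}
```

In this model, a failed `appendBatch` leaves the log exactly as it was, so a reader never sees only part of a batch.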

1 vote

Since your records are all going to the same partition, you can safely assume all previous records will also be there.

Kafka guarantees ordering within a given partition, so if you send messages m1 and m2 (in that order) to the partition, the batch.size and linger.ms logic will not change their order. In other words, if you see message m2 at your consumer, you can safely assume that m1 was delivered as well.
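For reference, a producer configured the way the question describes might look roughly like the sketch below. The config keys are the standard producer settings, but the `ProducerSettings` class, the `build` helper and the concrete values for batch.size and linger.ms are assumptions chosen for illustration. The max.in.flight.requests.per.connection line is also my addition: with idempotence enabled it must not exceed 5.

```java
import java.util.Properties;

// Hypothetical helper; the class and method names are illustrative only.
class ProducerSettings {
    static Properties build(String bootstrapServers) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.setProperty("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        // Settings taken from the question.
        props.setProperty("enable.idempotence", "true");
        props.setProperty("acks", "all");
        // Batching knobs; these values are assumptions, tune them for your load.
        props.setProperty("batch.size", "65536");
        props.setProperty("linger.ms", "20");
        // Idempotence requires this to be at most 5.
        props.setProperty("max.in.flight.requests.per.connection", "5");
        return props;
    }
}
```

The resulting `Properties` object would then be passed to `new KafkaProducer<>(props)` from the kafka-clients library, which the sketch deliberately leaves out so that it compiles on its own.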