1
votes

I'm currently learning more about the Kafka Producer. I am a bit puzzled by the following paragraph from the docs:

Messages written to the partition leader are not immediately readable by consumers regardless of the producer’s acknowledgement settings. When all in-sync replicas have acknowledged the write, then the message is considered committed, which makes it available for reading. This ensures that messages cannot be lost by a broker failure after they have already been read. Note that this implies that messages which were acknowledged by the leader only (that is, acks=1) can be lost if the partition leader fails before the replicas have copied the message. Nevertheless, this is often a reasonable compromise in practice to ensure durability in most cases while not impacting throughput too significantly.

The way I interpret this is that messages can get lost during the sync between leader and replicated brokers, i.e. messages won't be committed unless they have been successfully replicated.

I don't understand how (for example) the Java application can shield against this message loss. Does it receive different acknowledgements between 'only-leader' and the full replication?

this is often a reasonable compromise in practice

How is that? Do they assume that you should log failed messages and re-queue them manually? Or how does that work?

1

1 Answers

1
votes

"Does it receive different acknowledgements between 'only-leader' and the full replication?"

There is no difference between a leader and replica acknowledgment. You only steer the behavior of the producer through its configuration acks. If it is set to 1 it will wait only for the leader acknowledgment, if you set it to all it will wait for all replicas (based on the replication factor of the topic) before the producer considers writing the message as successful.

If you set acks=all and the synchronisation between leader and replicas fail, your producer will receive a retriable Exception (either "NotEnoughReplicasException" or "NotEnoughReplicasAfterAppendException", see more details here). Based on the producer configuration retries it will try to re-send the message. Kafka is built in a way that it expects crashed brokers to be available again (in a "short" amount of time).

In case you have set acks=1 and the synchronisation between leader and replicas fail, your producer considers the message was successfully written to the cluster and it will not try to reproduce the message. Of course the leader will continue to replicate the message to its replicas. But it is not really guaranteed that this will happen. And before the message got replicated the leader broker itself could have issues which will cause the message to be lost forever.