
I have inherited some Kafka consumer code that I need to fix. Our consuming process is similar to the example here - https://www.logicbig.com/tutorials/misc/kafka/kafka-manual-commit-async-example.html

Essentially, each call to consumer.poll may return multiple records. We process the records and eventually call commitAsync so that our app never sees the same messages again.
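For context, here is a minimal sketch of that poll/process/commitAsync pattern, roughly what the linked example does. The broker address, group id, topic name, and the process() helper are placeholders, not taken from our actual code:

```java
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class ManualCommitConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");  // placeholder broker
        props.put("group.id", "example-group");            // placeholder group id
        props.put("enable.auto.commit", "false");          // we commit manually
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("example-topic")); // placeholder topic
            while (true) {
                // A single poll may return many records across partitions.
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    process(record); // placeholder for the real business logic
                }
                // Asynchronously commits the offsets of everything just polled.
                consumer.commitAsync();
            }
        }
    }

    private static void process(ConsumerRecord<String, String> record) {
        System.out.printf("offset=%d value=%s%n", record.offset(), record.value());
    }
}
```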

Let's say we received five records from Kafka. While processing, we successfully handled the first two but hit a problem with the third record.

Until the problem is fixed, we cannot continue processing. However, we also cannot call commitAsync on the consumer, because there are three records we still need to process when the app restarts.

But if we don't call commitAsync, then after a restart the consumer resumes from the last committed offset, and we end up re-processing the two records we had already processed.

One way to avoid the duplication would be to commit one record at a time. Is there a way to do that?
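For reference, the consumer API does accept explicit per-partition offsets, so committing after every record is possible in principle. Here is a sketch under the same placeholder names as above; note that Kafka expects the offset of the next record to read, hence the + 1:

```java
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

import java.util.Collections;

public class PerRecordCommit {

    // Processes one polled batch, committing after each record. If a record
    // fails, everything before it is already committed, so a restart resumes
    // exactly at the failed record instead of re-processing the earlier ones.
    static void processBatch(KafkaConsumer<String, String> consumer,
                             ConsumerRecords<String, String> records) {
        for (ConsumerRecord<String, String> record : records) {
            process(record); // placeholder business logic; may throw
            // Commit only this record. The committed value is the offset of
            // the NEXT record to read, hence offset() + 1.
            consumer.commitSync(Collections.singletonMap(
                    new TopicPartition(record.topic(), record.partition()),
                    new OffsetAndMetadata(record.offset() + 1)));
        }
    }

    private static void process(ConsumerRecord<String, String> record) {
        // placeholder
    }
}
```

The obvious trade-off is throughput: a synchronous commit per record adds a broker round-trip for every message.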

There is another, related duplication issue.

Let's say we process a record that we received from Kafka, and just before commitAsync is called, our app crashes.

In this case as well, we will end up processing the same record once again when the app restarts.

I'm wondering if there is any support in Kafka to deal with this issue. Thanks.

If duplicate events are an issue for your application, then you may need to look at Exactly Once Semantics. confluent.io/blog/… – Mike Gardner
Similar question asked differently: stackoverflow.com/q/72378973/2308683 – OneCricketeer
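To expand on the exactly-once pointer above: Kafka transactions cover the read-from-Kafka, write-to-Kafka case by committing the consumed offsets atomically with the produced records, which closes the crash-before-commit window. A simplified sketch follows; the topic names, group id, and transactional id are made up, error handling is reduced to an abort, and side effects outside Kafka (such as database writes) are not covered by this and would still need to be idempotent:

```java
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.kafka.common.serialization.StringSerializer;

import java.time.Duration;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

public class ExactlyOnceRelay {
    public static void main(String[] args) {
        Properties cp = new Properties();
        cp.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        cp.put(ConsumerConfig.GROUP_ID_CONFIG, "example-group");          // placeholder
        cp.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");        // offsets go in the txn
        cp.put(ConsumerConfig.ISOLATION_LEVEL_CONFIG, "read_committed");  // skip aborted data
        cp.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        cp.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        Properties pp = new Properties();
        pp.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        pp.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "example-txn-id"); // placeholder
        pp.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        pp.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(cp);
             KafkaProducer<String, String> producer = new KafkaProducer<>(pp)) {
            consumer.subscribe(Collections.singletonList("input-topic")); // placeholder
            producer.initTransactions();

            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                if (records.isEmpty()) continue;

                producer.beginTransaction();
                try {
                    Map<TopicPartition, OffsetAndMetadata> offsets = new HashMap<>();
                    for (ConsumerRecord<String, String> r : records) {
                        producer.send(new ProducerRecord<>("output-topic", r.key(), r.value()));
                        offsets.put(new TopicPartition(r.topic(), r.partition()),
                                new OffsetAndMetadata(r.offset() + 1));
                    }
                    // The consumed offsets commit atomically with the produced
                    // records: a crash before commitTransaction leaves neither.
                    producer.sendOffsetsToTransaction(offsets, consumer.groupMetadata());
                    producer.commitTransaction();
                } catch (Exception e) {
                    // Nothing becomes visible; a real app would also seek the
                    // consumer back to the last committed offsets before retrying.
                    producer.abortTransaction();
                }
            }
        }
    }
}
```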