I was going through this article, which explains how to ensure a message is processed exactly once by doing the following:
- Read (topic, partition, offset) from the database on start/restart
- Read the message at that specific (topic, partition, offset)
- Atomically do the following (for example, in the same database transaction):
  - Process the message
  - Commit the offset to the database as (topic, partition, offset)
- Manually commit the offset to Kafka by calling `consumer.commitAsync()` or `consumer.commitSync()`
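To make the steps above concrete, here is a minimal in-memory sketch of how I understand them. It does not use the Kafka client at all: `OffsetStore`, `ExactlyOnceSketch`, and `run` are hypothetical names, the "log" is just a list standing in for one partition, and a synchronized method stands in for the database transaction. With the real Java client, step 2 would presumably be a `consumer.seek(...)` followed by `consumer.poll(...)`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ExactlyOnceSketch {
    // Consumes at most maxMessages from the log, resuming from the stored offset.
    static void run(List<String> log, OffsetStore store,
                    String topic, int partition, int maxMessages) {
        // Step 1: read the next offset from the "database" on start/restart.
        long offset = store.loadNextOffset(topic, partition);
        int handled = 0;
        while (offset < log.size() && handled < maxMessages) {
            // Step 2: read the message at exactly that offset.
            String message = log.get((int) offset);
            // Step 3: process and store result + next offset in one "transaction".
            String result = message.toUpperCase();
            store.saveAtomically(topic, partition, offset + 1, result);
            offset++;
            handled++;
        }
    }

    public static void main(String[] args) {
        List<String> log = List.of("a", "b", "c", "d");
        OffsetStore store = new OffsetStore();
        run(log, store, "demo", 0, 2);   // first run stops ("crashes") after two messages
        run(log, store, "demo", 0, 10);  // restart resumes from the stored offset
        System.out.println(store.results()); // prints [A, B, C, D]
    }
}

// Hypothetical stand-in for the external database.
class OffsetStore {
    private final Map<String, Long> nextOffsets = new HashMap<>(); // key: "topic-partition"
    private final List<String> results = new ArrayList<>();

    // Returns the next offset to read, or 0 if nothing has been stored yet.
    synchronized long loadNextOffset(String topic, int partition) {
        return nextOffsets.getOrDefault(topic + "-" + partition, 0L);
    }

    // Mimics one transaction: the result and the next offset change together.
    synchronized void saveAtomically(String topic, int partition,
                                     long nextOffset, String result) {
        results.add(result);
        nextOffsets.put(topic + "-" + partition, nextOffset);
    }

    synchronized List<String> results() {
        return new ArrayList<>(results);
    }
}
```

Because the result and the offset are stored together, the second run picks up at offset 2 and no message is processed twice, regardless of what was or was not committed to Kafka.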
My doubt is about the effect of setting different values for the following consumer properties:
enable.auto.commit
How should I set this property: `true` or `false`? The article says we should set it to `false`. But what can go wrong if I set it to `true`? In this setup, I am saving the offset to an external database, so after a crash, when the consumer comes back online, it will start consuming from the offset saved in the database. So I feel the value of this property has no effect on start/restart.
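For reference, the setting the article recommends would look roughly like this in consumer configuration. This is only a partial sketch: the broker address and group id are placeholders I made up, and `ConsumerConfigSketch` is a hypothetical name. The property keys and the `StringDeserializer` class name are the standard ones.

```java
import java.util.Properties;

class ConsumerConfigSketch {
    // Builds consumer properties with automatic offset commits turned off.
    static Properties consumerProps() {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092"); // placeholder address
        props.setProperty("group.id", "atomic-consumer-demo");    // hypothetical group id
        props.setProperty("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.setProperty("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        // As the article suggests: offsets are committed manually, not by the client.
        props.setProperty("enable.auto.commit", "false");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(consumerProps().getProperty("enable.auto.commit")); // prints false
    }
}
```

With `true`, the client would periodically commit the offsets returned by `poll()` to Kafka on its own, so the offset stored in Kafka and the one stored in the database could differ at any given moment.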
I also don't feel that different values of this property will have any effect within a single consumer run, since the offset is used to read the next message, and whether we commit it manually or automatically makes no difference (it will still be the same offset).
auto.offset.reset
There are two main values of this property: `latest` and `earliest`. If set to `latest`, it makes the consumer read only messages published afterwards, that is, after the consumer starts. If set to `earliest`, it makes the consumer read from the first unread message. Since both only affect where the consumer starts reading when it is started, I feel this property will also have no effect on the atomic consumer described in the article. This is because, in this implementation, a newly started consumer starts reading from the offset stored in the database.
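My reasoning here can be written down as a small decision function. It models the argument only and is not the Kafka client's code: `startPosition`, `storedOffset`, `beginningOffset`, and `endOffset` are hypothetical names, and it assumes the reset policy is consulted only when the database holds no offset for the partition.

```java
import java.util.OptionalLong;

class StartPositionSketch {
    // Picks the offset a freshly started consumer would read from.
    // An offset stored in the database always wins; the reset policy
    // is only a fallback for the "nothing stored yet" case.
    static long startPosition(OptionalLong storedOffset, String resetPolicy,
                              long beginningOffset, long endOffset) {
        if (storedOffset.isPresent()) {
            return storedOffset.getAsLong();
        }
        return "earliest".equals(resetPolicy) ? beginningOffset : endOffset;
    }

    public static void main(String[] args) {
        // With a stored offset, the policy makes no difference.
        System.out.println(startPosition(OptionalLong.of(42), "latest", 0, 100));   // prints 42
        System.out.println(startPosition(OptionalLong.of(42), "earliest", 0, 100)); // prints 42
        // Without one (very first start), the policy decides.
        System.out.println(startPosition(OptionalLong.empty(), "earliest", 0, 100)); // prints 0
        System.out.println(startPosition(OptionalLong.empty(), "latest", 0, 100));   // prints 100
    }
}
```

Note that the sketch ignores one case: as far as I know, the real client also applies `auto.offset.reset` when the requested offset no longer exists on the server, for example after old messages have been deleted by retention.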
Am I correct on both of the above points?