2
votes

We are using Kafka 0.10... I'm seeing some conflicting information online (and in documentation) regarding how offsets are managed in kafka when enable.auto.commit is TRUE. Does the same poll() method that retrieves messages also handle the commits at the configured intervals?

If i retrieve messages from poll in a single threaded application, process the messages to completion (including handling errors) in the SAME thread, meaning poll() will not be invoked again until after my processing is complete, then I presume there is no fear in losing messages, correct? This only works if poll() attempts the commit at the subsequent invocation (if the auto.commit.interval.ms has passed, of course). If the commits are done immediately upon receiving the messages (prior to my app processing the messages), this will not work for us....

This is important, as I want to be certain we won't lose messages if we use the automatic commit policy. Duplicate messages are tolerable for us, we just have no tolerance for lost data.

Thanks for the clarification!

1

1 Answers

4
votes

Does the same poll() method that retrieves messages also handle the commits at the configured intervals?

Yes. (If enable.auto.commit=true.)

If i retrieve messages from poll in a single threaded application, process the messages to completion (including handling errors) in the SAME thread, meaning poll() will not be invoked again until after my processing is complete, then I presume there is no fear in losing messages, correct?

Yes.

This only works if poll() attempts the commit at the subsequent invocation (if the auto.commit.interval.ms has passed, of course)

This is exactly how it is done.

See here for further details: http://docs.confluent.io/current/clients/consumer.html