The document https://www.safaribooksonline.com/library/view/kafka-the-definitive/9781491936153/ch04.html says: "Note that with auto-commit enabled, a call to poll will always commit the last offset returned by the previous poll. It doesn’t know which events were actually processed, so it is critical to always process all the events returned by poll before calling poll again (or before calling close(), it will also automatically commit offsets)". If that's the case, how does it work when auto.commit.interval.ms is larger than the time it takes to process the messages received from the previous poll()?

To make it more concrete, consider the scenario where I have the following:

enable.auto.commit=true

auto.commit.interval.ms=10

And I call poll() in a loop.

1) On first call to poll(), I get 1000 messages (offset 2000-3000) and it takes 1 ms to process all 1000 messages

2) I call poll() again. In this 2nd poll() call, it should commit the latest offset 3000 returned from the previous poll(), but since auto.commit.interval.ms is set to 10 ms, it won't commit the offset yet, right?

In this scenario, will the committed offset lag further and further behind the latest offset that was actually processed?

Could someone clarify/confirm?

1 Answer

You describe the behavior correctly. However, your conclusion is not correct. The committed offset will not lag further and further behind. Once the auto-commit interval has passed, the next call to poll will commit all processed messages.

Let's say you call poll every 10 ms and set the commit interval to 100 ms. Then every 10th call to poll will commit, and that commit covers all messages from the last 10 poll calls.
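
To see why the lag stays bounded, here is a small Python sketch that models the timing described above. It is a simplified simulation, not Kafka's actual implementation: the function name `simulate_auto_commit`, its parameters, and the assumptions that the first commit deadline falls one interval after start and that the timer resets at each commit are all made up for illustration.

```python
def simulate_auto_commit(poll_interval_ms, commit_interval_ms, batch_size, num_polls):
    """Model a poll loop with interval-based auto-commit (hypothetical, simplified).

    Returns a list of (time_ms, position, committed) tuples, one per poll,
    where position is the offset after the batch returned by that poll.
    """
    position = 0                          # next offset the consumer will read
    committed = 0                         # last committed offset
    next_commit_at = commit_interval_ms   # assumed first auto-commit deadline
    history = []
    for i in range(num_polls):
        now = i * poll_interval_ms
        # On entering poll(): if the interval has elapsed, commit everything
        # that earlier polls returned (assumed to be fully processed by now).
        if now >= next_commit_at:
            committed = position
            next_commit_at = now + commit_interval_ms
        # poll() then hands out the next batch.
        position += batch_size
        history.append((now, position, committed))
    return history


if __name__ == "__main__":
    # Poll every 10 ms, commit interval 100 ms, 1000 messages per poll.
    for time_ms, position, committed in simulate_auto_commit(10, 100, 1000, 30):
        print(time_ms, position, committed, position - committed)
```

In this model the gap between position and committed offset grows for up to ten polls, then drops back when the commit fires, so it never exceeds roughly one commit interval's worth of messages no matter how long the loop runs.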