
I'm a newbie on Kafka and trying to figure out how it works.

If I'm right, a Kafka broker will send a bunch of messages in one poll of consumer. In other words, when the consumer invokes the function poll, it will get a bunch of messages and then the consumer will process these messages one by one.

Now, let's assume that there are 100 messages in the broker, from 1 to 100. Each time the consumer invokes poll, 10 messages are sent together: 1 - 10, 11 - 20, and so on. At the same time, the consumer automatically commits its offset to the broker every 5 seconds.

Say that at some moment, the consumer commits its offset while it is processing the 15th message.

In this case, I don't know which number is the committed offset, 11 or 14?

If it's 11, it means that if the broker needs to resend for some reason, it will resend the bunch of messages from 11 to 20, but if it's 14, it means that it will resend the bunch of messages from 14 to 23.


1 Answer


"In this case, I don't know which number is the committed offset, 11 or 14?"

Auto commit always commits the highest offset that was fetched during a poll. In your case it would commit the position just past message 20 (Kafka stores the offset of the next message to read, so 21 here), independent of which offset is currently being processed by the client.
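To make this concrete, here is a minimal in-memory sketch of the scenario from the question. `ToyAutoCommitConsumer` and its `auto_commit` method are hypothetical stand-ins written for illustration, not a real Kafka client API, and the toy treats message numbers as offsets. It assumes the commit stores the fetch position, that is, the offset of the next message to read, which is why it reports 21 (one past message 20) rather than 11 or 14.

```python
class ToyAutoCommitConsumer:
    """In-memory stand-in for a consumer with auto commit (not a real Kafka client)."""

    def __init__(self, first_offset, last_offset, batch_size):
        self.last_offset = last_offset
        self.batch_size = batch_size
        self.position = first_offset  # next offset poll() will hand out
        self.committed = None         # offset stored for the consumer group

    def poll(self):
        # Hand out the next batch and advance the fetch position past it.
        end = min(self.position + self.batch_size - 1, self.last_offset)
        batch = list(range(self.position, end + 1))
        self.position = end + 1
        return batch

    def auto_commit(self):
        # Assumption: auto commit stores the fetch position,
        # regardless of how far processing has progressed.
        self.committed = self.position


consumer = ToyAutoCommitConsumer(first_offset=1, last_offset=100, batch_size=10)
consumer.poll()            # messages 1..10
batch = consumer.poll()    # messages 11..20

in_progress = None
for offset in batch:
    in_progress = offset
    if offset == 15:
        consumer.auto_commit()  # commit happens while message 15 is in progress
        break

print(in_progress, consumer.committed)  # prints "15 21"
```

If the application crashed right after that commit, messages 15 to 20 would not be redelivered, because the stored offset already points past them.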

I guess this example shows you that enabling auto commit comes with some downsides. I recommend taking control of the committed offsets yourself by disabling auto commit and only committing offsets after all messages in the batch have been processed successfully. However, there are use cases where you can simply enable auto commit without ever needing to think about it.
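The process-then-commit pattern can be sketched as follows. This is an illustration under stated assumptions: `ToyManualCommitConsumer`, `consume`, and `fail_on_15` are hypothetical names, the consumer is an in-memory fake rather than a real client, and message numbers double as offsets. With a real client the same idea would mean turning off `enable.auto.commit` and calling the client's own commit method after the batch is done.

```python
class ToyManualCommitConsumer:
    """In-memory fake with explicit commits (not a real Kafka client)."""

    def __init__(self, offsets, batch_size):
        self._offsets = list(offsets)
        self._batch_size = batch_size
        self._cursor = 0
        self.committed = None

    def poll(self):
        batch = self._offsets[self._cursor:self._cursor + self._batch_size]
        self._cursor += len(batch)
        return batch

    def commit(self, next_offset):
        self.committed = next_offset


def consume(consumer, handle):
    """Process each batch fully, then commit the offset after its last message."""
    while True:
        batch = consumer.poll()
        if not batch:
            break
        for offset in batch:
            handle(offset)  # if this raises, nothing is committed for this batch
        consumer.commit(batch[-1] + 1)


def fail_on_15(offset):
    # Hypothetical handler that fails on message 15.
    if offset == 15:
        raise RuntimeError("processing failed")


consumer = ToyManualCommitConsumer(range(1, 101), batch_size=10)
try:
    consume(consumer, fail_on_15)
except RuntimeError:
    pass

print(consumer.committed)  # prints "11"
```

Because the failure happens before the second batch is committed, the stored offset stays at 11, so a restart would see messages 11 to 20 again. That gives at-least-once processing: messages 11 to 14 are handled twice, but none are skipped.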

"If it's 11, it means that if the broker needs to resend for some reason, it will resend the bunch of messages from 11 to 20, but if it's 14, it means that it will resend the bunch of messages from 14 to 23."

There is a difference between a consumed and a committed offset. Committed offsets only become relevant when you restart your application or when consumers join or leave your client's consumer group. Otherwise, while the application is running, the poll method does not look at the committed offset; it continues from the consumer's current position. I have written some more details on the difference between committed and consumed offsets in another answer.
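A small simulation may help separate the two offsets. `ToyBroker` and `ToyConsumer` are hypothetical in-memory classes, not part of any Kafka library, and they assume a single partition where message numbers equal offsets. The running consumer keeps advancing from its in-memory position; only a fresh consumer in the same group falls back to the committed offset.

```python
class ToyBroker:
    """Holds committed offsets per consumer group (in-memory stand-in)."""

    def __init__(self):
        self.committed = {}  # group id -> committed offset


class ToyConsumer:
    def __init__(self, broker, group, first_offset=1, batch_size=10):
        self.broker = broker
        self.group = group
        self.batch_size = batch_size
        # On start-up the position is taken from the group's committed offset.
        self.position = broker.committed.get(group, first_offset)

    def poll(self):
        # A running consumer only follows its own in-memory position.
        batch = list(range(self.position, self.position + self.batch_size))
        self.position += self.batch_size
        return batch

    def commit(self, next_offset):
        self.broker.committed[self.group] = next_offset


broker = ToyBroker()
first = ToyConsumer(broker, "demo-group")
first.poll()                 # messages 1..10
first.commit(11)             # committed offset is now 11
first.poll()                 # messages 11..20, never committed
third_batch = first.poll()   # continues at 21 despite the stale commit

restarted = ToyConsumer(broker, "demo-group")
after_restart = restarted.poll()  # starts again from the committed offset

print(third_batch[0], after_restart[0])  # prints "21 11"
```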