1
votes

Kafka generates an offset for each message. Say I produce 5 messages; their offsets will be 1 to 5.

But with a transactional producer, say I produce 5 messages and commit, then 5 messages but abort, and then 5 more messages and commit.

  1. So, will the last 5 committed messages have offsets 6 to 10, or 11 to 15?

  2. What if I neither abort nor commit? Will the messages still be posted?

  3. How does Kafka ignore offsets that are not committed, given that Kafka commit logs are offset based? Does it use the transaction commit log for a transactional consumer to commit offsets and return the last stable offset? Or is it the __transaction_state topic that maintains the offsets?


1 Answer

4
votes
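  1. The last 5 messages have offsets 11 to 15. When consuming with isolation.level=read_committed, the consumer will skip the aborted offsets 6 to 10 and "jump" from offset 5 to 11. (Strictly speaking, each commit or abort also writes a control marker that takes up one offset in the partition, so the real numbers come out slightly higher, but the principle is the same: aborted messages keep their offsets and are simply skipped.)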

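  2. If you don't commit or abort the transaction, it will automatically be timed out (aborted) after the producer's transaction.timeout.ms has elapsed; the broker setting transaction.max.timeout.ms is only the upper bound allowed for that value. Until then, the messages are already in the log but are not visible to read_committed consumers.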

  3. Along with the message data, Kafka stores metadata in the log (including the commit/abort markers) that lets it identify for each message whether it has been committed. Committing offsets is the same as writing to a partition; the only difference is that Kafka does it automatically in the internal topic __consumer_offsets. So it works the same way for offsets: offsets added via sendOffsetsToTransaction() in a transaction that was aborted or never committed will automatically be skipped.
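
To make the three points above concrete, here is a minimal toy model in plain Python. It is not Kafka code and does not use any Kafka client: the names `ToyPartition`, `send`, `end_transaction`, `read_committed` and `read_uncommitted` are made up for illustration, and it assumes a single producer writing to a single partition. It does model the control markers, so the offsets come out slightly higher than the simplified "11 to 15".

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Record:
    offset: int
    value: Optional[str]          # None for control markers
    txn: int                      # which transaction wrote this record
    marker: Optional[str] = None  # "COMMIT" or "ABORT" for control markers


class ToyPartition:
    """Hypothetical single-producer model of a transactional partition log."""

    def __init__(self, first_offset: int = 0) -> None:
        self.log: List[Record] = []
        self.next_offset = first_offset
        self.txn = 0

    def _append(self, value: Optional[str], marker: Optional[str] = None) -> int:
        rec = Record(self.next_offset, value, self.txn, marker)
        self.log.append(rec)
        self.next_offset += 1
        return rec.offset

    def send(self, value: str) -> int:
        # Data is appended (and gets an offset) before the outcome is known.
        return self._append(value)

    def end_transaction(self, commit: bool) -> None:
        # The marker itself occupies one offset in the log.
        self._append(None, "COMMIT" if commit else "ABORT")
        self.txn += 1

    def read_uncommitted(self) -> List[int]:
        # Every data record, regardless of transaction outcome.
        return [r.offset for r in self.log if r.marker is None]

    def read_committed(self) -> List[int]:
        outcome = {r.txn: r.marker for r in self.log if r.marker}
        visible = []
        for r in self.log:
            if r.marker:
                continue            # markers are never handed to the application
            if r.txn not in outcome:
                break               # open transaction: stop at the last stable offset
            if outcome[r.txn] == "COMMIT":
                visible.append(r.offset)
        return visible


# The scenario from the question, with offsets starting at 1 as in the question.
p = ToyPartition(first_offset=1)
for commit in (True, False, True):
    for i in range(5):
        p.send(f"msg-{i}")
    p.end_transaction(commit)

# A fourth transaction that is neither committed nor aborted (question 2).
p.send("pending-0")
p.send("pending-1")

committed = p.read_committed()
uncommitted = p.read_uncommitted()
print(committed)
print(uncommitted)
```

In this model the third batch lands on offsets 13 to 17 because offsets 6 and 12 are taken by the commit and abort markers. The two pending messages have offsets in the log and show up in `read_uncommitted()`, but `read_committed()` stops before them until the transaction is completed or timed out.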

As mentioned in another of your questions, I recommend having a look at the KIP that added exactly-once semantics to Kafka. It details all these mechanics and will help you get a better understanding: https://cwiki.apache.org/confluence/display/KAFKA/KIP-98+-+Exactly+Once+Delivery+and+Transactional+Messaging