I want to use the Kafka Streams Processor API and generate some messages every minute in a scheduled punctuator function. Can Kafka Streams guarantee that these messages get written to the output topic exactly once?
I understand that exactly-once processing is possible in Kafka Streams because it makes a single transaction out of the following operations:
- Commit the consumer offset for the input topic
- Write the result to the output topic
Does this concept extend to punctuator functions in the Processor API, for which there is no associated input message needing a commit?
For example, this punctuator function iterates over the items in a key-value state store. Each item is deleted from the store and forwarded downstream:
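override def punctuate(timestamp: Long): Unit = {
  val iter = store.all() // KeyValueIterator holds resources and must be closed
  try {
    iter.asScala.foreach { keyValue =>
      store.delete(keyValue.key)                    // remove the item from the store
      context.forward(keyValue.key, keyValue.value) // send it downstream
    }
  } finally iter.close()
}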
Each message in the store should appear on the output topic exactly once, even in the case of processor failure and restart.
Assume the store is persistent and backed by a Kafka changelog topic. The punctuator is scheduled to run every minute of wall-clock time, and I have set processing.guarantee=exactly_once in my config.
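For reference, the configuration mentioned above amounts to roughly the following. This is a minimal sketch, not my actual setup: the application id and broker address are placeholder values, and the keys are written as plain strings so the snippet needs only the JDK.

```scala
import java.util.Properties

object StreamsSettings {
  // Placeholder values: the application id and broker address are hypothetical.
  val props: Properties = {
    val p = new Properties()
    p.put("application.id", "punctuator-example")
    p.put("bootstrap.servers", "localhost:9092")
    // The setting described above: enable exactly-once processing.
    p.put("processing.guarantee", "exactly_once")
    p
  }
}
```

These properties would then be passed to the KafkaStreams constructor together with the topology.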