0
votes

Is there a way to setup a schedule at which the data in the KTable should be persisted (.to() ) into a topic? Essentially let the KTable accumulate all the data and at a particular time, the data gets written to a topic.

1 Answer

1
vote

There is no explicit control. However, KTables internally cache downstream updates to suppress consecutive updates to the same key (cf. https://kafka.apache.org/11/documentation/streams/developer-guide/memory-mgmt.html and https://www.confluent.io/blog/watermarks-tables-event-time-dataflow-model/). This cache is flushed each time Kafka Streams commits.

Thus, if the cache is large enough to hold all the data, you can mimic the desired behavior by configuring commit.interval.ms accordingly. Note that this will only approximate the desired behavior.
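For example, the two relevant settings are `commit.interval.ms` and `cache.max.bytes.buffering`. A minimal sketch, assuming a one-hour flush schedule and a 512 MB cache (both values are made up; pick them to fit your data volume and latency needs):

```java
import java.util.Properties;

public class CommitIntervalConfig {
    public static Properties buildConfig() {
        Properties props = new Properties();
        // Flush caches (and thus emit accumulated KTable updates downstream)
        // roughly once per hour -- hypothetical interval.
        props.put("commit.interval.ms", "3600000");
        // Make the cache large enough to hold all records accumulated
        // between commits -- hypothetical size (512 MB).
        props.put("cache.max.bytes.buffering", String.valueOf(512L * 1024 * 1024));
        return props;
    }

    public static void main(String[] args) {
        Properties props = buildConfig();
        System.out.println(props.getProperty("commit.interval.ms"));
    }
}
```

If the cache fills up before the commit interval elapses, records are evicted and emitted early, which is why this is only an approximation.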

As an alternative, you might be able to build a custom solution via punctuations. The idea would be to not write any data via the KTable#to() operator, but instead use a scheduled punctuation to scan the whole store and write its contents to a topic. This approach is quite advanced and somewhat "hacky", though, and not a clean solution.
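To illustrate the pattern, here is a library-free sketch using only the JDK (the class and method names are made up). In actual Kafka Streams code you would use a KeyValueStore for the accumulation, ProcessorContext#schedule(interval, PunctuationType.WALL_CLOCK_TIME, punctuator) for the schedule, and KeyValueStore#all() inside the punctuator to iterate the store:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.BiConsumer;

// Sketch of the punctuation idea: accumulate records in a store, emit nothing
// per-record, and on a fixed wall-clock schedule scan the whole store and
// hand every entry to a sink (in Kafka Streams: forward downstream / to a topic).
public class ScheduledStoreFlush {
    private final Map<String, String> store = new ConcurrentHashMap<>();
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    // Called per incoming record: only accumulate; later updates to the
    // same key overwrite earlier ones, as in a KTable.
    public void process(String key, String value) {
        store.put(key, value);
    }

    // The "punctuator": scan the whole store and emit every entry.
    public void flush(BiConsumer<String, String> sink) {
        store.forEach(sink);
    }

    // Schedule the flush at a fixed interval
    // (in Kafka Streams: context.schedule(...) with WALL_CLOCK_TIME).
    public void start(long intervalMs, BiConsumer<String, String> sink) {
        scheduler.scheduleAtFixedRate(
                () -> flush(sink), intervalMs, intervalMs, TimeUnit.MILLISECONDS);
    }
}
```

In the real Streams version you would put this logic in a Transformer or Processor attached to a state store; note that the scan happens on the stream thread, so a very large store can delay record processing while the punctuation runs.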