3
votes

we are using kafka streams' windows join to join 2 streams and we're wondering :

  • Why is KS adding +24hours to the internal topics? For instance, we have a window of 1hour but the internal topic has a retention of 25hours. Can we configure this to not add those 24h ?
  • KS seems to keep the data of both streams in the window - internal topics and state store (rocksdb) - is there a way to keep only the stream on the left side of the join ?

[UPDATE]

For example, we create the JoinWindow like this :

JoinWindows.of(300000).before(600000).until(3600000)

Though I can see that the internal topics (for JOINTHIS and OUTEROTHER) have been created with

Configs:retention.ms=90000000

This was just tested right now on an empty broker (using the confluent cli tool) on my machine

1

1 Answers

9
votes

I will partially answer my own question about the +24h : Indeed the documentation clearly talks about this in here : https://kafka.apache.org/10/documentation/streams/developer-guide/processor-api.html#fault-tolerant-state-stores :

The default retention setting is Windows#maintainMs() + 1 day. You can override this setting by specifying StreamsConfig.WINDOW_STORE_CHANGE_LOG_ADDITIONAL_RETENTION_MS_CONFIG in the StreamsConfig.

And here is the Javadoc about WINDOW_STORE_CHANGE_LOG_ADDITIONAL_RETENTION_MS_CONFIG