We are using Kafka Streams' SessionWindows to aggregate related events as they arrive. Along with the aggregation, we specify the retention time for the window using the until() API.
Stream info:
The session window (inactivity time) is 1 minute and the retention time passed to until() is 2 minutes.
We are using a custom TimestampExtractor to map each event's time.
Example:
Event: e1; eventTime: 10:00:00 am; arrivalTime: 2:00 pm (same day)
Event: e2; eventTime: 10:00:30 am; arrivalTime: 2:10 pm (same day)
The second event arrives 10 minutes after e1, which exceeds the retention time plus the inactivity time. Yet the older event e1 is still part of the aggregation, despite the retention time being 2 minutes.
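To make the numbers in the example concrete, here is a minimal sketch in plain Java (no Kafka dependency) of the two comparisons involved. The class and method names (SessionGapExample, withinGap, exceedsRetentionPlusGap) are made up for illustration and are not part of the Kafka Streams API. It only reproduces the arithmetic, and it assumes that session membership is decided on the event time returned by our TimestampExtractor, which is our reading of the observed behavior rather than something we have confirmed.

```java
import java.time.Duration;
import java.time.LocalTime;

public class SessionGapExample {
    // Values from the stream setup described above.
    static final Duration INACTIVITY_GAP = Duration.ofMinutes(1);
    static final Duration RETENTION = Duration.ofMinutes(2);

    // True when two timestamps are no further apart than the inactivity gap,
    // i.e. they would fall into the same session under that gap.
    static boolean withinGap(LocalTime first, LocalTime second) {
        return Duration.between(first, second).abs().compareTo(INACTIVITY_GAP) <= 0;
    }

    // True when the distance between two timestamps is larger than
    // retention time + inactivity gap.
    static boolean exceedsRetentionPlusGap(LocalTime first, LocalTime second) {
        Duration limit = RETENTION.plus(INACTIVITY_GAP);
        return Duration.between(first, second).abs().compareTo(limit) > 0;
    }

    public static void main(String[] args) {
        // Event times (as extracted by the custom TimestampExtractor).
        LocalTime e1Event = LocalTime.of(10, 0, 0);
        LocalTime e2Event = LocalTime.of(10, 0, 30);

        // Arrival (wall-clock) times of the same two events.
        LocalTime e1Arrival = LocalTime.of(14, 0, 0);
        LocalTime e2Arrival = LocalTime.of(14, 10, 0);

        System.out.println("Event times within inactivity gap: "
                + withinGap(e1Event, e2Event));
        System.out.println("Arrival times exceed retention + gap: "
                + exceedsRetentionPlusGap(e1Arrival, e2Arrival));
    }
}
```

By event time the two events are only 30 seconds apart, so they fit within the 1-minute inactivity gap. By arrival time they are 10 minutes apart, which is more than the 3 minutes of retention plus inactivity. The behavior we see matches the first comparison, which is what prompts the questions below.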
Questions:
1) How does Kafka Streams clean up the state store when the until() API is used? The retention value passed as an argument is described as a "lower bound for how long a window will be maintained." When exactly is the window purged?
2) Is there a background thread that cleans up the state store periodically? If so, is there a way to identify the actual time at which the window is purged?
3) Is there any stream configuration that would purge the data for a window once the retention time has passed?