I'm working on a Flink streaming processor that reads events from Kafka. Those events are keyed by one of their fields and should be windowed over a period of time before being reduced and output. My processor uses event time as its time characteristic and therefore reads the timestamps from the events it consumes. Here's what it currently looks like:
source
    .map(new MapEvent())
    // assign event-time timestamps and generate watermarks that tolerate
    // elements arriving up to 10 seconds out of order
    .assignTimestampsAndWatermarks(
        new BoundedOutOfOrdernessTimestampExtractor<Event>(Time.seconds(10)) {
            @Override
            public long extractTimestamp(Event event) {
                return event.getTimestamp();
            }
        })
    .keyBy(new KeySelector())
    // tumbling one-minute event-time windows per key
    .timeWindow(Time.minutes(1))
    .reduce(new EventReducer())
    .map(new MapToResult());
What I know about the events is the following:
- They are unordered with regard to their event time.
- Late arrivals are possible, i.e. events might arrive significantly later than their timestamp suggests. For simplicity, let's say I know that the latest possible arrival is 20 seconds after the event time.
- I want my events to be windowed for exactly one minute before Flink forwards them into the following reduce operator (see the sketch right after this list for what I have in mind).
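To make the last two points concrete, here is roughly the configuration I'm considering. The 20-second watermark bound and the allowedLateness() call are my assumption of how to handle the late arrivals described above; I haven't verified that this is the right combination:

source
    .map(new MapEvent())
    .assignTimestampsAndWatermarks(
        // bound of 20 seconds, matching the maximum lateness I expect
        new BoundedOutOfOrdernessTimestampExtractor<Event>(Time.seconds(20)) {
            @Override
            public long extractTimestamp(Event event) {
                return event.getTimestamp();
            }
        })
    .keyBy(new KeySelector())
    .timeWindow(Time.minutes(1))
    // allow elements up to 20 seconds late to still be added to their window
    // (not sure whether this is redundant given the watermark bound above)
    .allowedLateness(Time.seconds(20))
    .reduce(new EventReducer())
    .map(new MapToResult());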
And finally, here are my questions:
- Given my previously described use case, is the BoundedOutOfOrdernessTimestampExtractor a good choice? I've read my way through the docs and saw the AssignerWithPunctuatedWatermarks and other predefined assigners available for watermark creation, but didn't completely understand whether one of those would be a better fit for me.
- How does assignTimestampsAndWatermarks() play together with the timeWindow() method? Can they interfere when it comes to late arrivals? Is there anything in that area I have to keep in mind?
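For reference, this is roughly how I pictured the AssignerWithPunctuatedWatermarks alternative for my events. The isWatermarkMarker() flag is hypothetical; my events don't currently carry any such marker, which is part of why I'm unsure the interface fits my case:

import org.apache.flink.streaming.api.functions.AssignerWithPunctuatedWatermarks;
import org.apache.flink.streaming.api.watermark.Watermark;

public class PunctuatedEventAssigner implements AssignerWithPunctuatedWatermarks<Event> {

    @Override
    public long extractTimestamp(Event event, long previousElementTimestamp) {
        // same timestamp extraction as in my current setup
        return event.getTimestamp();
    }

    @Override
    public Watermark checkAndGetNextWatermark(Event lastElement, long extractedTimestamp) {
        // a watermark would only be emitted for elements carrying a marker;
        // isWatermarkMarker() is hypothetical, my events have no such field
        return lastElement.isWatermarkMarker() ? new Watermark(extractedTimestamp) : null;
    }
}

As far as I can tell this only makes sense if the stream contains elements that act as watermark punctuation, which mine doesn't, so I'm leaning towards the periodic approach unless I'm missing something.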