0
votes

I am using a tumbling window(for 5min), and AscendingTimestampExtractor since my source is from Kafka. But the window always runs longer than 5mins. Could anyone suggest?

1

1 Answers

1
votes

Let's imagine we're talking about the window for the interval from 9:00 to 9:04.999. With processing time windowing, as soon as the time of day reaches 9:05, the window will be triggered.

Since you have mentioned the AscendingTimestampExtractor watermark assigner, I assume you are using event time windowing (and not processing time windowing). And when you say that the window runs for longer than 5 mins, I assume you mean that results are not being produced immediately, at 9:05.

In the case of event time windowing, the window closing at 9:05 will wait for a watermark of at least 9:05. Such a watermark will have to wait for an event with a timestamp of at least 9:05 -- which means that the window triggering is delayed by whatever latency your events are experiencing.

Some of that latency is due to the parts of your pipeline before the events are ingested by the Flink Kafka consumer. Flink then causes some additional latency: in particular, the auto watermarking interval (200 msec, by default) and networking buffering (100 msec, by default) can have noticeable impacts.

Note that if you are operating a parallel pipeline with a keyBy, then the slowest of the Kafka consumers will hold everyone back to their watermark. And if you are using per-partition watermarking, then the slowest partition will determine the overall watermark.