I am building a Flink streaming application and would prefer to use event time, because it ensures that all registered timers fire deterministically after a failure or a replay of historical data. The problem with event time is that it only advances when events arrive. Our data sources (physical sensors) sometimes produce very little data, so a single data point might open a five-minute aggregation window while the next data point arrives 20 minutes later, meaning the window closes and emits its output record very late.
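For concreteness, here is roughly what the aggregation looks like (a trimmed sketch; `SensorReading`, `getSensorId`, `mergeWith`, and the `kinesisSource` helper are stand-ins for our actual types and wiring):

```java
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

// SensorReading and mergeWith are placeholders for our actual record
// type and aggregation logic.
DataStream<SensorReading> readings = kinesisSource(env); // hypothetical helper

readings
    .keyBy(SensorReading::getSensorId)
    .window(TumblingEventTimeWindows.of(Time.minutes(5)))
    .reduce((a, b) -> a.mergeWith(b)) // placeholder aggregation
    .print();
```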
Our proposed solution is an AWS Lambda function, scheduled to run every X minutes, that writes a dummy heartbeat event into the Kinesis stream that Flink reads from, thus forcing a watermark to be generated that advances event time.
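The Lambda itself would be trivial; a minimal sketch using the AWS SDK for Java v2 (the stream name, handler shape, and heartbeat payload are our own choices, nothing Flink-specific):

```java
import java.nio.charset.StandardCharsets;
import java.time.Instant;

import software.amazon.awssdk.core.SdkBytes;
import software.amazon.awssdk.services.kinesis.KinesisClient;
import software.amazon.awssdk.services.kinesis.model.PutRecordRequest;

// Invoked on an EventBridge schedule every X minutes.
public class HeartbeatHandler {
    private static final KinesisClient kinesis = KinesisClient.create();

    public void handleRequest(Object scheduledEvent) {
        String payload = String.format(
                "{\"type\":\"heartbeat\",\"timestamp\":%d}",
                Instant.now().toEpochMilli());
        kinesis.putRecord(PutRecordRequest.builder()
                .streamName("sensor-events")       // our stream name
                .partitionKey("heartbeat")         // fixed key: always hashes to one shard
                .data(SdkBytes.fromString(payload, StandardCharsets.UTF_8))
                .build());
    }
}
```

Note that with a fixed partition key the heartbeat always lands on a single shard, which is part of what worries me below.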
My concern is that this only works if watermarks are truly global, meaning that a SINGLE heartbeat message can produce a watermark that advances event time for every operator/task in the Flink application that consumes data originating from this stream. The documentation has led me to believe that Flink parallelizes reading from a source, with each parallel source instance generating its own watermarks, and a downstream operator, say a window, taking the minimum of the watermarks it has received across its inputs. If that is the case, it seems problematic: a dummy heartbeat event would be needed per parallel watermark generator, but I have no control over which parallel instances read my heartbeat messages from the stream.
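For reference, here is how we assign timestamps and watermarks after the source (Flink 1.11+ `WatermarkStrategy` API; the 30-second out-of-orderness bound and `getTimestamp` are ours):

```java
import java.time.Duration;

import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.streaming.api.datastream.DataStream;

// Each parallel subtask of the assigner runs its own watermark
// generator; as I understand the docs, a downstream task advances its
// event-time clock to the minimum watermark received across all of its
// input channels.
DataStream<SensorReading> timestamped = readings
    .assignTimestampsAndWatermarks(
        WatermarkStrategy
            .<SensorReading>forBoundedOutOfOrderness(Duration.ofSeconds(30))
            .withTimestampAssigner((reading, prevTs) -> reading.getTimestamp()));
```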
So my question is: how exactly do downstream operators use watermarks to advance event time, and can a single dummy message added to the Kinesis stream advance event time across the ENTIRE Flink application?
If not, how can I force event time to move forward?