0
votes

We are building a stream processing pipeline to process Kinesis messages using Flink v1.11 and Event time characteristics. While defining a source watermark strategy, in the official documentation, I came across two out-of-the-box watermark strategies; forBoundedOutOfOrderness and forMonotonousTimestamps. But I dont think these fit my usecase as per my understanding of the above. Here are the details on my usecase:

Data from input stream: (containing data with timestamps for each minute)

{11:00, Data1}
{11:01, Data2}
{11:00, Data3}
{11:00, Data4}
{11:01, Data5}
...

Now, I want to process the window (Tumbling event time: 1min) for 11:00-11:01 containing [Data1, Data3, Data4] exactly 20 seconds after the first event with 11:00 timestamp arrived. Similarly, the next window from 11:01-11:02 containing [Data2, Data5] needs to be executed 20s post the first event with 11:01 timestamp comes in. Is this kind of watermark strategy possible in Flink?

1

1 Answers

0
votes

Here's an approach for implementing this:

In the onEvent method keep track of the largest timestamp seen so far. And whenever you update this variable, record the current system time.

Then when onPeriodicEmit is called (by default this will be called every 200 msec), if it has been 20 seconds since the current max timestamp was updated, emit a watermark equal to the current max timestamp plus 1 second.