4
votes

We are using periodic event time window with watermark. We have currently 4 parallel tasks in our Flink App.

During the streaming process, all the 4 tasks' watermark values must be close to trigger window event.

For example;

Task 1 watermark value = 8

Task 2 watermark value = 1

Task 3 watermark value = 8

Task 4 watermark value = 8

Task 2 is waiting for log to update its watermark. However, the condition can occur before Task 2's update and we want to fire the window event before it.

Is there any mechanism to align all the parallel tasks' watermarks or fire the window event without waiting for other tasks?

1
I does not make much sense to align watermarks across parallel tasks. The idea of watermarks is to signal that no more elements with a lower timestamp can arrive. If the one of your tasks hasn't seen a watermark of value 8 then this means that elements with a timestamp with 2-8 can still arrive. Maybe you should switch to processing time to achieve your goal. But here again, there is no synchronization between the individual tasks running on different machines.Till Rohrmann
Hi @TillRohrmann , thank you for your answer. We have a scenario like this: Suppose that there are messages. We keyBy these messages by their sender. If there are 3 senders which are same person, we produce alert. In the example above, we can not produce alert because to produce alert the task which has watermark 2 needs to be updated. Because of the second task, window will not fired until a new log.Ozan Deniz
I think you got the concept of keyBy a bit wrong. If you want to look for 3 senders which are the same person, then you should keyBy the person and then simply use a count window to generate the alerts. Maybe you also wanna filter out duplicate senders for the same person.Till Rohrmann
But we need to consider also the event time. For example 3 senders in 10 minutes(event time not process time).Ozan Deniz
Then you can use an event time window with a custom trigger which fires when it has seen 3 elements.Till Rohrmann

1 Answers

0
votes

This was already answered in the comments by @Til Rohrmann, main answer:

If you want to look for 3 senders which are the same person, then you should keyBy the person and then simply use a count window to generate the alerts. Maybe you also wanna filter out duplicate senders for the same person.

Followup question:

But we need to consider also the event time. For example 3 senders in 10 minutes(event time not process time)

Followup answer:

Then you can use an event time window with a custom trigger which fires when it has seen 3 elements.

I suppose the critical conclusion is: If you want to trigger a count of something, keyBy that field.