0
votes

Is it possible to clear the current watermark in a DataStream?

Example input for a month-long watermark with no allowed lateness:

[
  { timestamp: '10/2018' },
  { timestamp: '11/2018' },
  { timestamp: '11/2018', clearState: true },
  { timestamp: '9/2018' }
]

Normally, the '9/2018' record would be thrown out as it is late. Is there a way to programmatically reset the watermark state when the clearState message is seen?

1
What are you trying to achieve? Window state will get discarded as soon as the watermark as passed so even if you could reset the watermark you would still have lost their state.gcandal

1 Answers

1
votes

Watermarks are not supposed to go backwards -- it's undefined what will happen, and in practice it's a bad idea. There are, however, various ways to accommodate late data.

If you are using the window API, Flink will clear any window state once the allowed lateness has expired for a window. If you want more control than this, consider using a ProcessFunction, which will allow/require you to manage state (and timers) explicitly.