We are building a stream processing pipeline to ingest and process Kafka messages, using Flink 1.12.2. While defining a source watermark strategy, I found two out-of-the-box watermark strategies in the official documentation: forBoundedOutOfOrderness and forMonotonousTimestamps. I read the Javadoc but still don't fully understand when and why you should use one strategy over the other. Our timestamps are event-time timestamps. Thanks.

1 Answer


You should use forMonotonousTimestamps if the timestamps are never out of order, or if you are willing to have every out-of-order event treated as late. If, on the other hand, out-of-order timestamps are normal for your application, use forBoundedOutOfOrderness, with a bound large enough to cover how far out of order your events can arrive.
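To see the difference concretely, here is a minimal standalone sketch (plain Java, no Flink dependency) that simulates the watermark formula used by Flink's bounded-out-of-orderness generator, watermark = maxTimestamp - bound - 1, where a bound of 0 behaves like forMonotonousTimestamps. The class and method names are hypothetical, and it assumes a watermark is emitted after every event, whereas Flink emits watermarks periodically by default, so treat it as an illustration rather than Flink's exact runtime behavior.

```java
import java.util.List;

// Hypothetical standalone simulation (no Flink dependency) of the watermark
// logic behind forMonotonousTimestamps() and forBoundedOutOfOrderness(Duration).
public class WatermarkLateness {

    // Counts events whose timestamp is at or below the watermark in effect
    // when they arrive. Simplification: a watermark is emitted after every
    // event, whereas Flink emits watermarks periodically by default.
    static int countLate(List<Long> timestamps, long outOfOrdernessMillis) {
        // Same formula as Flink's bounded-out-of-orderness generator:
        // watermark = maxTimestamp - outOfOrdernessMillis - 1.
        // A bound of 0 corresponds to forMonotonousTimestamps().
        long maxTimestamp = Long.MIN_VALUE + outOfOrdernessMillis + 1;
        int late = 0;
        for (long ts : timestamps) {
            long watermark = maxTimestamp - outOfOrdernessMillis - 1;
            if (ts <= watermark) {
                late++; // the event arrived behind the watermark
            }
            maxTimestamp = Math.max(maxTimestamp, ts);
        }
        return late;
    }

    public static void main(String[] args) {
        // Event timestamps in arrival order; 1500 and 2500 arrive out of order.
        List<Long> timestamps = List.of(1000L, 2000L, 1500L, 3000L, 2500L);
        System.out.println("monotonous: " + countLate(timestamps, 0) + " late");
        System.out.println("bounded 1000 ms: " + countLate(timestamps, 1000) + " late");
    }
}
```

With the monotonous strategy both out-of-order events (1500 and 2500) count as late, while a 1000 ms bound absorbs them. In a real job the corresponding calls are WatermarkStrategy.forMonotonousTimestamps() and WatermarkStrategy.forBoundedOutOfOrderness(Duration.ofMillis(1000)).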

With Kafka, if you have the Kafka source operator apply the watermark strategy (which is recommended, e.g. by calling assignTimestampsAndWatermarks on the FlinkKafkaConsumer itself rather than on the stream it produces), it applies the strategy to each partition separately. Each instance of the Kafka source then emits watermarks that are the minimum of the per-partition watermarks for the partitions handled by that instance. In this setup you can use forMonotonousTimestamps as long as the timestamps are in order within each partition, which will be the case, for example, if you are consuming from a topic that uses log-append timestamps.
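The per-partition behavior can be illustrated with another hypothetical standalone simulation (plain Java, not Flink code). It assumes two partitions whose timestamps are each in order but interleave out of order overall, applies the monotonous formula (maxTimestamp - 1) per partition, and takes the minimum across partitions as the source watermark. For comparison, it also runs a single monotonous generator over the interleaved stream, which is roughly what happens when the strategy is applied after the source. As before, it assumes a watermark after every event.

```java
import java.util.Arrays;

// Hypothetical simulation (no Flink dependency) of per-partition watermarking
// with monotonous timestamps, combined by taking the minimum across partitions.
public class PerPartitionWatermarks {

    // Each partition's watermark is its max timestamp - 1; the source emits the minimum.
    static int countLatePerPartition(int[] partitions, long[] timestamps, int numPartitions) {
        long[] maxTs = new long[numPartitions];
        // Start each partition so its watermark (maxTs - 1) is Long.MIN_VALUE until it sees data.
        Arrays.fill(maxTs, Long.MIN_VALUE + 1);
        int late = 0;
        for (int i = 0; i < timestamps.length; i++) {
            long combined = Long.MAX_VALUE;
            for (long m : maxTs) {
                combined = Math.min(combined, m - 1);
            }
            if (timestamps[i] <= combined) {
                late++;
            }
            int p = partitions[i];
            maxTs[p] = Math.max(maxTs[p], timestamps[i]);
        }
        return late;
    }

    // A single monotonous generator that sees the interleaved stream of all partitions.
    static int countLateGlobal(long[] timestamps) {
        long maxTs = Long.MIN_VALUE + 1;
        int late = 0;
        for (long ts : timestamps) {
            if (ts <= maxTs - 1) {
                late++;
            }
            maxTs = Math.max(maxTs, ts);
        }
        return late;
    }

    public static void main(String[] args) {
        // Partition 0: 1000, 2000, 3000; partition 1: 1500, 2500, 3500 (each in order).
        int[] partitions = {0, 0, 1, 0, 1, 1};
        long[] timestamps = {1000, 2000, 1500, 3000, 2500, 3500};
        System.out.println("per-partition late events: " + countLatePerPartition(partitions, timestamps, 2));
        System.out.println("global late events: " + countLateGlobal(timestamps));
    }
}
```

Because the source watermark never exceeds any single partition's watermark, an in-order partition cannot produce late events, even though the merged stream is out of order.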

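You want to use forMonotonousTimestamps whenever possible: its watermark trails the largest timestamp seen by only 1 ms rather than by the whole out-of-orderness bound, which minimizes latency, and there is no bound to choose and tune.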
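The latency cost of a bound can be sketched the same way. This hypothetical simulation (plain Java, not Flink code) assumes an event-time window [0, 1000) that fires once the watermark reaches its last timestamp, 999, following Flink's convention that a time window's max timestamp is end - 1. It also assumes a watermark after every event, and it reports which event first pushes the watermark that far.

```java
// Hypothetical simulation (no Flink dependency): after which event does the
// watermark reach the last timestamp (windowEnd - 1) of the window [0, windowEnd)?
public class WindowFiringDelay {

    // Returns the timestamp of the first event after which the watermark is
    // >= windowEnd - 1, or -1 if no event in the input gets it there.
    static long firingEvent(long[] timestamps, long windowEnd, long outOfOrdernessMillis) {
        long maxTimestamp = Long.MIN_VALUE + outOfOrdernessMillis + 1;
        for (long ts : timestamps) {
            maxTimestamp = Math.max(maxTimestamp, ts);
            // Bounded-out-of-orderness formula; a bound of 0 is the monotonous case.
            long watermark = maxTimestamp - outOfOrdernessMillis - 1;
            if (watermark >= windowEnd - 1) {
                return ts;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        long[] timestamps = {200, 600, 900, 1100, 1400, 1600};
        System.out.println("monotonous fires after event " + firingEvent(timestamps, 1000, 0));
        System.out.println("bounded 500 ms fires after event " + firingEvent(timestamps, 1000, 500));
    }
}
```

The monotonous strategy closes the window as soon as the event at 1100 arrives, while a 500 ms bound has to wait for an event at 1500 or later (here 1600), so every window result is delayed by roughly the bound.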