2
votes

Is it possible to trigger checkpoint from Flink streaming job?

My use case is that: I have two streams R and S to join with tumbling time windows. The source is Kafka. I use event time processing and BoundedOutOfOrdernessGenerator to make sure events from two streams end up in the same window.

The problem is my states are large and a regular periodic checkpoint takes too much time sometimes. At first, I wanted to disable checkpointing and rely on Kafka offset. But out of orderness means I have already some data in future windows from current offset. So I need checkpointing.

If it was possible to trigger checkpoints after a window gets cleaned instead of periodic ones it would be more efficient. Maybe at evictAfter method.

Does that make sense and is it possible? IF not I'd appreciate a work around.

1
in the Flink environment you can try to reduce the checkpoint interval. Have you seen 1.2 release notes? ci.apache.org/projects/flink/flink-docs-release-1.2/dev/stream/…Jared Hooper
I don't see how that helps. Even if I take checkppints less frequently, they are still going to be large. I want to trigger checkpoints when I have least amount of events in operators for efficiency.yolgun
more frequently. Reducing the interval would make the checkpoints smallerJared Hooper

1 Answers

1
votes

Seems the issue here is checkpoint efficiency. Consider using the RocksDB state backend with incremental checkpoints, discussed in the docs under Debugging and Tuning Checkpoints and Large State.