I have a situation to do sliding count over large scale of messages using State
and TimeService
. The sliding size is one and the window size is larger than 10 hours. The problem I meet is the checkpointing takes a lot of time. In order to improve the performance we use the incremental checkpoints. But it is still slow when the system do the checkpoint. We figure out that the most of the time is used to serialize the timers which are used to clean data. We have a timer for each key and there are about 300M timers at all.
Any suggestion to solve this problem would be appreciated. Or we can do the count in another way?
————————————————————————————————————————————
I'd like to add some details to the situation. The sliding size is one event and the window size is more than 10 hours(There are about 300 events per second), we need to react on each event. So in this situation we did not use the windows provided by Flink. we use the keyed state
to store the previous information instead. The timers
is used in ProcessFunction
to trigger the cleaning job of the old data. At last the number of the dinstinct keys is very large.