0 votes

I have a streaming job that:

reads from Kafka --> maps events to another DataStream --> keyBy(0) --> reduces over a 15-second processing-time window and writes the results back to a Redis sink.
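For reference, a stripped-down sketch of the job (Flink 1.3-era DataStream API; the topic name, parsing logic, and Redis configuration are simplified placeholders, not the real job):

    import java.util.Properties;

    import org.apache.flink.api.common.functions.MapFunction;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.api.windowing.time.Time;
    import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010;
    import org.apache.flink.streaming.connectors.redis.RedisSink;
    import org.apache.flink.streaming.connectors.redis.common.config.FlinkJedisPoolConfig;
    import org.apache.flink.streaming.connectors.redis.common.mapper.RedisCommand;
    import org.apache.flink.streaming.connectors.redis.common.mapper.RedisCommandDescription;
    import org.apache.flink.streaming.connectors.redis.common.mapper.RedisMapper;
    import org.apache.flink.streaming.util.serialization.SimpleStringSchema;

    public class KafkaToRedisJob {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            Properties kafkaProps = new Properties();
            kafkaProps.setProperty("bootstrap.servers", "localhost:9092");
            kafkaProps.setProperty("group.id", "my-group");

            DataStream<Tuple2<String, Long>> counts = env
                    .addSource(new FlinkKafkaConsumer010<>("events", new SimpleStringSchema(), kafkaProps))
                    // map raw events to (key, count) pairs -- parsing is a placeholder
                    .map(new MapFunction<String, Tuple2<String, Long>>() {
                        @Override
                        public Tuple2<String, Long> map(String line) {
                            return Tuple2.of(line.split(",")[0], 1L);
                        }
                    })
                    .keyBy(0)                        // key by the first tuple field
                    .timeWindow(Time.seconds(15))    // 15 s processing-time window
                    .reduce((a, b) -> Tuple2.of(a.f0, a.f1 + b.f1));

            counts.addSink(new RedisSink<>(
                    new FlinkJedisPoolConfig.Builder().setHost("localhost").build(),
                    new RedisMapper<Tuple2<String, Long>>() {
                        @Override
                        public RedisCommandDescription getCommandDescription() {
                            return new RedisCommandDescription(RedisCommand.SET);
                        }
                        @Override
                        public String getKeyFromData(Tuple2<String, Long> data) {
                            return data.f0;
                        }
                        @Override
                        public String getValueFromData(Tuple2<String, Long> data) {
                            return data.f1.toString();
                        }
                    }));

            env.execute("kafka-to-redis");
        }
    }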

When starting up, everything works great. The problem is that after a while, the disk fills up with what I think are Flink's checkpoints.

My question is: are the checkpoints supposed to be cleaned up/deleted while the Flink job is running? I could not find any resources on this.

I'm using a filesystem state backend that writes to /tmp (no HDFS setup).
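
The backend is set up roughly like this (the path and checkpoint interval are illustrative; the same can also be configured in flink-conf.yaml via state.backend: filesystem and state.backend.fs.checkpointdir):

    import org.apache.flink.runtime.state.filesystem.FsStateBackend;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class CheckpointSetup {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            // checkpoint every 10 seconds
            env.enableCheckpointing(10_000);
            // filesystem backend writing checkpoint data under /tmp
            env.setStateBackend(new FsStateBackend("file:///tmp/flink-checkpoints"));
            // ... rest of the job as above ...
        }
    }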

After how much time do you run out of disk space? – Robert Metzger

2 Answers

2 votes

Flink cleans up checkpoint files while it is running. There were some corner cases where it "forgot" to clean up all files in the case of system failures, but the community is working on fixing all of these issues for Flink 1.3.

In your case, I'm assuming that you don't have enough disk space to store the data of your windows on disk.

0 votes

Checkpoints are by default not persisted externally and are only used to resume a job from failures. They are deleted when a program is cancelled.

If you are taking externalized checkpoints, there are two cleanup policies, configured on the CheckpointConfig as shown in the sketch below:

ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION: Retain the externalized checkpoint when the job is cancelled. Note that you have to manually clean up the checkpoint state after cancellation in this case.

ExternalizedCheckpointCleanup.DELETE_ON_CANCELLATION: Delete the externalized checkpoint when the job is cancelled. The checkpoint state will only be available if the job fails.
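
A minimal sketch of selecting the policy (the checkpoint interval is illustrative; API available since Flink 1.2):

    import org.apache.flink.streaming.api.environment.CheckpointConfig.ExternalizedCheckpointCleanup;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class ExternalizedCheckpointSetup {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            env.enableCheckpointing(10_000);
            // RETAIN_ON_CANCELLATION keeps the checkpoint after cancellation,
            // so its state must be cleaned up manually
            env.getCheckpointConfig().enableExternalizedCheckpoints(
                    ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);
        }
    }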

For more details, see https://ci.apache.org/projects/flink/flink-docs-release-1.4/ops/state/checkpoints.html