I'm currently using FsStateBackend for checkpointing state, with the checkpoint intervals shown in the code below. The request cost for the S3 bucket that holds the checkpoints is approximately $20/day. At the AWS S3 request price of $0.005 per 1,000 requests, that works out to about 4,000,000 requests/day. I have 7 jobs:
- 6 jobs use a checkpoint interval of 10000 ms
- 1 job uses a checkpoint interval of 1000 ms
The jobs run on Flink on AWS EMR. The average state size per checkpoint ranges from 8 KB to 30 MB. What happens behind the scenes during a checkpoint that generates so many requests?
// set up checkpoint
// interval in ms: 10000 for six of the jobs, 1000 for the seventh
env.enableCheckpointing(10000);
// advanced options:
// make sure 500 ms of progress happen between checkpoints
env.getCheckpointConfig().setMinPauseBetweenCheckpoints(500);
// checkpoints have to complete within one minute, or are discarded
// env.getCheckpointConfig().setCheckpointTimeout(60000);
// allow only one checkpoint to be in progress at the same time
env.getCheckpointConfig().setMaxConcurrentCheckpoints(1);
// enable externalized checkpoints which are retained after job cancellation
env.getCheckpointConfig().enableExternalizedCheckpoints(
        CheckpointConfig.ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);
// checkpoint directory; the second argument enables asynchronous snapshots
StateBackend backend = new FsStateBackend(checkpointPath, true);
env.setStateBackend(backend);
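
As a rough sanity check on the numbers above, here is a minimal back-of-the-envelope sketch in plain Java (no Flink dependency). It assumes every checkpoint is triggered exactly on schedule and that all of the roughly 4,000,000 daily requests come from checkpointing. The class and method names are hypothetical and only used for this illustration.

```java
class CheckpointRequestEstimate {
    static final long MS_PER_DAY = 24L * 60 * 60 * 1000;

    // Checkpoints triggered per day by `jobs` jobs at the given interval,
    // assuming every checkpoint starts on schedule (an assumption).
    static long checkpointsPerDay(int jobs, long intervalMs) {
        return jobs * (MS_PER_DAY / intervalMs);
    }

    // Average number of S3 requests attributed to a single checkpoint.
    static double requestsPerCheckpoint(long requestsPerDay, long checkpointsPerDay) {
        return (double) requestsPerDay / checkpointsPerDay;
    }

    // Daily cost in USD at a flat price per 1,000 requests.
    static double dailyCostUsd(long requestsPerDay, double usdPer1000Requests) {
        return requestsPerDay / 1000.0 * usdPer1000Requests;
    }

    public static void main(String[] args) {
        // 6 jobs at 10000 ms plus 1 job at 1000 ms, as listed in the question
        long total = checkpointsPerDay(6, 10_000L) + checkpointsPerDay(1, 1_000L);
        long requestsPerDay = 4_000_000L; // approximate figure from the bill

        System.out.printf("checkpoints/day: %d%n", total);
        System.out.printf("requests/checkpoint: %.1f%n",
                requestsPerCheckpoint(requestsPerDay, total));
        System.out.printf("cost/day (USD): %.2f%n",
                dailyCostUsd(requestsPerDay, 0.005));
    }
}
```

Under these assumptions the seven jobs trigger 138,240 checkpoints per day, which comes to roughly 29 S3 requests per checkpoint on average. That is many more than a single write per checkpoint, which is the part I would like to understand.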