I'm using Flink 1.11 with RocksDBStateBackend. The code looks like this:

RocksDBStateBackend stateBackend = new RocksDBStateBackend("hdfs:///flink-checkpoints", true);
stateBackend.setDbStoragePath("/tmp/rocksdb/");
env.setStateBackend(stateBackend);

My questions are:

  1. My understanding is that when DbStoragePath is set, Flink puts all checkpoints and state on a local disk (in my case /tmp/rocksdb) before storing them in HDFS at hdfs:///flink-checkpoints. Is that right? And if so, should I always set DbStoragePath for better performance?
  2. Because Flink doesn't delete old checkpoints, I have a job that periodically cleans them up. But I'm not sure whether that is safe when incremental checkpoints are enabled.

1 Answer

The DbStoragePath is the location on the local disk where RocksDB keeps its working state. By default, the TaskManager's temporary directory is used. Ideally this should be the fastest available disk, e.g. an SSD. Normally this is configured via state.backend.rocksdb.localdir.

If you are using incremental checkpoints, then the SST files from the DbStoragePath are copied to state.checkpoints.dir. Otherwise, full snapshots are written to the checkpoint directory and the DbStoragePath isn't involved.
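
As a rough sketch, the equivalent settings in flink-conf.yaml might look like the following. The paths are simply the ones from the question, and putting these in the config file instead of in code is an assumption about your setup.

```yaml
# Sketch only; paths are taken from the question.
state.backend: rocksdb
state.backend.incremental: true
# Durable checkpoint location (remote file system).
state.checkpoints.dir: hdfs:///flink-checkpoints
# Local working directory for RocksDB on each TaskManager.
state.backend.rocksdb.localdir: /tmp/rocksdb
```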

Flink automatically deletes old checkpoints, except after canceling a job that is using retained checkpoints. It's not obvious how to safely delete an incremental, retained checkpoint: you need to somehow know whether any of its SST files are still referenced by the latest checkpoint. You might ask for advice on the user mailing list.
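
To illustrate why deleting by age alone is risky, here is a toy model in plain Java. It does not read any real Flink checkpoint metadata; the class name, method name, and file names are all hypothetical. It only shows the bookkeeping you would need: a file first written for an old checkpoint may still be referenced by the latest one.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/**
 * Toy model only: maps a checkpoint ID to the SST file names it references.
 * Nothing here touches Flink; all names are made up for illustration.
 */
public class IncrementalCheckpointModel {

    /** Returns files referenced by older checkpoints but NOT by the latest one. */
    public static Set<String> unreferencedByLatest(Map<Long, Set<String>> filesByCheckpoint) {
        Set<String> candidates = new TreeSet<>();
        if (filesByCheckpoint.isEmpty()) {
            return candidates;
        }
        long latest = Collections.max(filesByCheckpoint.keySet());
        // Collect every file referenced by any checkpoint older than the latest.
        for (Map.Entry<Long, Set<String>> entry : filesByCheckpoint.entrySet()) {
            if (entry.getKey() != latest) {
                candidates.addAll(entry.getValue());
            }
        }
        // Anything the latest checkpoint still references must be kept.
        candidates.removeAll(filesByCheckpoint.get(latest));
        return candidates;
    }

    public static void main(String[] args) {
        Map<Long, Set<String>> files = new LinkedHashMap<>();
        files.put(1L, new TreeSet<>(Arrays.asList("a.sst", "b.sst")));
        files.put(2L, new TreeSet<>(Arrays.asList("b.sst", "c.sst")));
        files.put(3L, new TreeSet<>(Arrays.asList("b.sst", "c.sst", "d.sst")));
        // b.sst dates from checkpoint 1 but checkpoint 3 still needs it.
        System.out.println(unreferencedByLatest(files)); // prints [a.sst]
    }
}
```

In this model, a cleanup job that removed everything belonging to checkpoint 1 would also remove b.sst and break checkpoint 3. Working out the real reference sets for a Flink job is the hard part, and this sketch does not attempt it.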