
We want to use Apache Flink with RocksDB backend (HDFS) for stateful stream processing. However, our application state (keyed state) will be in the order of terabytes.

From what I understand, when we restore a job from a savepoint, all the operator state data will be shipped from the savepoint location on HDFS to each of the task managers. If the state is in the order of terabytes, then every deployment will result in a very long downtime if all this state needs to be transferred.

I wanted to understand whether, in the case of RocksDB, it is possible to configure lazy loading, where keyed state is retrieved from HDFS on demand and then cached on the local disk.

Thank you!


1 Answer


If you are using RocksDB and configure your Flink cluster to use task-local recovery (described in the Flink documentation under "Task-Local Recovery"), then a copy of the RocksDB files will be kept on each task manager's local disk, and recovery will be almost immediate (except for any tasks that have to be rescheduled onto new nodes, which still fetch their state from HDFS).
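Local recovery is enabled through the Flink configuration, typically together with incremental checkpoints so that only changed SST files are uploaded on each checkpoint. A minimal sketch of the relevant `flink-conf.yaml` entries (the HDFS path is a placeholder):

```yaml
# flink-conf.yaml — minimal sketch, paths are placeholders

# use RocksDB as the state backend, checkpointing to HDFS
state.backend: rocksdb
state.checkpoints.dir: hdfs:///flink/checkpoints

# upload only changed RocksDB SST files at each checkpoint
state.backend.incremental: true

# keep a local copy of the state so tasks can restore
# from local disk instead of re-downloading from HDFS
state.backend.local-recovery: true
```

With this in place, a task that is restarted on the same machine restores from the local copy; only tasks placed on new machines pull their state from HDFS.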

However, this doesn't apply to savepoints: local recovery works with checkpoints, and it really needs incremental snapshots to work well, which savepoints do not support.

You may want to read that whole page of the docs, which is about how to configure and tune applications that use large amounts of state.