0
votes

I need to know how Apache Flink restores its state from a checkpoint, because I can't see any difference in the time between job start and the first event reaching an operator when running a fresh job versus restoring from a savepoint.

Does state load lazily from checkpoint/savepoint?

1
Could you elaborate a bit more on your use case? In general, the state is loaded at job start when starting from a savepoint. - Dawid Wysakowicz
@DawidWysakowicz does it load all of the state from the savepoint when it starts, or does it load state only as it is needed (lazy loading)? - Moein Hosseini
As @alpinegizmo said, it depends on the state backend chosen. RocksDB keeps the state as serialized bytes at all times and serializes/deserializes on each access. HeapStateBackend deserializes on job start. - Dawid Wysakowicz
@DawidWysakowicz what about filesystems? - Moein Hosseini

1 Answer

3
votes

The keyed state interfaces are designed to make the distinction between a fresh start and a restore transparent to your code. As Dawid mentioned, the state is loaded during job start. Note that what it means to load the state depends on which state backend is being used.
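
To make the backend difference from the comments concrete, here is a toy model in plain Java. It is only a sketch: EagerStore, OnAccessStore and RestoreDemo are hypothetical names and not Flink classes, and a String stands in for a real state object. Both stores hold the complete snapshot once restore finishes. They differ only in when the bytes are turned into objects.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Toy model only: hypothetical classes, not Flink's state backends.
class RestoreDemo {
    // A made-up snapshot: key -> serialized value.
    static Map<String, byte[]> sampleSnapshot() {
        Map<String, byte[]> snapshot = new HashMap<>();
        snapshot.put("user-1", "3".getBytes(StandardCharsets.UTF_8));
        snapshot.put("user-2", "7".getBytes(StandardCharsets.UTF_8));
        return snapshot;
    }

    public static void main(String[] args) {
        EagerStore eager = new EagerStore(sampleSnapshot());
        OnAccessStore onAccess = new OnAccessStore(sampleSnapshot());
        System.out.println("eager, after restore: " + eager.deserializations);
        System.out.println("on-access, after restore: " + onAccess.deserializations);
        onAccess.get("user-1");
        System.out.println("on-access, after one read: " + onAccess.deserializations);
    }
}

/** Heap-style store: every value is deserialized while restoring. */
class EagerStore {
    private final Map<String, String> objects = new HashMap<>();
    int deserializations = 0;

    EagerStore(Map<String, byte[]> snapshot) {
        for (Map.Entry<String, byte[]> e : snapshot.entrySet()) {
            objects.put(e.getKey(), new String(e.getValue(), StandardCharsets.UTF_8));
            deserializations++;
        }
    }

    String get(String key) {
        return objects.get(key); // already an object, no decoding on access
    }
}

/** RocksDB-style store: restore copies bytes, each read deserializes. */
class OnAccessStore {
    private final Map<String, byte[]> bytes;
    int deserializations = 0;

    OnAccessStore(Map<String, byte[]> snapshot) {
        this.bytes = new HashMap<>(snapshot); // all bytes are present after restore
    }

    String get(String key) {
        byte[] raw = bytes.get(key);
        if (raw == null) {
            return null;
        }
        deserializations++;
        return new String(raw, StandardCharsets.UTF_8);
    }
}
```

In this model neither store fetches data lazily. The on-access store just postpones decoding, which is the behaviour the comments attribute to RocksDB.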

In the case of operator state, the CheckpointedFunction interface has this method:

public void initializeState(FunctionInitializationContext context)

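where the context has an isRestored() method that tells you whether the state was restored from a snapshot of a previous execution, for example when recovering from a failure or starting from a savepoint. See the docs on managed operator state for more details, including an example.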
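
For illustration, a minimal sketch of how initializeState and isRestored() are typically used together. The CountingMap class, the "count" state name and the counting logic are made up for this example, and it needs the Flink streaming dependency on the classpath. Exact package locations can vary between Flink versions, so treat the imports as an assumption to check against your version.

```java
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.common.state.ListState;
import org.apache.flink.api.common.state.ListStateDescriptor;
import org.apache.flink.runtime.state.FunctionInitializationContext;
import org.apache.flink.runtime.state.FunctionSnapshotContext;
import org.apache.flink.streaming.api.checkpoint.CheckpointedFunction;

// Hypothetical operator: passes records through and counts them.
public class CountingMap implements MapFunction<String, String>, CheckpointedFunction {

    private transient ListState<Long> checkpointedCount; // operator state handle
    private long count;                                   // in-memory working copy

    @Override
    public String map(String value) {
        count++;
        return value;
    }

    @Override
    public void snapshotState(FunctionSnapshotContext context) throws Exception {
        // Called on each checkpoint: write the working copy into operator state.
        checkpointedCount.clear();
        checkpointedCount.add(count);
    }

    @Override
    public void initializeState(FunctionInitializationContext context) throws Exception {
        // Called once per parallel instance, on a fresh start and on a restore.
        checkpointedCount = context.getOperatorStateStore()
                .getListState(new ListStateDescriptor<>("count", Long.class));

        if (context.isRestored()) {
            // Only on a restore does the list contain previously snapshotted values.
            for (Long restored : checkpointedCount.get()) {
                count += restored;
            }
        }
    }
}
```

On a fresh start isRestored() returns false and count stays at 0. After a restore the loop sums whatever entries were assigned to this parallel instance.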