As you said, both MemoryStateBackend
and FSStateBackend
are based on HeapKeyedStateBackend
. This means, that both state backends maintain the state of an operator as regular objects on the JVM heap of the TaskManager, i.e., state is always accessed in memory.
The backends differ in how they persist the state for checkpoints. A checkpoint is a copy of the state of all operators of an application that is stored somewhere. In case of a failure, the application is restarted and the state of the operators is initialized from the checkpoint.
The FSStateBackend
stores the checkpoint in a file system, typically HDFS, S3, or a NFS that is mounted on all worker nodes. The MemoryStateBackend
stores the state in the JVM of the JobManager. This has the following pros and cons:
Pros:
- No need to setup a (distributed) file system.
- No need to configure a storage location.
Cons:
- State is lost if the JobManager process dies.
- Size of state is bound by the size of the JobManager memory.
Since checkpoints are lost if the JM goes down, the MemoryStateBackend
is unsuitable for most production use cases. It can be useful for developing and testing stateful applications, because it requires not configuration or setup.