Apache Flink: Why to choose the MemoryStateBackend over the FsStateBackend?

Question

Flink has a MemoryStateBackend and a FsStateBackend (and a RocksDBStateBackend). Both seem to extend the HeapKeyedStateBackend, i.e. the mechanism for storing the current working state is entirely the same.

This SO answer says that the main difference lies in the MemoryStateBackend keeping a copy of the checkpoints in the JobManagers memory. (I wasn't able to glean any evidence for that from the source code.) The MemoryStateBackend also limits the maximum state size per subtask.

Now I wonder: Why would you ever want to use the MemoryStateBackend?

At first I thought that it may be possible to take savepoints without a destination path with the MemoryStateBackend, but nope. — Caesar
I believe you can find your answer in this blog: da-platform.com/blog/… — Jiayi Liao
Not really. It just says "Use it for development." But not why. (But thank you, that is already a better explanation than the official docs.) — Caesar
When state is small, it is convenient for development and testing, because you don't have to think about the filesystem. — David Anderson

Fabian Hueske Fabian Hueske · Accepted Answer · 2019-01-23T09:48:35

As you said, both MemoryStateBackend and FSStateBackend are based on HeapKeyedStateBackend. This means, that both state backends maintain the state of an operator as regular objects on the JVM heap of the TaskManager, i.e., state is always accessed in memory.

The backends differ in how they persist the state for checkpoints. A checkpoint is a copy of the state of all operators of an application that is stored somewhere. In case of a failure, the application is restarted and the state of the operators is initialized from the checkpoint.

The FSStateBackend stores the checkpoint in a file system, typically HDFS, S3, or a NFS that is mounted on all worker nodes. The MemoryStateBackend stores the state in the JVM of the JobManager. This has the following pros and cons:

Pros:

No need to setup a (distributed) file system.
No need to configure a storage location.

Cons:

State is lost if the JobManager process dies.
Size of state is bound by the size of the JobManager memory.

Since checkpoints are lost if the JM goes down, the MemoryStateBackend is unsuitable for most production use cases. It can be useful for developing and testing stateful applications, because it requires not configuration or setup.

Apache Flink: Why to choose the MemoryStateBackend over the FsStateBackend?

1 Answers