
JobManager recovery is achieved using ZooKeeper, but what if a TaskManager fails? How do I recover from that? Does the JobManager automatically recover TaskManagers?


Answer


In general, the JobManager takes care of recovering from TaskManager failures. How this is done depends on your setup:

  • If you run Flink on YARN, the JobManager starts a new TaskManager when it detects that a TaskManager has died, and reassigns the failed tasks to it.
  • If you run Flink stand-alone on a cluster, you have to make sure that one or more stand-by TaskManagers are running. The JobManager assigns the tasks of the failed TaskManager to a stand-by TaskManager, so you must ensure that enough stand-by TaskManagers (with enough free task slots) are up and running.
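The stand-alone requirement comes down to slot arithmetic. Below is a toy Python model of that arithmetic. It is not Flink code or a Flink API, and the names `TaskManager`, `surviving_slots` and `can_redeploy` are hypothetical. The model checks whether the TaskManagers that survive a failure still offer enough task slots to redeploy the job. It assumes the job needs `required_slots` slots. Under Flink's default slot sharing, that number roughly equals the job's highest operator parallelism.

```python
from collections.abc import Iterable
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskManager:
    """Hypothetical stand-in for a TaskManager: a name and its number of task slots."""
    name: str
    slots: int


def surviving_slots(task_managers: Iterable[TaskManager], failed: set[str]) -> int:
    """Total slots offered by the TaskManagers whose names are not in `failed`."""
    return sum(tm.slots for tm in task_managers if tm.name not in failed)


def can_redeploy(task_managers: Iterable[TaskManager], failed: set[str], required_slots: int) -> bool:
    """Toy check: can the job's tasks be placed on the remaining TaskManagers?"""
    return surviving_slots(task_managers, failed) >= required_slots


# Two working TaskManagers plus one stand-by, each with 2 slots; the job needs 4 slots.
cluster = [TaskManager("tm-1", 2), TaskManager("tm-2", 2), TaskManager("standby-1", 2)]

print(can_redeploy(cluster, {"tm-1"}, 4))          # True: tm-2 + standby-1 give 4 slots
print(can_redeploy(cluster[:2], {"tm-1"}, 4))      # False: no stand-by, only 2 slots left
print(can_redeploy(cluster, {"tm-1", "tm-2"}, 4))  # False: one stand-by is not enough here
```

The last case shows why a single stand-by TaskManager may not suffice. If several TaskManagers fail together, the spare capacity must cover all of their slots.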