0
votes

The Flink documentation says: "When running a highly available YARN cluster, we don’t run multiple JobManager (ApplicationMaster) instances, but only one, which is restarted by YARN on failures." Then, further down, the configuration shows "high-availability: zookeeper".

I don't have experience with YARN, but why do we need to set up ZooKeeper if YARN takes care of the restarts and we only have one JobManager? Or is this the ZooKeeper for the ResourceManager(s)?


2 Answers

1
votes

To ensure high availability, a ZooKeeper-based setup of YARN is often recommended. With YARN, only one instance of the ResourceManager is active at a time; a ZooKeeper-based setup makes the ResourceManager highly available by allowing failover to another instance when the active one crashes.

This works by storing the current internal state of the ResourceManager in ZooKeeper, so that the instance taking over can restore it.

Source: Apache ZooKeeper Essentials, Saurav Haloi

0
votes
  1. YARN can restart the ApplicationMaster container automatically. Because the ApplicationMaster and the JobManager run in the same process, the JobManager is restarted along with it.

  2. ZooKeeper is used here to recover the state of the failed JobManager, such as the checkpoint information.
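
To make the split between the two points concrete, here is a rough sketch of what the relevant part of `flink-conf.yaml` might look like. The keys are Flink configuration options, but the host names, the HDFS path, and the attempt count are placeholder values I am assuming for illustration, not values from the question:

```yaml
# Point 2: where the restarted JobManager finds the state of the failed one.
high-availability: zookeeper
# Placeholder ZooKeeper hosts (assumption); use your own quorum.
high-availability.zookeeper.quorum: zk-host-1:2181,zk-host-2:2181,zk-host-3:2181
# Placeholder path (assumption). JobManager metadata is written here;
# ZooKeeper only keeps pointers to it.
high-availability.storageDir: hdfs:///flink/recovery

# Point 1: how many times YARN may start the ApplicationMaster/JobManager.
# Example value (assumption); it is capped by YARN's own
# yarn.resourcemanager.am.max-attempts setting in yarn-site.xml.
yarn.application-attempts: 10
```

So YARN only brings the process back; without the `high-availability` settings the new JobManager would start with no knowledge of the jobs and checkpoints of the previous one.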