I am trying to recover my jobs and their state after my JobManager goes down, but I haven't been able to restart the jobs successfully.
From my understanding, TaskManager recovery is handled by the JobManager (this works as expected), and JobManager recovery is handled through ZooKeeper.
Is there a way to recover the JobManager without ZooKeeper?
I am using Docker for my setup, and all checkpoints and savepoints are persisted to mapped volumes.
Is Flink able to recover when all JobManagers go down? I run a single JobManager and can afford to wait for it to restart.
When I restart the JobManager, I get the following exception:
org.apache.flink.runtime.rest.NotFoundException: Job 446f4392adc32f8e7ba405a474b49e32 not found
I have set the following in my flink-conf.yaml:
state.backend: filesystem
state.checkpoints.dir: file:///opt/flink/checkpoints
state.savepoints.dir: file:///opt/flink/savepoints
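A volume mapping of the kind mentioned above could look roughly like the following sketch. It assumes docker-compose and the official flink image; the service name and host-side paths are hypothetical placeholders, and only the container-side paths come from the settings above.

```yaml
# Hypothetical docker-compose fragment; service name and host paths are placeholders.
services:
  jobmanager:
    image: flink          # official image; tag omitted because the version is not stated
    command: jobmanager
    volumes:
      # Container paths match state.checkpoints.dir and state.savepoints.dir
      - ./checkpoints:/opt/flink/checkpoints
      - ./savepoints:/opt/flink/savepoints
```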
I think my issue may be that the job JAR gets deleted when the JobManager is restarted, but I am not sure how to solve this.