Flink Job Cluster Kubernetes restoring from savepoint

Question

We're currently running Flink on Kubernetes as a job cluster using this Helm template: https://github.com/docker-flink/examples/tree/master/helm/flink (with some added configuration).

If I want to shut down the cluster, redeploy a new image (due to an application code update), and restart, how would I go about restoring from a savepoint?

The jobmanager command is fixed to standalone-job.sh, and if I add a savepoint path as a parameter in the Kubernetes Deployment resource, then whenever Flink restarts (due to some system error) it will always restart from that same savepoint, which is not what we want.

Is there a way to restore from the latest savepoint and, if no savepoint exists, just start fresh with the Kubernetes job cluster Helm configuration?

Alioza Alioza · Accepted Answer · 2020-06-12T07:49:44

I don't think I understand your full setup, but I gather from your question that your Flink cluster startup and job recovery both go through standalone-job.sh.

You can create savepoints regularly and update a configuration value with the latest savepoint ID each time.

Your Flink recovery script should not point to a particular savepoint, but rather to the configuration value which will always contain the latest savepoint id.

Depending on the changes to your Flink jobs, recovery from savepoints will not always be possible, so you need to account for that case as well.
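
As a rough illustration of the idea above, here is a minimal Python sketch of a startup wrapper that resolves the latest savepoint at launch time instead of hard-coding one. It assumes savepoints are written to a filesystem path the container can read (for example a mounted volume) and that each completed savepoint directory contains a `_metadata` file. The function names `latest_savepoint` and `build_job_args` are hypothetical, and the `--fromSavepoint` and `--allowNonRestoredState` flags are what I understand the standalone job entrypoint to accept, so check them against your Flink version.

```python
import os
from pathlib import Path
from typing import List, Optional


def latest_savepoint(savepoint_root: str) -> Optional[str]:
    """Return the newest completed savepoint directory, or None if there is none.

    Assumption: a savepoint counts as complete when it contains a `_metadata` file.
    """
    root = Path(savepoint_root)
    if not root.is_dir():
        return None
    candidates = [
        d for d in root.iterdir()
        if d.is_dir() and (d / "_metadata").is_file()
    ]
    if not candidates:
        return None
    # Pick the savepoint whose metadata file was written most recently.
    newest = max(candidates, key=lambda d: (d / "_metadata").stat().st_mtime)
    return str(newest)


def build_job_args(
    base_args: List[str],
    savepoint_root: str,
    allow_non_restored: bool = False,
) -> List[str]:
    """Append savepoint flags only when a savepoint exists; otherwise start fresh."""
    args = list(base_args)
    savepoint = latest_savepoint(savepoint_root)
    if savepoint is not None:
        args += ["--fromSavepoint", savepoint]
        if allow_non_restored:
            # Lets the job start even if some saved state no longer maps to an operator.
            args.append("--allowNonRestoredState")
    return args
```

The container entrypoint could then call something like `build_job_args(["--job-classname", "com.example.MyJob"], "/savepoints")` (both values are placeholders) and pass the result to standalone-job.sh. With no savepoint present the base arguments are returned unchanged, which gives the "start fresh" behaviour asked about.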
