Kubernetes Pod backoff failure policy
From the k8s documentation:
There are situations where you want to fail a Job after some amount of retries due to a logical error in configuration etc. To do so, set .spec.backoffLimit to specify the number of retries before considering a Job as failed. The back-off limit is set by default to 6.
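For reference, on a plain Kubernetes Job this limit is set directly in the Job spec. A minimal sketch (the Job name, image, and command below are illustrative, not from our setup):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: my-task            # illustrative name
spec:
  backoffLimit: 1          # fail the Job after one retry instead of the default 6
  template:
    spec:
      restartPolicy: Never # the Job controller creates a new pod per retry
      containers:
      - name: task
        image: busybox     # illustrative image
        command: ["sh", "-c", "exit 1"]
```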
Spring cloud dataflow:
When a job has failed, we actually don't want a retry. In other words, we want to set backoffLimit: 1
in our Spring Cloud Data Flow config file.
We have tried to set it like the following:
deployer.kubernetes.spec.backoffLimit: 1
or even
deployer.kubernetes.backoffLimit: 1
But neither is transmitted to our Kubernetes cluster.
After 6 tries, we see the following message:
status:
  conditions:
  - lastProbeTime: '2019-10-22T17:45:46Z'
    lastTransitionTime: '2019-10-22T17:45:46Z'
    message: Job has reached the specified backoff limit
    reason: BackoffLimitExceeded
    status: 'True'
    type: Failed
  failed: 6
  startTime: '2019-10-22T17:33:01Z'
Actually, we want to fail fast (1 or 2 tries maximum).
Question: How can we properly set this property, so that all tasks triggered by SCDF fail at most once on Kubernetes?
Update (23.10.2019)
We have also tried the property:
deployer:
  kubernetes:
    maxCrashLoopBackOffRestarts: Never # No retry for failed tasks
But the jobs are still failing 6 times instead of 1.
Update (26.10.2019)
For completeness' sake:
- I am scheduling a task in SCDF
- The task is triggered on Kubernetes (more specifically Openshift)
- When I check the configuration on the K8s platform, I see that it still has a backoffLimit of 6, instead of 1:
YAML config snippet taken from the running pod:
spec:
  backoffLimit: 6
  completions: 1
  parallelism: 1
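To confirm which value actually reached the cluster, the Job spec can also be queried directly with kubectl (the Job name below is illustrative):

```shell
# Print only the backoffLimit of the Job created for the task
kubectl get job my-task -o jsonpath='{.spec.backoffLimit}'

# Or inspect the surrounding spec for context
kubectl get job my-task -o yaml | grep -A2 backoffLimit
```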
In the official documentation, it says:
`maxCrashLoopBackOffRestarts` - Maximum allowed restarts for app that is in a CrashLoopBackOff. Values are `Always`, `IfNotPresent`, `Never`
But maxCrashLoopBackOffRestarts takes an integer, so I guess the documentation is not accurate.
The pod is then restarted 6 times.
I have tried setting these properties, without success:
spring.cloud.dataflow.task.platform.kubernetes.accounts.defaults.maxCrashLoopBackOffRestarts: 0
spring.cloud.deployer.kubernetes.maxCrashLoopBackOffRestarts: 0
spring.cloud.scheduler.kubernetes.maxCrashLoopBackOffRestarts: 0
None of those has worked.
Any idea?