I am running an OCP 4.6 cluster with RHEL 7.8 bare-metal compute nodes. We are running functional and HA testing on the cluster. Our main application on this cluster is a StatefulSet with around 250 pods.
After shutting down a node, the pods that were running on it entered the Terminating state and are stuck there.
Since this is a StatefulSet, the pods cannot be recreated on another node until the original pods finish terminating.
I can delete the pods with --force --grace-period=0, but this does not solve my issue.
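For anyone reproducing this, here is a minimal sketch of how the stuck pods can be picked out. It assumes you feed it the JSON from something like oc get pods --all-namespaces -o json, and it relies on the fact that kubectl/oc show a pod as Terminating once its metadata.deletionTimestamp is set. The function names stuck_terminating_pods and force_delete_commands and the node name in the usage line are hypothetical. The sketch only prints the same --force --grace-period=0 commands mentioned above and does not run them.

```python
import json


def stuck_terminating_pods(pod_list_json, node_name):
    """Return (namespace, name) pairs for pods on node_name that are Terminating.

    A pod is displayed as Terminating when metadata.deletionTimestamp is set.
    """
    items = json.loads(pod_list_json).get("items", [])
    stuck = []
    for pod in items:
        meta = pod.get("metadata", {})
        spec = pod.get("spec", {})
        if meta.get("deletionTimestamp") and spec.get("nodeName") == node_name:
            stuck.append((meta.get("namespace", "default"), meta["name"]))
    return stuck


def force_delete_commands(pods):
    """Build (but do not execute) force-delete commands for the given pods."""
    return [
        f"oc delete pod {name} -n {ns} --force --grace-period=0"
        for ns, name in pods
    ]


if __name__ == "__main__":
    # Hypothetical sample resembling `oc get pods -A -o json` output.
    sample = json.dumps({"items": [
        {"metadata": {"name": "app-0", "namespace": "prod",
                      "deletionTimestamp": "2021-01-01T00:00:00Z"},
         "spec": {"nodeName": "worker-1"}},
        {"metadata": {"name": "app-1", "namespace": "prod"},
         "spec": {"nodeName": "worker-2"}},
    ]})
    for cmd in force_delete_commands(stuck_terminating_pods(sample, "worker-1")):
        print(cmd)
```

Printing the commands instead of executing them is deliberate: as the update below notes, force-deleting a StatefulSet pod is only safe once you are sure the original pod is really gone.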
These pods only terminate after the server that was shut down returns to Ready status.
Any ideas?
UPDATE:
Looking at the Kubernetes docs, I found that a StatefulSet pod not terminating after its node shuts down is actually a safety mechanism, and in fact a feature: https://kubernetes.io/docs/tasks/run-application/force-delete-stateful-set-pod/
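The linked page says a StatefulSet pod should only be force-deleted when you are certain it is no longer running. The following is a hypothetical guard that encodes that rule before any force delete. It assumes the input is the JSON from oc get node <name> -o json. Once the kubelet stops reporting, the node controller marks the node's Ready condition as "Unknown". The names node_ready_status and may_force_delete, and the confirmed_powered_off flag (an operator's explicit confirmation), are illustrative assumptions, not part of any OpenShift API.

```python
import json


def node_ready_status(node_json):
    """Return the status of the node's Ready condition ("True", "False", "Unknown") or None."""
    node = json.loads(node_json)
    for cond in node.get("status", {}).get("conditions", []):
        if cond.get("type") == "Ready":
            return cond.get("status")
    return None


def may_force_delete(node_json, confirmed_powered_off):
    """Hypothetical guard: allow force deletion only if an operator has confirmed
    the machine is off AND the API server no longer reports the node as Ready."""
    return bool(confirmed_powered_off) and node_ready_status(node_json) != "True"


if __name__ == "__main__":
    # Hypothetical node object after the kubelet stopped reporting.
    down = json.dumps({"status": {"conditions": [
        {"type": "Ready", "status": "Unknown"}]}})
    print(node_ready_status(down), may_force_delete(down, confirmed_powered_off=True))
```

The explicit confirmation flag matters because "Unknown" only means the API server lost contact with the node. It does not prove the pod's process has stopped, which is exactly the at-most-one guarantee the StatefulSet is protecting.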
kubectl describe them? – Wytrzymały Wiktor