After uninstalling calico, new pods are stuck in container creating state

Question

After uninstalling calico, kubectl -f calico.yaml, not able to create new pods in the cluster. Any new pods in the cluster are stuck in container creating state. Kubectl describe shows the errors below:

Warning FailedCreatePodSandBox 2m kubelet, 10.0.12.2 Failed create pod sandbox: rpc error: code = Unknown desc = [failed to set up sandbox container "f15743177fd70c5eabf70c60be5b8b354e5346837d1b5d59bf99d1d1b5d6416c" network for pod "test-9465-768b57b5df-fv9d4": NetworkPlugin cni failed to set up pod "test-9465-768b57b5df-fv9d4_policy-demo" network: error getting ClusterInformation: connection is unauthorized: Unauthorized, failed to clean up sandbox container "f15743177fd70c5eabf70c60be5b8b354e5346837d1b5d59bf99d1d1b5d6416c" network for pod "test-9465-768b57b5df-fv9d4": NetworkPlugin cni failed to teardown pod "test-9465-768b57b5df-fv9d4_policy-demo" network: error getting ClusterInformation: connection is unauthorized: Unauthorized]

qstack qstack · Accepted Answer · 2020-05-08T05:56:24

The main issue is caused because calico has an init container but does not have a cleanup container. T

To undeploy calico, we have to do the usual kubectl delete -f <yaml>, and then delete a calico conf file in each of the nodes /etc/cni/net.d/. This configuration file along with other binaries are loaded on to the host by the init container.

https://kubernetes.io/docs/concepts/extend-kubernetes/compute-storage-net/network-plugins/

From this link we can see that kubelet reads the configuration file from the default directory, and if there are multiple configuration files, then it applies the CNI plugin from the config file that appears first in an alphabetical order (why, oh god why??).

So, in our case, after uninstalling calico, it would be removed from all the admin privileges but the nodes would still try to apply calico rules based upon the config file it picked up from the default directory. Then restart the node to get rid of the iptable rules.

Removing the file and restarting the node solves the issue and we get back to normal behavior. Another way to solve the same problem is by simply terminating the node from the cluster if you are on a managed kubernetes cluster. Since, public cloud infrastructure automatically boots up another node to keep the same state, it no longer has the calico configuration file.

After uninstalling calico, new pods are stuck in container creating state

1 Answers