I deployed a brand-new k8s cluster using kubespray. Everything works fine except that none of the calico-related pods become ready, and after many hours of debugging I couldn't find out why they keep crashing. I even disabled/stopped the entire firewalld service, but nothing changed.
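For completeness, here's how I've been pulling state from the failing pods (pod names are the ones listed under Your Environment below; adjust to your own):

kubectl -n kube-system describe pod calico-node-4hkzb
kubectl -n kube-system logs calico-node-4hkzb -c calico-node
kubectl -n kube-system logs calico-kube-controllers-8575b76f66-57zw4 --previous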
Another important detail: the output of calicoctl node status is not stable and shows something different every time it's called:
Calico process is not running.

Calico process is running.
None of the BGP backend processes (BIRD or GoBGP) are running.

Calico process is running.
IPv4 BGP status
+----------------+-------------------+-------+----------+---------+
| PEER ADDRESS | PEER TYPE | STATE | SINCE | INFO |
+----------------+-------------------+-------+----------+---------+
| 192.168.231.42 | node-to-node mesh | start | 06:23:41 | Passive |
+----------------+-------------------+-------+----------+---------+
IPv6 BGP status
No IPv6 peers found.
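When the table does show up, the peer is stuck in start/Passive, i.e. the BGP session to node2 never establishes. To look at BIRD directly I tried querying it inside the calico-node container (assuming the image ships birdcl and the default control socket path):

kubectl -n kube-system exec calico-node-hznhc -c calico-node -- birdcl -s /var/run/calico/bird.ctl show protocols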
Another log line that shows up often is the following:
bird: Unable to open configuration file /etc/calico/confd/config/bird.cfg: No such file or directory
bird: Unable to open configuration file /etc/calico/confd/config/bird6.cfg: No such file or directory
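That error suggests confd never rendered the BIRD config files. To confirm, I checked whether the files exist and grepped the container log for confd activity (pod name from my cluster):

kubectl -n kube-system exec calico-node-4hkzb -c calico-node -- ls -l /etc/calico/confd/config/
kubectl -n kube-system logs calico-node-4hkzb -c calico-node | grep -i confd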
I also tried changing IP_AUTODETECTION_METHOD to each of the following, but nothing changed:
kubectl set env daemonset/calico-node -n kube-system IP_AUTODETECTION_METHOD=can-reach=www.google.com
kubectl set env daemonset/calico-node -n kube-system IP_AUTODETECTION_METHOD=can-reach=8.8.8.8
kubectl set env daemonset/calico-node -n kube-system IP_AUTODETECTION_METHOD=interface=eth1
kubectl set env daemonset/calico-node -n kube-system IP_AUTODETECTION_METHOD=interface=eth.*
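(In hindsight, the can-reach variants against www.google.com and 8.8.8.8 probably can't work in this air-gapped environment anyway.) To make sure each change actually landed and to see which address calico detected, I checked the rolled-out env and the autodetection lines in the log (pod name from my cluster):

kubectl -n kube-system get daemonset calico-node -o yaml | grep -A 1 IP_AUTODETECTION_METHOD
kubectl -n kube-system logs calico-node-hznhc -c calico-node | grep -i autodetect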
Expected Behavior
All calico-related pods, daemonsets, deployments, and replicasets should be in the READY state.
Current Behavior
All calico-related pods, daemonsets, deployments, and replicasets are in a NOT READY state.
Possible Solution
Nothing yet, I am asking for help on how to debug / overcome this issue.
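The only concrete next step I can think of is running the readiness check by hand to see which half fails (assuming the DaemonSet uses the stock probe command):

kubectl -n kube-system exec calico-node-4hkzb -c calico-node -- /bin/calico-node -felix-ready -bird-ready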
Steps to Reproduce (for bugs)
It's the latest version of kubespray, deployed with the Context & Environment below.
git reflog
7e4b176 HEAD@{0}: clone: from https://github.com/kubernetes-sigs/kubespray.git
Context
I'm trying to deploy a k8s cluster with one master and one worker node. Note that the servers in this cluster sit in an almost air-gapped/offline environment with limited access to the global internet. The ansible run that deploys the cluster with kubespray completed successfully, but I'm facing this issue with the calico pods.
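Since the node-to-node mesh needs TCP 179 between the nodes, I also sanity-checked raw reachability from node1 to node2 (nc is just what happened to be installed; any TCP check would do):

nc -zv 192.168.231.42 179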
Your Environment
cat inventory/mycluster/hosts.yaml
all:
  hosts:
    node1:
      ansible_host: 192.168.231.41
      ansible_port: 32244
      ip: 192.168.231.41
      access_ip: 192.168.231.41
    node2:
      ansible_host: 192.168.231.42
      ansible_port: 32244
      ip: 192.168.231.42
      access_ip: 192.168.231.42
  children:
    kube_control_plane:
      hosts:
        node1:
    kube_node:
      hosts:
        node1:
        node2:
    etcd:
      hosts:
        node1:
    k8s_cluster:
      children:
        kube_control_plane:
        kube_node:
    calico_rr:
      hosts: {}
calicoctl version
Client Version: v3.19.2
Git commit: 6f3d4900
Cluster Version: v3.19.2
Cluster Type: kubespray,bgp,kubeadm,kdd,k8s
cat /etc/centos-release
CentOS Linux release 7.9.2009 (Core)
uname -r
3.10.0-1160.42.2.el7.x86_64
kubectl version
Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.4", GitCommit:"3cce4a82b44f032d0cd1a1790e6d2f5a55d20aae", GitTreeState:"clean", BuildDate:"2021-08-11T18:16:05Z", GoVersion:"go1.16.7", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.4", GitCommit:"3cce4a82b44f032d0cd1a1790e6d2f5a55d20aae", GitTreeState:"clean", BuildDate:"2021-08-11T18:10:22Z", GoVersion:"go1.16.7", Compiler:"gc", Platform:"linux/amd64"}
kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
node1 Ready control-plane,master 19h v1.21.4 192.168.231.41 <none> CentOS Linux 7 (Core) 3.10.0-1160.42.2.el7.x86_64 docker://20.10.8
node2 Ready <none> 19h v1.21.4 192.168.231.42 <none> CentOS Linux 7 (Core) 3.10.0-1160.42.2.el7.x86_64 docker://20.10.8
kubectl get all --all-namespaces -o wide
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
kube-system pod/calico-kube-controllers-8575b76f66-57zw4 0/1 CrashLoopBackOff 327 19h 192.168.231.42 node2 <none> <none>
kube-system pod/calico-node-4hkzb 0/1 Running 245 14h 192.168.231.42 node2 <none> <none>
kube-system pod/calico-node-hznhc 0/1 Running 245 14h 192.168.231.41 node1 <none> <none>
kube-system pod/coredns-8474476ff8-b6lqz 1/1 Running 0 19h 10.233.96.1 node2 <none> <none>
kube-system pod/coredns-8474476ff8-gdkml 1/1 Running 0 19h 10.233.90.1 node1 <none> <none>
kube-system pod/dns-autoscaler-7df78bfcfb-xnn4r 1/1 Running 0 19h 10.233.90.2 node1 <none> <none>
kube-system pod/kube-apiserver-node1 1/1 Running 0 19h 192.168.231.41 node1 <none> <none>
kube-system pod/kube-controller-manager-node1 1/1 Running 0 19h 192.168.231.41 node1 <none> <none>
kube-system pod/kube-proxy-dmw22 1/1 Running 0 19h 192.168.231.41 node1 <none> <none>
kube-system pod/kube-proxy-wzpnv 1/1 Running 0 19h 192.168.231.42 node2 <none> <none>
kube-system pod/kube-scheduler-node1 1/1 Running 0 19h 192.168.231.41 node1 <none> <none>
kube-system pod/nginx-proxy-node2 1/1 Running 0 19h 192.168.231.42 node2 <none> <none>
kube-system pod/nodelocaldns-6h5q2 1/1 Running 0 19h 192.168.231.42 node2 <none> <none>
kube-system pod/nodelocaldns-7fwbd 1/1 Running 0 19h 192.168.231.41 node1 <none> <none>
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
default service/kubernetes ClusterIP 10.233.0.1 <none> 443/TCP 19h <none>
kube-system service/coredns ClusterIP 10.233.0.3 <none> 53/UDP,53/TCP,9153/TCP 19h k8s-app=kube-dns
NAMESPACE NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE CONTAINERS IMAGES SELECTOR
kube-system daemonset.apps/calico-node 2 2 0 2 0 kubernetes.io/os=linux 19h calico-node quay.io/calico/node:v3.19.2 k8s-app=calico-node
kube-system daemonset.apps/kube-proxy 2 2 2 2 2 kubernetes.io/os=linux 19h kube-proxy k8s.gcr.io/kube-proxy:v1.21.4 k8s-app=kube-proxy
kube-system daemonset.apps/nodelocaldns 2 2 2 2 2 kubernetes.io/os=linux 19h node-cache k8s.gcr.io/dns/k8s-dns-node-cache:1.17.1 k8s-app=nodelocaldns
NAMESPACE NAME READY UP-TO-DATE AVAILABLE AGE CONTAINERS IMAGES SELECTOR
kube-system deployment.apps/calico-kube-controllers 0/1 1 0 19h calico-kube-controllers quay.io/calico/kube-controllers:v3.19.2 k8s-app=calico-kube-controllers
kube-system deployment.apps/coredns 2/2 2 2 19h coredns k8s.gcr.io/coredns/coredns:v1.8.0 k8s-app=kube-dns
kube-system deployment.apps/dns-autoscaler 1/1 1 1 19h autoscaler k8s.gcr.io/cpa/cluster-proportional-autoscaler-amd64:1.8.3 k8s-app=dns-autoscaler
NAMESPACE NAME DESIRED CURRENT READY AGE CONTAINERS IMAGES SELECTOR
kube-system replicaset.apps/calico-kube-controllers-8575b76f66 1 1 0 19h calico-kube-controllers quay.io/calico/kube-controllers:v3.19.2 k8s-app=calico-kube-controllers,pod-template-hash=8575b76f66
kube-system replicaset.apps/coredns-8474476ff8 2 2 2 19h coredns k8s.gcr.io/coredns/coredns:v1.8.0 k8s-app=kube-dns,pod-template-hash=8474476ff8
kube-system replicaset.apps/dns-autoscaler-7df78bfcfb 1 1 1 19h autoscaler k8s.gcr.io/cpa/cluster-proportional-autoscaler-amd64:1.8.3 k8s-app=dns-autoscaler,pod-template-hash=7df78bfcfb