
I have a Kubernetes cluster with 3 masters and 7 workers, and I use Calico as the CNI. When I deploy Calico, the calico-kube-controllers-xxx pod fails because it cannot reach 10.96.0.1:443.

2020-06-23 13:05:28.737 [INFO][1] main.go 88: Loaded configuration from environment config=&config.Config{LogLevel:"info", WorkloadEndpointWorkers:1, ProfileWorkers:1, PolicyWorkers:1, NodeWorkers:1, Kubeconfig:"", DatastoreType:"kubernetes"}
W0623 13:05:28.740128       1 client_config.go:541] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
2020-06-23 13:05:28.742 [INFO][1] main.go 109: Ensuring Calico datastore is initialized
2020-06-23 13:05:38.742 [ERROR][1] client.go 261: Error getting cluster information config ClusterInformation="default" error=Get https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default: context deadline exceeded
2020-06-23 13:05:38.742 [FATAL][1] main.go 114: Failed to initialize Calico datastore error=Get https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default: context deadline exceeded
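
10.96.0.1 is the ClusterIP of the default kubernetes Service. As a quick sanity check (a minimal sketch; output omitted), the Service and its API-server endpoints can be confirmed with:

# The ClusterIP should be 10.96.0.1; ENDPOINTS should list the three masters
kubectl get svc kubernetes -n default
kubectl get endpoints kubernetes -n default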

This is the situation in the kube-system namespace:

kubectl get po -n kube-system
NAME                                       READY   STATUS             RESTARTS   AGE
calico-kube-controllers-77d6cbc65f-6bmjg   0/1     CrashLoopBackOff   56         4h33m
calico-node-94pkr                          1/1     Running            0          36m
calico-node-d8vc4                          1/1     Running            0          36m
calico-node-fgpd4                          1/1     Running            0          37m
calico-node-jqgkp                          1/1     Running            0          37m
calico-node-m9lds                          1/1     Running            0          37m
calico-node-n5qmb                          1/1     Running            0          37m
calico-node-t46jb                          1/1     Running            0          36m
calico-node-w6xch                          1/1     Running            0          38m
calico-node-xpz8k                          1/1     Running            0          37m
calico-node-zbw4x                          1/1     Running            0          36m
coredns-5644d7b6d9-ms7gv                   0/1     Running            0          4h33m
coredns-5644d7b6d9-thwlz                   0/1     Running            0          4h33m
kube-apiserver-k8s01                       1/1     Running            7          34d
kube-apiserver-k8s02                       1/1     Running            9          34d
kube-apiserver-k8s03                       1/1     Running            7          34d
kube-controller-manager-k8s01              1/1     Running            7          34d
kube-controller-manager-k8s02              1/1     Running            9          34d
kube-controller-manager-k8s03              1/1     Running            8          34d
kube-proxy-9dppr                           1/1     Running            3          4d
kube-proxy-9hhm9                           1/1     Running            3          4d
kube-proxy-9svfk                           1/1     Running            1          4d
kube-proxy-jctxm                           1/1     Running            3          4d
kube-proxy-lsg7m                           1/1     Running            3          4d
kube-proxy-m257r                           1/1     Running            1          4d
kube-proxy-qtbbz                           1/1     Running            2          4d
kube-proxy-v958j                           1/1     Running            2          4d
kube-proxy-x97qx                           1/1     Running            2          4d
kube-proxy-xjkjl                           1/1     Running            3          4d
kube-scheduler-k8s01                       1/1     Running            7          34d
kube-scheduler-k8s02                       1/1     Running            9          34d
kube-scheduler-k8s03                       1/1     Running            8          34d

In addition, CoreDNS cannot reach the internal Kubernetes service either.
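
The CoreDNS pods show 0/1 above, i.e. they are failing readiness. One way to see why (CoreDNS pods carry the k8s-app=kube-dns label; a minimal sketch) is:

# Readiness events and probe failures show up in describe
kubectl describe po -n kube-system -l k8s-app=kube-dns
kubectl logs -n kube-system -l k8s-app=kube-dns --tail=50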

On a node, if I run wget -S 10.96.0.1:443, I receive a response (the 400 below is expected, since this sends plain HTTP to the API server's TLS port).

wget -S 10.96.0.1:443
--2020-06-23 13:12:12--  http://10.96.0.1:443/
Connecting to 10.96.0.1:443... connected.
HTTP request sent, awaiting response...
  HTTP/1.0 400 Bad Request
2020-06-23 13:12:12 ERROR 400: Bad Request.

But if I run wget -S 10.96.0.1:443 inside a pod, I receive a timeout.

Also, I cannot ping the nodes from pods.
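
For anyone reproducing these in-pod tests, a throwaway pod works; busybox is just an example image, and <node-ip> is a placeholder for one of your node addresses:

# Start a temporary shell pod on the pod network
kubectl run nettest -it --rm --restart=Never --image=busybox -- sh
# Inside the pod:
wget -S 10.96.0.1:443        # times out instead of returning 400
ping <node-ip>               # also fails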

The cluster pod CIDR is 192.168.0.0/16.
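
That value can be cross-checked against what the control plane actually runs with, and against the node addresses (the kube-controller-manager carries it in its --cluster-cidr flag):

kubectl cluster-info dump | grep -m1 -- --cluster-cidr
kubectl get nodes -o wide    # compare INTERNAL-IP against the pod CIDR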

Is there any firewall that might block the connection? Or does your host have multiple NICs? It might happen that Calico chose the incorrect one. Could you post logs from the coredns and kube-apiserver pods? – Mariusz K.
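
For reference, those logs can be pulled directly, using the pod names from the listing above:

kubectl logs -n kube-system coredns-5644d7b6d9-ms7gv
kubectl logs -n kube-system kube-apiserver-k8s01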

1 Answer


I resolved it by recreating the cluster with a different pod CIDR.
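
For anyone in the same spot, a sketch of the recreate with kubeadm, assuming kubeadm built the cluster (HA/control-plane flags omitted). 10.244.0.0/16 is only an example range, chosen so it cannot overlap a 192.168.x.x host network, which would explain the original symptom; the CALICO_IPV4POOL_CIDR value in the Calico manifest must be changed to match before applying it:

# On every node, tear down the old cluster state
sudo kubeadm reset

# Re-init the first master with a non-overlapping pod CIDR
sudo kubeadm init --pod-network-cidr=10.244.0.0/16

# In calico.yaml, set the pool to the same range before applying:
#   - name: CALICO_IPV4POOL_CIDR
#     value: "10.244.0.0/16"
kubectl apply -f calico.yaml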