I have been working on installing a three-node Kubernetes cluster on CentOS 7 with Flannel for some time now; however, the CoreDNS pods cannot connect to the API server and are constantly restarting.
The reference HowTo document I followed is here.
What Have I Done so Far?
- Disabled SELinux,
- Disabled firewalld,
- Enabled br_netfilter and bridge-nf-call-iptables,
- Installed Kubernetes on three nodes, set up the master's pod network with the Flannel default network (10.244.0.0/16),
- Installed the other two nodes and joined them to the master,
- Deployed Flannel,
- Configured Docker's BIP to use the Flannel default per-node subnet and network (a rough sketch of these host preparation steps follows this list).
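For completeness, the host preparation and the Docker BIP change were roughly as follows. This is a sketch of what was run on each node: the sysctl names come from the kubeadm prerequisites, and 10.244.1.1/24 stands in for whichever per-node Flannel subnet that node was assigned.

```
# Disable SELinux and firewalld
setenforce 0
sed -i 's/^SELINUX=enforcing$/SELINUX=permissive/' /etc/selinux/config
systemctl disable --now firewalld

# Load br_netfilter and make bridged traffic visible to iptables
modprobe br_netfilter
cat <<EOF > /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
EOF
sysctl --system

# Point Docker's bridge (BIP) at this node's Flannel subnet (example value)
cat <<EOF > /etc/docker/daemon.json
{
  "bip": "10.244.1.1/24"
}
EOF
systemctl restart docker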
Current State
- The kubelet works and the cluster reports nodes as ready.
- The cluster can schedule and migrate pods, so the CoreDNS pods are spawned on the nodes.
- The Flannel network is connected: the containers show no errors in their logs, and I can ping the 10.244.0.0/24 networks from node to node.
- Kubernetes can deploy and run arbitrary pods (I tried the shell demo and can access its shell via kubectl even if the container is on a different node).
- However, since DNS is not working, they cannot resolve any addresses (see the check below).
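The failure can be reproduced with a quick check along these lines (a sketch; busybox is just a throwaway test image and the pod name is arbitrary):

```
# Nodes and system pods report healthy
kubectl get nodes
kubectl get pods -n kube-system -o wide

# But in-cluster DNS resolution fails from a test pod
kubectl run -it --rm dns-test --image=busybox --restart=Never -- nslookup kubernetes.default
```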
What is the Problem?
The CoreDNS pods report that they cannot connect to the API server with the following error:

```
Failed to list *v1.Namespace: Get https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: connect: no route to host
```

I cannot see any 10.96.0.0 routes in the routing tables:

```
default via 172.16.0.1 dev eth0 proto static metric 100
10.1.0.0/24 dev eth1 proto kernel scope link src 10.1.0.202 metric 101
10.244.0.0/24 via 10.244.0.0 dev flannel.1 onlink
10.244.1.0/24 dev docker0 proto kernel scope link src 10.244.1.1
10.244.1.0/24 dev cni0 proto kernel scope link src 10.244.1.1
10.244.2.0/24 via 10.244.2.0 dev flannel.1 onlink
172.16.0.0/16 dev eth0 proto kernel scope link src 172.16.0.202 metric 100
```
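As I understand it, 10.96.0.1 is the cluster-internal service IP of the API server, and service IPs are normally reached through kube-proxy's NAT rules rather than through a kernel route, so I would not necessarily expect them in the routing table. A check along these lines (a sketch, assuming kube-proxy runs in its default iptables mode) should show whether those rules exist on the node:

```
# Service VIPs live in the nat table, not the main routing table
iptables -t nat -L KUBE-SERVICES -n | grep 10.96.0.1

# kube-proxy pods and their logs (kubeadm labels them k8s-app=kube-proxy)
kubectl -n kube-system get pods -l k8s-app=kube-proxy
kubectl -n kube-system logs -l k8s-app=kube-proxy
```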
Additional Info
- Cluster init is done with the command kubeadm init --apiserver-advertise-address=172.16.0.201 --pod-network-cidr=10.244.0.0/16.
- I have torn down the cluster and rebuilt it with 1.12.0 (roughly as sketched after this list). The problem still persists.
- The workaround in the Kubernetes documentation doesn't work.
- The problem is present, and the same, with both the 1.11-3 and 1.12-0 CentOS 7 packages.
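The teardown and rebuild followed the usual kubeadm flow, roughly like this (a sketch; the token and hash in the join command are placeholders printed by kubeadm init):

```
# On all nodes: wipe kubeadm state and leftover CNI config
kubeadm reset
rm -rf /etc/cni/net.d

# On the master: re-initialize
kubeadm init --apiserver-advertise-address=172.16.0.201 --pod-network-cidr=10.244.0.0/16

# On the workers: re-join the cluster
kubeadm join 172.16.0.201:6443 --token <token> --discovery-token-ca-cert-hash sha256:<hash>
```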
Progress so Far
- Downgraded Kubernetes to 1.11.3-0 (roughly as sketched below).
- Re-initialized Kubernetes with kubeadm init --apiserver-advertise-address=172.16.0.201 --pod-network-cidr=10.244.0.0/16, since the server has another external IP which cannot be reached from the other hosts, and Kubernetes tends to select that IP as the API server IP. --pod-network-cidr is mandated by Flannel.
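The downgrade itself was done via the CentOS packages, roughly like this (a sketch; --disableexcludes=kubernetes is only relevant if your kubernetes repo file excludes the kube* packages):

```
yum remove -y kubelet kubeadm kubectl
yum install -y kubelet-1.11.3-0 kubeadm-1.11.3-0 kubectl-1.11.3-0 --disableexcludes=kubernetes
systemctl enable --now kubelet
```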
The resulting iptables -L output after initialization, with no nodes joined yet:

```
Chain INPUT (policy ACCEPT)
target     prot opt source               destination
KUBE-EXTERNAL-SERVICES  all  --  anywhere             anywhere             ctstate NEW /* kubernetes externally-visible service portals */
KUBE-FIREWALL  all  --  anywhere             anywhere

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination
KUBE-FORWARD  all  --  anywhere             anywhere             /* kubernetes forwarding rules */
DOCKER-USER  all  --  anywhere             anywhere

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination
KUBE-SERVICES  all  --  anywhere             anywhere             ctstate NEW /* kubernetes service portals */
KUBE-FIREWALL  all  --  anywhere             anywhere

Chain DOCKER-USER (1 references)
target     prot opt source               destination
RETURN     all  --  anywhere             anywhere

Chain KUBE-EXTERNAL-SERVICES (1 references)
target     prot opt source               destination

Chain KUBE-FIREWALL (2 references)
target     prot opt source               destination
DROP       all  --  anywhere             anywhere             /* kubernetes firewall for dropping marked packets */ mark match 0x8000/0x8000

Chain KUBE-FORWARD (1 references)
target     prot opt source               destination
ACCEPT     all  --  anywhere             anywhere             /* kubernetes forwarding rules */ mark match 0x4000/0x4000

Chain KUBE-SERVICES (1 references)
target     prot opt source               destination
REJECT     udp  --  anywhere             10.96.0.10           /* kube-system/kube-dns:dns has no endpoints */ udp dpt:domain reject-with icmp-port-unreachable
REJECT     tcp  --  anywhere             10.96.0.10           /* kube-system/kube-dns:dns-tcp has no endpoints */ tcp dpt:domain reject-with icmp-port-unreachable
```

It looks like the API server service is deployed as it should be:
```
$ kubectl get svc kubernetes -o=yaml
apiVersion: v1
kind: Service
metadata:
  creationTimestamp: 2018-10-25T06:58:46Z
  labels:
    component: apiserver
    provider: kubernetes
  name: kubernetes
  namespace: default
  resourceVersion: "6"
  selfLink: /api/v1/namespaces/default/services/kubernetes
  uid: 6b3e4099-d823-11e8-8264-a6f3f1f622f3
spec:
  clusterIP: 10.96.0.1
  ports:
  - name: https
    port: 443
    protocol: TCP
    targetPort: 6443
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}
```

Then I've applied the flannel network pod with
```
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
```

As soon as I apply the flannel network, the CoreDNS pods start and begin giving the same error:
```
Failed to list *v1.Endpoints: Get https://10.96.0.1:443/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: connect: no route to host
```

I've found out that flanneld is using the wrong network interface, and changed it in the kube-flannel.yml file before deployment (roughly as sketched below). However, the outcome is still the same.
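The change was along these lines (a sketch of the relevant excerpt from kube-flannel.yml; I am assuming eth1 is the interface that should carry inter-node traffic, and the image and other DaemonSet fields are left exactly as shipped in the manifest):

```
      containers:
      - name: kube-flannel
        command:
        - /opt/bin/flanneld
        args:
        - --ip-masq
        - --kube-subnet-mgr
        - --iface=eth1   # added: force flanneld onto the intended interface
```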
Any help is greatly appreciated.