
I have an issue where the CoreDNS pods on some nodes are in CrashLoopBackOff state due to an error trying to reach the Kubernetes internal service.

This is a new K8s cluster deployed using Kubespray; the network layer is Weave, with Kubernetes version 1.12.5 on OpenStack. I've already tested the connection to the endpoints and have no issue reaching 10.2.70.14:6443, for example. But telnet from the pods to 10.233.0.1:443 fails.
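For reference, the connectivity check I ran looked roughly like the following (a sketch; the busybox image and the pod name net-test are just examples, not something from the cluster):

$ kubectl run -it --rm net-test --image=busybox:1.28 --restart=Never -- sh
# inside the pod: the apiserver endpoint answers, the ClusterIP does not
/ # telnet 10.2.70.14 6443
/ # telnet 10.233.0.1 443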

Thanks in advance for the help

kubectl describe svc kubernetes
Name:              kubernetes
Namespace:         default
Labels:            component=apiserver
                   provider=kubernetes
Annotations:       <none>
Selector:          <none>
Type:              ClusterIP
IP:                10.233.0.1
Port:              https  443/TCP
TargetPort:        6443/TCP
Endpoints:         10.2.70.14:6443,10.2.70.18:6443,10.2.70.27:6443 + 2 more...
Session Affinity:  None
Events:            <none>

And from CoreDNS logs:

E0415 17:47:05.453762       1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:311: Failed to list *v1.Service: Get https://10.233.0.1:443/api/v1/services?limit=500&resourceVersion=0: dial tcp 10.233.0.1:443: connect: connection refused
E0415 17:47:05.456909       1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:313: Failed to list *v1.Endpoints: Get https://10.233.0.1:443/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 10.233.0.1:443: connect: connection refused
E0415 17:47:06.453258       1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:318: Failed to list *v1.Namespace: Get https://10.233.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0: dial tcp 10.233.0.1:443: connect: connection refused

Also, checking the logs of kube-proxy on one of the problematic nodes revealed the following errors:

I0415 19:14:32.162909       1 graceful_termination.go:160] Trying to delete rs: 10.233.0.1:443/TCP/10.2.70.36:6443
I0415 19:14:32.162979       1 graceful_termination.go:171] Not deleting, RS 10.233.0.1:443/TCP/10.2.70.36:6443: 1 ActiveConn, 0 InactiveConn
I0415 19:14:32.162989       1 graceful_termination.go:160] Trying to delete rs: 10.233.0.1:443/TCP/10.2.70.18:6443
I0415 19:14:32.163017       1 graceful_termination.go:171] Not deleting, RS 10.233.0.1:443/TCP/10.2.70.18:6443: 1 ActiveConn, 0 InactiveConn
E0415 19:14:32.215707       1 proxier.go:430] Failed to execute iptables-restore for nat: exit status 1 (iptables-restore: line 7 failed
)
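Since the graceful_termination.go messages indicate kube-proxy is running in IPVS mode, I also looked at the state on the affected node (a sketch, assuming ipvsadm is installed on the host):

$ sudo ipvsadm -Ln -t 10.233.0.1:443
# lists the real servers (apiserver endpoints) behind the ClusterIP
$ sudo iptables-save -t nat | head -20
# shows the nat rules that kube-proxy failed to restore ("iptables-restore: line 7 failed")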
Can you please check on the master server with kubectl get pods --all-namespaces? Check the STATUS of the coredns pods. If the STATUS is ContainerCreating you may have to delete them, causing new ones to be generated. – Aamir M Meman
The status is CrashLoopBackOff for coredns; none of my pods are in ContainerCreating. – Tomer Leibovich
I have exactly the same problem. How did you solve this? – whymatter
Got it, added my solution as an answer. – whymatter

1 Answer


I had exactly the same problem, and it turned out that my Kubespray config was wrong, specifically the nginx ingress setting ingress_nginx_host_network.

As it turns out, you have to set ingress_nginx_host_network: true (it defaults to false).
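In my inventory that meant setting it in the addons vars file (a sketch assuming the default sample inventory layout; adjust the path for your inventory name) and re-running the Kubespray playbook:

# inventory/mycluster/group_vars/k8s-cluster/addons.yml
ingress_nginx_enabled: true
ingress_nginx_host_network: true   # <- run the ingress controller on the host network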

If you do not want to rerun the whole Kubespray playbook, edit the nginx ingress DaemonSet:

$ kubectl -n ingress-nginx edit ds ingress-nginx-controller

  1. Add --report-node-internal-ip-address to the command line:
spec:
  containers:
    - args:
        - /nginx-ingress-controller
        - --configmap=$(POD_NAMESPACE)/ingress-nginx
        - --tcp-services-configmap=$(POD_NAMESPACE)/tcp-services
        - --udp-services-configmap=$(POD_NAMESPACE)/udp-services
        - --annotations-prefix=nginx.ingress.kubernetes.io
        - --report-node-internal-ip-address # <- new
  2. Set the following two properties on the same level as, e.g., serviceAccountName: ingress-nginx:
serviceAccountName: ingress-nginx
hostNetwork: true # <- new
dnsPolicy: ClusterFirstWithHostNet  # <- new

Then save and quit (:wq) and check the pod status with kubectl get pods --all-namespaces.
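If you prefer not to edit interactively, the same changes can be applied with kubectl patch (a sketch; the DaemonSet name and container index may differ in your setup):

$ kubectl -n ingress-nginx patch ds ingress-nginx-controller --type merge \
    -p '{"spec":{"template":{"spec":{"hostNetwork":true,"dnsPolicy":"ClusterFirstWithHostNet"}}}}'
# append the extra flag to the first container's args
$ kubectl -n ingress-nginx patch ds ingress-nginx-controller --type json \
    -p '[{"op":"add","path":"/spec/template/spec/containers/0/args/-","value":"--report-node-internal-ip-address"}]'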

Source: https://github.com/kubernetes-sigs/kubespray/issues/4357