
How do I troubleshoot this problem?

I have a manually set up Kubernetes cluster that uses CoreDNS as its cluster-internal DNS. I deployed a busybox pod to run nslookup on kubernetes.default.

The lookup fails with the message nslookup: can't resolve 'kubernetes.default'. To get more insight into what happens during the lookup, I captured the network traffic leaving my busybox pod with tcpdump. It shows that my pod successfully reaches the CoreDNS pod, but the CoreDNS pod fails to connect back:

10:25:53.328153 IP 10.200.0.29.49598 > 10.32.0.10.domain: 2+ PTR? 10.0.32.10.in-addr.arpa. (41)
10:25:53.328393 IP 10.200.0.30.domain > 10.200.0.29.49598: 2* 1/0/0 PTR kube-dns.kube-system.svc.cluster.local. (93)
10:25:53.328410 IP 10.200.0.29 > 10.200.0.30: ICMP 10.200.0.29 udp port 49598 unreachable, length 129
10:25:58.328516 IP 10.200.0.29.50899 > 10.32.0.10.domain: 3+ PTR? 10.0.32.10.in-addr.arpa. (41)
10:25:58.328738 IP 10.200.0.30.domain > 10.200.0.29.50899: 3* 1/0/0 PTR kube-dns.kube-system.svc.cluster.local. (93)
10:25:58.328752 IP 10.200.0.29 > 10.200.0.30: ICMP 10.200.0.29 udp port 50899 unreachable, length 129
10:25:58.343205 ARP, Request who-has 10.200.0.1 tell 10.200.0.29, length 28
10:25:58.343217 ARP, Reply 10.200.0.1 is-at 0a:58:0a:c8:00:01 (oui Unknown), length 28
10:25:58.351250 ARP, Request who-has 10.200.0.29 tell 10.200.0.30, length 28
10:25:58.351250 ARP, Request who-has 10.200.0.30 tell 10.200.0.29, length 28
10:25:58.351261 ARP, Reply 10.200.0.29 is-at 0a:58:0a:c8:00:1d (oui Unknown), length 28
10:25:58.351262 ARP, Reply 10.200.0.30 is-at 0a:58:0a:c8:00:1e (oui Unknown), length 28
10:26:03.331409 IP 10.200.0.29.45823 > 10.32.0.10.domain: 4+ PTR? 10.0.32.10.in-addr.arpa. (41)
10:26:03.331618 IP 10.200.0.30.domain > 10.200.0.29.45823: 4* 1/0/0 PTR kube-dns.kube-system.svc.cluster.local. (93)
10:26:03.331631 IP 10.200.0.29 > 10.200.0.30: ICMP 10.200.0.29 udp port 45823 unreachable, length 129
10:26:08.348259 IP 10.200.0.29.43332 > 10.32.0.10.domain: 5+ PTR? 10.0.32.10.in-addr.arpa. (41)
10:26:08.348492 IP 10.200.0.30.domain > 10.200.0.29.43332: 5* 1/0/0 PTR kube-dns.kube-system.svc.cluster.local. (93)
10:26:08.348506 IP 10.200.0.29 > 10.200.0.30: ICMP 10.200.0.29 udp port 43332 unreachable, length 129
10:26:13.353491 IP 10.200.0.29.55715 > 10.32.0.10.domain: 6+ AAAA? kubernetes.default. (36)
10:26:13.354955 IP 10.200.0.30.domain > 10.200.0.29.55715: 6 NXDomain* 0/0/0 (36)
10:26:13.354971 IP 10.200.0.29 > 10.200.0.30: ICMP 10.200.0.29 udp port 55715 unreachable, length 72
10:26:18.354285 IP 10.200.0.29.57421 > 10.32.0.10.domain: 7+ AAAA? kubernetes.default. (36)
10:26:18.355533 IP 10.200.0.30.domain > 10.200.0.29.57421: 7 NXDomain* 0/0/0 (36)
10:26:18.355550 IP 10.200.0.29 > 10.200.0.30: ICMP 10.200.0.29 udp port 57421 unreachable, length 72
10:26:23.359405 IP 10.200.0.29.44332 > 10.32.0.10.domain: 8+ AAAA? kubernetes.default. (36)
10:26:23.361155 IP 10.200.0.30.domain > 10.200.0.29.44332: 8 NXDomain* 0/0/0 (36)
10:26:23.361171 IP 10.200.0.29 > 10.200.0.30: ICMP 10.200.0.29 udp port 44332 unreachable, length 72
10:26:23.367220 ARP, Request who-has 10.200.0.30 tell 10.200.0.29, length 28
10:26:23.367232 ARP, Reply 10.200.0.30 is-at 0a:58:0a:c8:00:1e (oui Unknown), length 28
10:26:23.370352 ARP, Request who-has 10.200.0.1 tell 10.200.0.29, length 28
10:26:23.370363 ARP, Reply 10.200.0.1 is-at 0a:58:0a:c8:00:01 (oui Unknown), length 28
10:26:28.367698 IP 10.200.0.29.48446 > 10.32.0.10.domain: 9+ AAAA? kubernetes.default. (36)
10:26:28.369133 IP 10.200.0.30.domain > 10.200.0.29.48446: 9 NXDomain* 0/0/0 (36)
10:26:28.369149 IP 10.200.0.29 > 10.200.0.30: ICMP 10.200.0.29 udp port 48446 unreachable, length 72
10:26:33.381266 IP 10.200.0.29.50714 > 10.32.0.10.domain: 10+ A? kubernetes.default. (36)
10:26:33.382745 IP 10.200.0.30.domain > 10.200.0.29.50714: 10 NXDomain* 0/0/0 (36)
10:26:33.382762 IP 10.200.0.29 > 10.200.0.30: ICMP 10.200.0.29 udp port 50714 unreachable, length 72
10:26:38.386288 IP 10.200.0.29.39198 > 10.32.0.10.domain: 11+ A? kubernetes.default. (36)
10:26:38.388635 IP 10.200.0.30.domain > 10.200.0.29.39198: 11 NXDomain* 0/0/0 (36)
10:26:38.388658 IP 10.200.0.29 > 10.200.0.30: ICMP 10.200.0.29 udp port 39198 unreachable, length 72
10:26:38.395241 ARP, Request who-has 10.200.0.29 tell 10.200.0.30, length 28
10:26:38.395248 ARP, Reply 10.200.0.29 is-at 0a:58:0a:c8:00:1d (oui Unknown), length 28
10:26:43.389355 IP 10.200.0.29.46495 > 10.32.0.10.domain: 12+ A? kubernetes.default. (36)
10:26:43.391522 IP 10.200.0.30.domain > 10.200.0.29.46495: 12 NXDomain* 0/0/0 (36)
10:26:43.391539 IP 10.200.0.2
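The trace has a pattern worth checking mechanically: every query is sent to the service IP 10.32.0.10, every reply comes back from the CoreDNS pod IP 10.200.0.30, and busybox immediately answers with ICMP "port unreachable". One plausible reading (an interpretation, not something confirmed in this thread) is that the resolver only accepts a reply from the address it queried, so a reply that was not translated back to the service IP gets rejected. The hypothetical helper below is a sketch that assumes tcpdump's default one-line output format, as shown above; it pairs queries and replies by client port and flags replies whose source differs from the queried address.

```shell
# Hypothetical helper: pair each DNS query in a tcpdump trace with its reply
# (matched by the client UDP port) and report replies whose source address
# differs from the address the query was sent to. Reads tcpdump text on stdin.
dns_reply_mismatch() {
  awk '
    # Only lines of the form "<time> IP <src> > <dst>: ..."; skip ARP and ICMP
    $2 != "IP" || $6 == "ICMP" { next }

    # Query: "<client-ip>.<port> > <server-ip>.domain:" (".53:" if ports are printed numerically)
    $5 ~ /\.(domain|53):$/ {
      n = split($3, c, "."); port = c[n]
      dst = $5; sub(/\.(domain|53):$/, "", dst)
      asked[port] = dst
      next
    }

    # Reply: "<server-ip>.domain > <client-ip>.<port>:"
    $3 ~ /\.(domain|53)$/ {
      src = $3; sub(/\.(domain|53)$/, "", src)
      n = split($5, c, "."); port = c[n]; sub(/:$/, "", port)
      if ((port in asked) && asked[port] != src)
        printf "port %s: query sent to %s, reply came from %s\n", port, asked[port], src
    }
  '
}
```

For a live capture on the node, something like tcpdump -l -i <interface> udp port 53 | dns_reply_mismatch could be used, where -l makes tcpdump line-buffer its output and <interface> is a placeholder for the pod's host-side interface.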

Cluster Infrastructure

NAMESPACE     NAME             DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
default       deploy/busybox   1         1         1            1           1h
kube-system   deploy/coredns   1         1         1            1           17h

NAMESPACE     NAME                    DESIRED   CURRENT   READY     AGE
default       rs/busybox-56db8bd9d7   1         1         1         1h
kube-system   rs/coredns-b8d4b46c8    1         1         1         17h

NAMESPACE     NAME                          READY     STATUS    RESTARTS   AGE
default       po/busybox-56db8bd9d7-fv7np   1/1       Running   2          1h
kube-system   po/coredns-b8d4b46c8-6tg5d    1/1       Running   2          17h

NAMESPACE     NAME             TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)                  AGE
default       svc/kubernetes   ClusterIP   10.32.0.1    <none>        443/TCP                  22h
kube-system   svc/kube-dns     ClusterIP   10.32.0.10   <none>        53/UDP,53/TCP,9153/TCP   17h

Busybox IP

kubectl describe pod busybox-56db8bd9d7-fv7np | grep IP
IP:             10.200.0.29

Endpoints, showing the DNS pod IP and ports

kubectl get endpoints --all-namespaces
NAMESPACE     NAME                      ENDPOINTS                                        AGE
default       kubernetes                192.168.0.218:6443                               22h
kube-system   kube-controller-manager   <none>                                           22h
kube-system   kube-dns                  10.200.0.30:9153,10.200.0.30:53,10.200.0.30:53   2h
kube-system   kube-scheduler            <none>                                           22h
Hi, did you create a service for the busybox pod? - Suresh Vishnoi
@SureshVishnoi No. But shouldn't the reply get back without a service in front of busybox? On a different cluster, set up with kubeadm, I only had to spawn a busybox pod to do DNS lookups, and it worked there. - elhombre
A Service resource helps expose the pod, not only inside the cluster but also outside. Can you try creating a service for the busybox pod and test whether it works? Then I will try to explain what is happening. - Suresh Vishnoi
@SureshVishnoi The busybox pod needs no service to connect to other services in the cluster, including DNS. - Radek 'Goblin' Pieczonka
@elhombre Is kube-proxy running correctly on your node? - Radek 'Goblin' Pieczonka

1 Answer


Debugging this requires a few steps to make sure you have all the ground covered.

Start by launching a pod (busybox or whatever you like) that has a tool such as host, dig or nslookup.
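A minimal sketch of this first step, assuming kubectl access to the cluster. The pod name dnstest, the helper names and the pinned image tag busybox:1.28 are illustrative assumptions, not from the answer; any image that ships nslookup, host or dig will do.

```shell
# Hypothetical helper for step 1: start a throwaway pod that has DNS tools.
# DNS_TEST_IMAGE defaults to busybox (which ships nslookup); the tag is an assumption.
start_dns_test_pod() {
  kubectl run dnstest --image="${DNS_TEST_IMAGE:-busybox:1.28}" \
    --restart=Never -- sleep 3600
  # Block until the pod is Ready (or give up after 60 seconds)
  kubectl wait --for=condition=Ready pod/dnstest --timeout=60s
}

# Run a DNS tool inside the test pod, e.g. dns_test nslookup kubernetes.default
dns_test() {
  kubectl exec dnstest -- "$@"
}
```

When you are done, kubectl delete pod dnstest removes the pod again.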

Next, identify the pod IP of CoreDNS. With that, proceed to run host kubernetes.default.svc.cluster.local <podIP>. If that does not work, something is wrong with pod-to-pod connectivity in your cluster.
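The pod-IP test could look like the sketch below. It rests on assumptions not stated in the answer: it reuses the busybox pod from the question (whose image ships nslookup rather than host), assumes that pod lives in your current namespace, and reads the CoreDNS pod IP from the kube-dns Endpoints object listed in the question (10.200.0.30 there). The function names are hypothetical.

```shell
# Busybox pod from the question; override with BUSYBOX_POD=... if yours differs
BUSYBOX_POD=${BUSYBOX_POD:-busybox-56db8bd9d7-fv7np}

# First address in the kube-dns Endpoints object, i.e. the CoreDNS pod IP
coredns_pod_ip() {
  kubectl -n kube-system get endpoints kube-dns \
    -o jsonpath='{.subsets[0].addresses[0].ip}'
}

# Query kubernetes.default.svc.cluster.local directly against DNS server $1
lookup_via() {
  kubectl exec "$BUSYBOX_POD" -- nslookup kubernetes.default.svc.cluster.local "$1"
}

# Usage: lookup_via "$(coredns_pod_ip)"
```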

If it does work, try host kubernetes.default.svc.cluster.local <service IP> with the service IP of your DNS service. If that does not work, kube-proxy is likely not doing its job properly, or something is wrong at the iptables level.
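A sketch of the service-IP test plus two node-side checks. It assumes the busybox pod from the question is in your current namespace; all helper names are hypothetical. service_nat_rules filters iptables-save -t nat output for KUBE-SERVICES rules that match the cluster IP, which is where kube-proxy in iptables mode installs its service rules. The bridge check reflects one commonly reported cause of replies coming back untranslated when both pods sit on the same Linux bridge: bridged traffic bypasses iptables unless br_netfilter is loaded and bridge-nf-call-iptables is 1. Treat that as a hypothesis to verify, not a diagnosis.

```shell
BUSYBOX_POD=${BUSYBOX_POD:-busybox-56db8bd9d7-fv7np}   # pod name from the question

# Cluster IP of the kube-dns Service (10.32.0.10 in the question)
dns_service_ip() {
  kubectl -n kube-system get service kube-dns -o jsonpath='{.spec.clusterIP}'
}

# Query the name through the Service IP instead of the CoreDNS pod IP
lookup_via_service() {
  kubectl exec "$BUSYBOX_POD" -- \
    nslookup kubernetes.default.svc.cluster.local "$(dns_service_ip)"
}

# On the node: print kube-proxy NAT rules for service IP $1.
# Reads `iptables-save -t nat` output on stdin; no output means no rules.
service_nat_rules() {
  grep -F -- "-A KUBE-SERVICES" | grep -F -- "-d $1/32"
}

# On the node: value of bridge-nf-call-iptables, or "missing" when the
# br_netfilter module is not loaded (the sysctl file does not exist then).
# The optional argument overrides the path.
bridge_nf_call_iptables() {
  f=${1:-/proc/sys/net/bridge/bridge-nf-call-iptables}
  if [ -r "$f" ]; then cat "$f"; else echo missing; fi
}

# Usage on the node:
#   sudo iptables-save -t nat | service_nat_rules 10.32.0.10
#   bridge_nf_call_iptables
```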

If it worked, take a look at /etc/resolv.conf in the pod and at the value of the kubelet --cluster-dns flag.
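For this last check, a small sketch with hypothetical helper names: it reads the pod's /etc/resolv.conf on stdin and compares its first nameserver with the DNS service IP, which in the question is 10.32.0.10. On the node, the same IP should appear in the kubelet's --cluster-dns flag, or in the clusterDNS field if the kubelet is configured through a KubeletConfiguration file.

```shell
# Print the first "nameserver" entry from resolv.conf content on stdin
resolv_nameserver() {
  awk '$1 == "nameserver" { print $2; exit }'
}

# Compare the pod's nameserver (stdin = resolv.conf) with the expected DNS
# service IP in $1; returns non-zero on a mismatch
check_pod_nameserver() {
  ns=$(resolv_nameserver)
  if [ "$ns" = "$1" ]; then
    echo "ok: nameserver $ns"
  else
    echo "mismatch: pod uses '$ns', expected '$1'"
    return 1
  fi
}

# Usage, with the pod name and service IP from the question:
#   kubectl exec busybox-56db8bd9d7-fv7np -- cat /etc/resolv.conf | check_pod_nameserver 10.32.0.10
```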

Side note: all of the above assumes that CoreDNS itself works fine in the first place.