
Whenever DNS runs on a kubelet other than the one that resides on the master node, the liveness and readiness probes for skydns keep failing. I am deploying the add-ons as a service, similar to what is used in the Salt cluster. I have configured my system to use tokens and have verified that a token is generated for system:dns and is configured correctly for the kubelet. Is there something additional I need to do inside the skydns rc/svc YAMLs because of this?

Salt Cluster: https://github.com/kubernetes/kubernetes/tree/master/cluster/saltbase/salt/kube-addons

Ansible Deployment: https://github.com/kubernetes/contrib/tree/master/ansible/roles/kubernetes-addons/files

I am using the standard skydns rc/svc yamls.
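For reference, the probe section of the standard v10 skydns-rc.yaml looks roughly like the snippet below (a sketch from memory of the upstream manifest, not an exact copy; both probes go through the exechealthz sidecar on port 8080, which is why a DNS failure shows up as an HTTP 503):

```yaml
# Sketch of the skydns container's probes in the v10 manifest.
# exechealthz runs "nslookup kubernetes.default.svc.cluster.local 127.0.0.1"
# and serves the result over HTTP on :8080.
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
    scheme: HTTP
  initialDelaySeconds: 30
  timeoutSeconds: 5
readinessProbe:
  httpGet:
    path: /healthz
    port: 8080
    scheme: HTTP
  initialDelaySeconds: 1
  timeoutSeconds: 5
```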

Pod Description:

Name:               kube-dns-v10-pgqig
Namespace:          kube-system
Image(s):           gcr.io/google_containers/etcd:2.0.9,gcr.io/google_containers/kube2sky:1.12,gcr.io/google_containers/skydns:2015-10-13-8c72f8c,gcr.io/google_containers/exechealthz:1.0
Node:               minion-1/172.28.129.2
Start Time:         Thu, 21 Jan 2016 08:54:50 -0800
Labels:             k8s-app=kube-dns,kubernetes.io/cluster-service=true,version=v10
Status:             Running
Reason:             
Message:            
IP:             18.16.18.9
Replication Controllers:    kube-dns-v10 (1/1 replicas created)
Containers:
  etcd:
    Container ID:   docker://49216f478c99fcd3c25763e99bb18861d31025a0cadd538f9590295e78846f69
    Image:      gcr.io/google_containers/etcd:2.0.9
    Image ID:       docker://b6b9a86dc06aa1361357ca1b105feba961f6a4145adca6c54e142c0be0fe87b0
    Command:
      /usr/local/bin/etcd
      -data-dir
      /var/etcd/data
      -listen-client-urls
      http://127.0.0.1:2379,http://127.0.0.1:4001
      -advertise-client-urls
      http://127.0.0.1:2379,http://127.0.0.1:4001
      -initial-cluster-token
      skydns-etcd
    QoS Tier:
      cpu:  Guaranteed
      memory:   Guaranteed
    Limits:
      cpu:  100m
      memory:   50Mi
    Requests:
      cpu:      100m
      memory:       50Mi
    State:      Running
      Started:      Thu, 21 Jan 2016 08:54:51 -0800
    Ready:      True
    Restart Count:  0
    Environment Variables:
  kube2sky:
    Container ID:   docker://4cbdf45e1ba0a6a820120c934473e61bf74af49d1ff42a0da01abd593516f4ee
    Image:      gcr.io/google_containers/kube2sky:1.12
    Image ID:       docker://b8f3273706d3fc51375779110828379bdbb663e556cca3925e87fbc614725bb1
    Args:
      -domain=cluster.local
      -kube_master_url=http://master:8080
    QoS Tier:
      memory:   Guaranteed
      cpu:  Guaranteed
    Limits:
      memory:   50Mi
      cpu:  100m
    Requests:
      memory:       50Mi
      cpu:      100m
    State:      Running
      Started:      Thu, 21 Jan 2016 08:54:51 -0800
    Ready:      True
    Restart Count:  0
    Environment Variables:
  skydns:
    Container ID:   docker://bd3103f514dcc4e42ff2c126446d963d03ef1101833239926c84d5c0ba577929
    Image:      gcr.io/google_containers/skydns:2015-10-13-8c72f8c
    Image ID:       docker://763c92e53f311c40a922628a34daf0be4397463589a7d148cea8291f02c12a5d
    Args:
      -machines=http://127.0.0.1:4001
      -addr=0.0.0.0:53
      -ns-rotate=false
      -domain=cluster.local.
    QoS Tier:
      memory:   Guaranteed
      cpu:  Guaranteed
    Limits:
      cpu:  100m
      memory:   50Mi
    Requests:
      cpu:          100m
      memory:           50Mi
    State:          Running
      Started:          Thu, 21 Jan 2016 09:13:50 -0800
    Last Termination State: Terminated
      Reason:           Error
      Exit Code:        2
      Started:          Thu, 21 Jan 2016 09:13:14 -0800
      Finished:         Thu, 21 Jan 2016 09:13:50 -0800
    Ready:          False
    Restart Count:      28
    Environment Variables:
  healthz:
    Container ID:   docker://b46d2bb06a72cda25565b4f40ce956f252dce5df7f590217b3307126ec29e7c7
    Image:      gcr.io/google_containers/exechealthz:1.0
    Image ID:       docker://4f3d04b1d47b64834d494f9416d1f17a5f93a3e2035ad604fee47cfbba62be60
    Args:
      -cmd=nslookup kubernetes.default.svc.cluster.local 127.0.0.1 >/dev/null
      -port=8080
    QoS Tier:
      memory:   Guaranteed
      cpu:  Guaranteed
    Limits:
      cpu:  10m
      memory:   20Mi
    Requests:
      cpu:      10m
      memory:       20Mi
    State:      Running
      Started:      Thu, 21 Jan 2016 08:54:51 -0800
    Ready:      True
    Restart Count:  0
    Environment Variables:
Conditions:
  Type      Status
  Ready     False 
Volumes:
  etcd-storage:
    Type:   EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium: 
  default-token-62irv:
    Type:   Secret (a secret that should populate this volume)
    SecretName: default-token-62irv
Events:
  FirstSeen LastSeen    Count   From            SubobjectPath           Type        Reason      Message
  --------- --------    -----   ----            -------------           --------    ------      -------
  19m       19m     1   {kubelet minion-1}  spec.containers{etcd}       Normal      Created     Created container with docker id 49216f478c99
  19m       19m     1   {scheduler }                        Normal      Scheduled   Successfully assigned kube-dns-v10-pgqig to minion-1
  19m       19m     1   {kubelet minion-1}  spec.containers{etcd}       Normal      Pulled      Container image "gcr.io/google_containers/etcd:2.0.9" already present on machine
  19m       19m     1   {kubelet minion-1}  spec.containers{kube2sky}   Normal      Created     Created container with docker id 4cbdf45e1ba0
  19m       19m     1   {kubelet minion-1}  spec.containers{kube2sky}   Normal      Started     Started container with docker id 4cbdf45e1ba0
  19m       19m     1   {kubelet minion-1}  spec.containers{skydns}     Normal      Created     Created container with docker id fdb1278aaf93
  19m       19m     1   {kubelet minion-1}  spec.containers{skydns}     Normal      Started     Started container with docker id fdb1278aaf93
  19m       19m     1   {kubelet minion-1}  spec.containers{healthz}    Normal      Pulled      Container image "gcr.io/google_containers/exechealthz:1.0" already present on machine
  19m       19m     1   {kubelet minion-1}  spec.containers{healthz}    Normal      Created     Created container with docker id b46d2bb06a72
  19m       19m     1   {kubelet minion-1}  spec.containers{healthz}    Normal      Started     Started container with docker id b46d2bb06a72
  19m       19m     1   {kubelet minion-1}  spec.containers{etcd}       Normal      Started     Started container with docker id 49216f478c99
  19m       19m     1   {kubelet minion-1}  spec.containers{kube2sky}   Normal      Pulled      Container image "gcr.io/google_containers/kube2sky:1.12" already present on machine
  18m       18m     1   {kubelet minion-1}  spec.containers{skydns}     Normal      Killing     Killing container with docker id fdb1278aaf93: pod "kube-dns-v10-pgqig_kube-system(af674b6a-c05f-11e5-9e37-08002771c788)" container "skydns" is unhealthy, it will be killed and re-created.
  18m       18m     1   {kubelet minion-1}  spec.containers{skydns}     Normal      Started     Started container with docker id 70474f1ca315
  18m       18m     1   {kubelet minion-1}  spec.containers{skydns}     Normal      Created     Created container with docker id 70474f1ca315
  17m       17m     1   {kubelet minion-1}  spec.containers{skydns}     Normal      Killing     Killing container with docker id 70474f1ca315: pod "kube-dns-v10-pgqig_kube-system(af674b6a-c05f-11e5-9e37-08002771c788)" container "skydns" is unhealthy, it will be killed and re-created.
  17m       17m     1   {kubelet minion-1}  spec.containers{skydns}     Normal      Created     Created container with docker id 8e18a0b404dd
  17m       17m     1   {kubelet minion-1}  spec.containers{skydns}     Normal      Started     Started container with docker id 8e18a0b404dd
  16m       16m     1   {kubelet minion-1}  spec.containers{skydns}     Normal      Created     Created container with docker id 00b4e2a46779
  16m       16m     1   {kubelet minion-1}  spec.containers{skydns}     Normal      Killing     Killing container with docker id 8e18a0b404dd: pod "kube-dns-v10-pgqig_kube-system(af674b6a-c05f-11e5-9e37-08002771c788)" container "skydns" is unhealthy, it will be killed and re-created.
  16m       16m     1   {kubelet minion-1}  spec.containers{skydns}     Normal      Started     Started container with docker id 00b4e2a46779
  16m       16m     1   {kubelet minion-1}  spec.containers{skydns}     Normal      Started     Started container with docker id 3df9a304e09a
  16m       16m     1   {kubelet minion-1}  spec.containers{skydns}     Normal      Killing     Killing container with docker id 00b4e2a46779: pod "kube-dns-v10-pgqig_kube-system(af674b6a-c05f-11e5-9e37-08002771c788)" container "skydns" is unhealthy, it will be killed and re-created.
  16m       16m     1   {kubelet minion-1}  spec.containers{skydns}     Normal      Created     Created container with docker id 3df9a304e09a
  15m       15m     1   {kubelet minion-1}  spec.containers{skydns}     Normal      Killing     Killing container with docker id 3df9a304e09a: pod "kube-dns-v10-pgqig_kube-system(af674b6a-c05f-11e5-9e37-08002771c788)" container "skydns" is unhealthy, it will be killed and re-created.
  15m       15m     1   {kubelet minion-1}  spec.containers{skydns}     Normal      Created     Created container with docker id 4b3ee7fccfd2
  15m       15m     1   {kubelet minion-1}  spec.containers{skydns}     Normal      Started     Started container with docker id 4b3ee7fccfd2
  14m       14m     1   {kubelet minion-1}  spec.containers{skydns}     Normal      Killing     Killing container with docker id 4b3ee7fccfd2: pod "kube-dns-v10-pgqig_kube-system(af674b6a-c05f-11e5-9e37-08002771c788)" container "skydns" is unhealthy, it will be killed and re-created.
  14m       14m     1   {kubelet minion-1}  spec.containers{skydns}     Normal      Killing     Killing container with docker id d1100cb0a5be: pod "kube-dns-v10-pgqig_kube-system(af674b6a-c05f-11e5-9e37-08002771c788)" container "skydns" is unhealthy, it will be killed and re-created.
  13m       13m     1   {kubelet minion-1}  spec.containers{skydns}     Normal      Killing     Killing container with docker id 19e2bbda4f80: pod "kube-dns-v10-pgqig_kube-system(af674b6a-c05f-11e5-9e37-08002771c788)" container "skydns" is unhealthy, it will be killed and re-created.
  12m       12m     1   {kubelet minion-1}  spec.containers{skydns}     Normal      Killing     Killing container with docker id c424c0ad713a: pod "kube-dns-v10-pgqig_kube-system(af674b6a-c05f-11e5-9e37-08002771c788)" container "skydns" is unhealthy, it will be killed and re-created.
  19m       1s      29  {kubelet minion-1}  spec.containers{skydns}     Normal      Pulled      Container image "gcr.io/google_containers/skydns:2015-10-13-8c72f8c" already present on machine
  12m       1s      19  {kubelet minion-1}  spec.containers{skydns}     Normal      Killing     (events with common reason combined)
  14m       1s      23  {kubelet minion-1}  spec.containers{skydns}     Normal      Created     (events with common reason combined)
  14m       1s      23  {kubelet minion-1}  spec.containers{skydns}     Normal      Started     (events with common reason combined)
  18m       1s      30  {kubelet minion-1}  spec.containers{skydns}     Warning     Unhealthy   Liveness probe failed: HTTP probe failed with statuscode: 503
  18m       1s      114 {kubelet minion-1}  spec.containers{skydns}     Warning     Unhealthy   Readiness probe failed: HTTP probe failed with statuscode: 503

(skydns)

$ kubectl logs kube-dns-v10-0biid skydns --namespace=kube-system
2016/01/22 00:23:03 skydns: falling back to default configuration, could not read from etcd: 100: Key not found (/skydns) [2]
2016/01/22 00:23:03 skydns: ready for queries on cluster.local. for tcp://0.0.0.0:53 [rcache 0]
2016/01/22 00:23:03 skydns: ready for queries on cluster.local. for udp://0.0.0.0:53 [rcache 0]
2016/01/22 00:23:09 skydns: failure to forward request "read udp 10.0.2.3:53: i/o timeout"
2016/01/22 00:23:13 skydns: failure to forward request "read udp 10.0.2.3:53: i/o timeout"
2016/01/22 00:23:17 skydns: failure to forward request "read udp 10.0.2.3:53: i/o timeout"
2016/01/22 00:23:21 skydns: failure to forward request "read udp 10.0.2.3:53: i/o timeout"
2016/01/22 00:23:25 skydns: failure to forward request "read udp 10.0.2.3:53: i/o timeout"
2016/01/22 00:23:29 skydns: failure to forward request "read udp 10.0.2.3:53: i/o timeout"
2016/01/22 00:23:33 skydns: failure to forward request "read udp 10.0.2.3:53: i/o timeout"
2016/01/22 00:23:37 skydns: failure to forward request "read udp 10.0.2.3:53: i/o timeout"
2016/01/22 00:23:41 skydns: failure to forward request "read udp 10.0.2.3:53: i/o timeout"
(etcd)

$ kubectl logs kube-dns-v10-0biid etcd --namespace=kube-system
2016/01/21 23:28:10 etcd: listening for peers on http://localhost:2380
2016/01/21 23:28:10 etcd: listening for peers on http://localhost:7001
2016/01/21 23:28:10 etcd: listening for client requests on http://127.0.0.1:2379
2016/01/21 23:28:10 etcd: listening for client requests on http://127.0.0.1:4001
2016/01/21 23:28:10 etcdserver: datadir is valid for the 2.0.1 format
2016/01/21 23:28:10 etcdserver: name = default
2016/01/21 23:28:10 etcdserver: data dir = /var/etcd/data
2016/01/21 23:28:10 etcdserver: member dir = /var/etcd/data/member
2016/01/21 23:28:10 etcdserver: heartbeat = 100ms
2016/01/21 23:28:10 etcdserver: election = 1000ms
2016/01/21 23:28:10 etcdserver: snapshot count = 10000
2016/01/21 23:28:10 etcdserver: advertise client URLs = http://127.0.0.1:2379,http://127.0.0.1:4001
2016/01/21 23:28:10 etcdserver: initial advertise peer URLs = http://localhost:2380,http://localhost:7001
2016/01/21 23:28:10 etcdserver: initial cluster = default=http://localhost:2380,default=http://localhost:7001
2016/01/21 23:28:10 etcdserver: start member 6a5871dbdd12c17c in cluster f68652439e3f8f2a
2016/01/21 23:28:10 raft: 6a5871dbdd12c17c became follower at term 0
2016/01/21 23:28:10 raft: newRaft 6a5871dbdd12c17c [peers: [], term: 0, commit: 0, applied: 0, lastindex: 0, lastterm: 0]
2016/01/21 23:28:10 raft: 6a5871dbdd12c17c became follower at term 1
2016/01/21 23:28:10 etcdserver: added local member 6a5871dbdd12c17c [http://localhost:2380 http://localhost:7001] to cluster f68652439e3f8f2a
2016/01/21 23:28:12 raft: 6a5871dbdd12c17c is starting a new election at term 1
2016/01/21 23:28:12 raft: 6a5871dbdd12c17c became candidate at term 2
2016/01/21 23:28:12 raft: 6a5871dbdd12c17c received vote from 6a5871dbdd12c17c at term 2
2016/01/21 23:28:12 raft: 6a5871dbdd12c17c became leader at term 2
2016/01/21 23:28:12 raft.node: 6a5871dbdd12c17c elected leader 6a5871dbdd12c17c at term 2
2016/01/21 23:28:12 etcdserver: published {Name:default ClientURLs:[http://127.0.0.1:2379 http://127.0.0.1:4001]} to cluster f68652439e3f8f2a

(kube2sky)

I0121 23:28:19.352170       1 kube2sky.go:436] Etcd server found: http://127.0.0.1:4001
I0121 23:28:20.354200       1 kube2sky.go:503] Using https://10.254.0.1:443 for kubernetes master
I0121 23:28:20.354248       1 kube2sky.go:504] Using kubernetes API <nil>

(skydns)

$ kubectl logs kube-dns-v10-0biid skydns --namespace=kube-system
2016/01/22 00:27:43 skydns: falling back to default configuration, could not read from etcd: 100: Key not found (/skydns) [2]
2016/01/22 00:27:43 skydns: ready for queries on cluster.local. for tcp://0.0.0.0:53 [rcache 0]
2016/01/22 00:27:43 skydns: ready for queries on cluster.local. for udp://0.0.0.0:53 [rcache 0]
2016/01/22 00:27:49 skydns: failure to forward request "read udp 10.0.2.3:53: i/o timeout"
2016/01/22 00:27:53 skydns: failure to forward request "read udp 10.0.2.3:53: i/o timeout"
2016/01/22 00:27:57 skydns: failure to forward request "read udp 10.0.2.3:53: i/o timeout"
2016/01/22 00:28:01 skydns: failure to forward request "read udp 10.0.2.3:53: i/o timeout"
2016/01/22 00:28:05 skydns: failure to forward request "read udp 10.0.2.3:53: i/o timeout"
2016/01/22 00:28:09 skydns: failure to forward request "read udp 10.0.2.3:53: i/o timeout"
2016/01/22 00:28:13 skydns: failure to forward request "read udp 10.0.2.3:53: i/o timeout"
2016/01/22 00:28:17 skydns: failure to forward request "read udp 10.0.2.3:53: i/o timeout"

The service endpoint IP does NOT seem to be getting set:

$ kubectl describe svc kube-dns --namespace=kube-system
Name:           kube-dns
Namespace:      kube-system
Labels:         k8s-app=kube-dns,kubernetes.io/cluster-service=true,kubernetes.io/name=KubeDNS
Selector:       k8s-app=kube-dns
Type:           ClusterIP
IP:         10.254.0.10
Port:           dns 53/UDP
Endpoints:      
Port:           dns-tcp 53/TCP
Endpoints:      
Session Affinity:   None
No events.
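This is expected given the failing probes: a pod whose readiness probe fails is never added to its service's endpoints, which matches the empty Endpoints lines above. With access to the live cluster, the link between readiness and endpoints can be checked directly (command sketch; requires the cluster from this question):

```shell
# Endpoints stay empty until at least one pod matching the selector is Ready.
kubectl get endpoints kube-dns --namespace=kube-system

# Cross-check which pods match the selector and whether they report Ready:
kubectl get pods --namespace=kube-system -l k8s-app=kube-dns -o wide
```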

I have double-checked the service accounts, and everything seems configured correctly:

$ kubectl get secrets --all-namespaces
NAMESPACE     NAME                                     TYPE                                  DATA      AGE
default       default-token-z71xj                      kubernetes.io/service-account-token   2         1h
kube-system   default-token-wce74                      kubernetes.io/service-account-token   2         1h
kube-system   token-system-controller-manager-master   Opaque                                1         1h
kube-system   token-system-dns                         Opaque                                1         1h
kube-system   token-system-kubectl-master              Opaque                                1         1h
kube-system   token-system-kubelet-minion-1            Opaque                                1         1h
kube-system   token-system-logging                     Opaque                                1         1h
kube-system   token-system-monitoring                  Opaque                                1         1h
kube-system   token-system-proxy-minion-1              Opaque                                1         1h
kube-system   token-system-scheduler-master            Opaque                                1         1h

The default secret for the kube-system namespace matches the one the pod is using:

$ kubectl describe secrets default-token-wce74 --namespace=kube-system
Name:       default-token-wce74
Namespace:  kube-system
Labels:     <none>
Annotations:    kubernetes.io/service-account.name=default,kubernetes.io/service-account.uid=70da0a10-c096-11e5-aa7b-08002771c788

Type:   kubernetes.io/service-account-token

Data
====
token:  eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJrdWJlLXN5c3RlbSIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJkZWZhdWx0LXRva2VuLXdjZTc0Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQubmFtZSI6ImRlZmF1bHQiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC51aWQiOiI3MGRhMGExMC1jMDk2LTExZTUtYWE3Yi0wODAwMjc3MWM3ODgiLCJzdWIiOiJzeXN0ZW06c2VydmljZWFjY291bnQ6a3ViZS1zeXN0ZW06ZGVmYXVsdCJ9.sykf8qmh9ekAEHnSPAMLPz04zebvDJhb72A2YC1Y8_BXoA57U7KRAVDVyyxQHrEUSlHsSfxzqHHOcLniPQbqWZxc0bK4taV6zdBKIgndEthz0HGJQJdfZJKxurP5dhI6TOIpeLYpUE6BN6ubsVQiJksVLK_Lfq_c1posqAUi8eXD-KsqRDA98JMUZyirRGRXzZfF7-KscIqys7AiHAURHHwDibjmXIdYKBpDwc6hOIATpS3r6rLj30R1hNYy4u2GkpNsIYo83zIt515rnfCH9Yq1syT6-qho0SaPnj3us-uT8ZXF0x_7SlChV9Wx5Mo6kW3EHg6-A6q6m3R0KlsHjQ
ca.crt: 1387 bytes

I have also used kubectl exec to get into the kube2sky container, and the ca.crt there matches the one on the server.
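A service-account token is a JWT, so its payload can be decoded locally to confirm which namespace and service account it was issued for (a small helper sketch; base64url encoding drops the `=` padding, which has to be restored before `base64 -d` will accept the input). Running it on the token above should show "kubernetes.io/serviceaccount/namespace":"kube-system" and a sub of system:serviceaccount:kube-system:default.

```shell
# Decode the payload (second dot-separated segment) of a JWT such as a
# Kubernetes service-account token, without verifying the signature.
decode_jwt_payload() {
  # Convert base64url characters back to standard base64
  seg=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
  # Restore the '=' padding that base64url encoding strips
  case $(( ${#seg} % 4 )) in
    2) seg="${seg}==" ;;
    3) seg="${seg}=" ;;
  esac
  printf '%s' "$seg" | base64 -d
}
```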

Comments:

Prashanth B: Your kube-dns pod needs to see and talk to the master. This happens through a Kubernetes service. Try curling the service IP over HTTPS; if you see something like "unauthorized", that means you can actually talk to it. If not, you're not even able to reach the master.

tbs: I had to remove the probe checks from skydns-rc.yaml; otherwise an endpoint IP address was never assigned. Once I did this, skydns started running. kubectl describe svc kube-dns --namespace=kube-system now says the endpoint is 18.16.88.6, so I curled it via curl https://18.16.88.6 and got "connection refused". I'm guessing this means some authorization isn't set up correctly? I also attempted to validate DNS, but it gives me "nslookup: can't resolve 'kubernetes.default'".

MrE: Not sure if that applies to you, but I've seen this when I went from add-on v9 to v10. I ended up killing all dns-v9 pods, killing the DNS RC, then restarting with v10 only. kubectl somehow only showed the v10 pods, but there was still a v9 pod running somewhere.

1 Answer


It seems there were two problems:

Cert Creation

My implementation is based off the ansible deployment found here: https://github.com/kubernetes/contrib/tree/master/ansible

This deployment generates the certs for all network interfaces. It also adds IP: in front of each address, and then the script that generates the certs (make-ca-cert.sh) prepends IP: again. I'm not 100% sure whether that is okay. However, I changed it to generate certs for just the one network interface and removed the extra IP: prefix, and that seems to have resolved the issue.

Very good thread explaining certs, how to create them and how they work with Kubernetes: https://github.com/kubernetes/kubernetes/issues/11000
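One quick way to see exactly which names a generated apiserver cert will serve is to read its subjectAltName entries back with openssl. The sketch below creates a throwaway self-signed cert with apiserver-style SANs (all names and IPs are examples; substitute your master's address and the cluster service IP) and then prints them; running the same x509 command against the real cert from make-ca-cert.sh shows what it actually contains:

```shell
# Create a throwaway self-signed cert with example apiserver-style SANs
# (requires openssl 1.1.1+ for -addext; the IPs/DNS names are made up).
openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
  -subj "/CN=kube-apiserver" \
  -addext "subjectAltName=DNS:kubernetes,DNS:kubernetes.default,DNS:kubernetes.default.svc,IP:10.254.0.1,IP:172.28.129.1" \
  -keyout /tmp/test-apiserver.key -out /tmp/test-apiserver.crt 2>/dev/null

# Print the SANs back out of the cert to verify them:
openssl x509 -in /tmp/test-apiserver.crt -noout -text | grep -A1 "Subject Alternative Name"
```

If the master IP that kube2sky uses to reach https://10.254.0.1:443 is missing from this list, TLS verification fails and DNS records are never populated.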

APIServer Setting: --advertise-address

Also, apparently I needed to set --advertise-address for the apiserver as well.
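For reference, the relevant fragment of the apiserver invocation looks like this (a sketch; 172.28.129.1 stands in for the master IP on this question's network, and the service range matches the 10.254.0.0/16 addresses seen above):

```shell
# kube-apiserver fragment: advertise the master's reachable IP so that the
# kubernetes service endpoint (and thus kube2sky) can actually reach it.
kube-apiserver \
  --advertise-address=172.28.129.1 \
  --service-cluster-ip-range=10.254.0.0/16
  # ...keep your other existing flags as-is
```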

Adjusting these two things seems to have resolved the issue.