Whenever DNS runs on a kubelet other than the one that resides on the master node, the liveness and readiness probes for skydns keep failing. I am deploying the add-ons as a service, similar to what is used in the Salt cluster. I have configured my system to use tokens and have verified that a token gets generated for system:dns and is configured correctly for the kubelet. Is there something additional I need to do inside the skydns rc/svc yamls because of this?
Salt Cluster: https://github.com/kubernetes/kubernetes/tree/master/cluster/saltbase/salt/kube-addons
Ansible Deployment: https://github.com/kubernetes/contrib/tree/master/ansible/roles/kubernetes-addons/files
I am using the standard skydns rc/svc yamls.
Pod Description:
Name: kube-dns-v10-pgqig
Namespace: kube-system
Image(s): gcr.io/google_containers/etcd:2.0.9,gcr.io/google_containers/kube2sky:1.12,gcr.io/google_containers/skydns:2015-10-13-8c72f8c,gcr.io/google_containers/exechealthz:1.0
Node: minion-1/172.28.129.2
Start Time: Thu, 21 Jan 2016 08:54:50 -0800
Labels: k8s-app=kube-dns,kubernetes.io/cluster-service=true,version=v10
Status: Running
Reason:
Message:
IP: 18.16.18.9
Replication Controllers: kube-dns-v10 (1/1 replicas created)
Containers:
etcd:
Container ID: docker://49216f478c99fcd3c25763e99bb18861d31025a0cadd538f9590295e78846f69
Image: gcr.io/google_containers/etcd:2.0.9
Image ID: docker://b6b9a86dc06aa1361357ca1b105feba961f6a4145adca6c54e142c0be0fe87b0
Command:
/usr/local/bin/etcd
-data-dir
/var/etcd/data
-listen-client-urls
http://127.0.0.1:2379,http://127.0.0.1:4001
-advertise-client-urls
http://127.0.0.1:2379,http://127.0.0.1:4001
-initial-cluster-token
skydns-etcd
QoS Tier:
cpu: Guaranteed
memory: Guaranteed
Limits:
cpu: 100m
memory: 50Mi
Requests:
cpu: 100m
memory: 50Mi
State: Running
Started: Thu, 21 Jan 2016 08:54:51 -0800
Ready: True
Restart Count: 0
Environment Variables:
kube2sky:
Container ID: docker://4cbdf45e1ba0a6a820120c934473e61bf74af49d1ff42a0da01abd593516f4ee
Image: gcr.io/google_containers/kube2sky:1.12
Image ID: docker://b8f3273706d3fc51375779110828379bdbb663e556cca3925e87fbc614725bb1
Args:
-domain=cluster.local
-kube_master_url=http://master:8080
QoS Tier:
memory: Guaranteed
cpu: Guaranteed
Limits:
memory: 50Mi
cpu: 100m
Requests:
memory: 50Mi
cpu: 100m
State: Running
Started: Thu, 21 Jan 2016 08:54:51 -0800
Ready: True
Restart Count: 0
Environment Variables:
skydns:
Container ID: docker://bd3103f514dcc4e42ff2c126446d963d03ef1101833239926c84d5c0ba577929
Image: gcr.io/google_containers/skydns:2015-10-13-8c72f8c
Image ID: docker://763c92e53f311c40a922628a34daf0be4397463589a7d148cea8291f02c12a5d
Args:
-machines=http://127.0.0.1:4001
-addr=0.0.0.0:53
-ns-rotate=false
-domain=cluster.local.
QoS Tier:
memory: Guaranteed
cpu: Guaranteed
Limits:
cpu: 100m
memory: 50Mi
Requests:
cpu: 100m
memory: 50Mi
State: Running
Started: Thu, 21 Jan 2016 09:13:50 -0800
Last Termination State: Terminated
Reason: Error
Exit Code: 2
Started: Thu, 21 Jan 2016 09:13:14 -0800
Finished: Thu, 21 Jan 2016 09:13:50 -0800
Ready: False
Restart Count: 28
Environment Variables:
healthz:
Container ID: docker://b46d2bb06a72cda25565b4f40ce956f252dce5df7f590217b3307126ec29e7c7
Image: gcr.io/google_containers/exechealthz:1.0
Image ID: docker://4f3d04b1d47b64834d494f9416d1f17a5f93a3e2035ad604fee47cfbba62be60
Args:
-cmd=nslookup kubernetes.default.svc.cluster.local 127.0.0.1 >/dev/null
-port=8080
QoS Tier:
memory: Guaranteed
cpu: Guaranteed
Limits:
cpu: 10m
memory: 20Mi
Requests:
cpu: 10m
memory: 20Mi
State: Running
Started: Thu, 21 Jan 2016 08:54:51 -0800
Ready: True
Restart Count: 0
Environment Variables:
Conditions:
Type Status
Ready False
Volumes:
etcd-storage:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
default-token-62irv:
Type: Secret (a secret that should populate this volume)
SecretName: default-token-62irv
Events:
FirstSeen LastSeen Count From SubobjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
19m 19m 1 {kubelet minion-1} spec.containers{etcd} Normal Created Created container with docker id 49216f478c99
19m 19m 1 {scheduler } Normal Scheduled Successfully assigned kube-dns-v10-pgqig to minion-1
19m 19m 1 {kubelet minion-1} spec.containers{etcd} Normal Pulled Container image "gcr.io/google_containers/etcd:2.0.9" already present on machine
19m 19m 1 {kubelet minion-1} spec.containers{kube2sky} Normal Created Created container with docker id 4cbdf45e1ba0
19m 19m 1 {kubelet minion-1} spec.containers{kube2sky} Normal Started Started container with docker id 4cbdf45e1ba0
19m 19m 1 {kubelet minion-1} spec.containers{skydns} Normal Created Created container with docker id fdb1278aaf93
19m 19m 1 {kubelet minion-1} spec.containers{skydns} Normal Started Started container with docker id fdb1278aaf93
19m 19m 1 {kubelet minion-1} spec.containers{healthz} Normal Pulled Container image "gcr.io/google_containers/exechealthz:1.0" already present on machine
19m 19m 1 {kubelet minion-1} spec.containers{healthz} Normal Created Created container with docker id b46d2bb06a72
19m 19m 1 {kubelet minion-1} spec.containers{healthz} Normal Started Started container with docker id b46d2bb06a72
19m 19m 1 {kubelet minion-1} spec.containers{etcd} Normal Started Started container with docker id 49216f478c99
19m 19m 1 {kubelet minion-1} spec.containers{kube2sky} Normal Pulled Container image "gcr.io/google_containers/kube2sky:1.12" already present on machine
18m 18m 1 {kubelet minion-1} spec.containers{skydns} Normal Killing Killing container with docker id fdb1278aaf93: pod "kube-dns-v10-pgqig_kube-system(af674b6a-c05f-11e5-9e37-08002771c788)" container "skydns" is unhealthy, it will be killed and re-created.
18m 18m 1 {kubelet minion-1} spec.containers{skydns} Normal Started Started container with docker id 70474f1ca315
18m 18m 1 {kubelet minion-1} spec.containers{skydns} Normal Created Created container with docker id 70474f1ca315
17m 17m 1 {kubelet minion-1} spec.containers{skydns} Normal Killing Killing container with docker id 70474f1ca315: pod "kube-dns-v10-pgqig_kube-system(af674b6a-c05f-11e5-9e37-08002771c788)" container "skydns" is unhealthy, it will be killed and re-created.
17m 17m 1 {kubelet minion-1} spec.containers{skydns} Normal Created Created container with docker id 8e18a0b404dd
17m 17m 1 {kubelet minion-1} spec.containers{skydns} Normal Started Started container with docker id 8e18a0b404dd
16m 16m 1 {kubelet minion-1} spec.containers{skydns} Normal Created Created container with docker id 00b4e2a46779
16m 16m 1 {kubelet minion-1} spec.containers{skydns} Normal Killing Killing container with docker id 8e18a0b404dd: pod "kube-dns-v10-pgqig_kube-system(af674b6a-c05f-11e5-9e37-08002771c788)" container "skydns" is unhealthy, it will be killed and re-created.
16m 16m 1 {kubelet minion-1} spec.containers{skydns} Normal Started Started container with docker id 00b4e2a46779
16m 16m 1 {kubelet minion-1} spec.containers{skydns} Normal Started Started container with docker id 3df9a304e09a
16m 16m 1 {kubelet minion-1} spec.containers{skydns} Normal Killing Killing container with docker id 00b4e2a46779: pod "kube-dns-v10-pgqig_kube-system(af674b6a-c05f-11e5-9e37-08002771c788)" container "skydns" is unhealthy, it will be killed and re-created.
16m 16m 1 {kubelet minion-1} spec.containers{skydns} Normal Created Created container with docker id 3df9a304e09a
15m 15m 1 {kubelet minion-1} spec.containers{skydns} Normal Killing Killing container with docker id 3df9a304e09a: pod "kube-dns-v10-pgqig_kube-system(af674b6a-c05f-11e5-9e37-08002771c788)" container "skydns" is unhealthy, it will be killed and re-created.
15m 15m 1 {kubelet minion-1} spec.containers{skydns} Normal Created Created container with docker id 4b3ee7fccfd2
15m 15m 1 {kubelet minion-1} spec.containers{skydns} Normal Started Started container with docker id 4b3ee7fccfd2
14m 14m 1 {kubelet minion-1} spec.containers{skydns} Normal Killing Killing container with docker id 4b3ee7fccfd2: pod "kube-dns-v10-pgqig_kube-system(af674b6a-c05f-11e5-9e37-08002771c788)" container "skydns" is unhealthy, it will be killed and re-created.
14m 14m 1 {kubelet minion-1} spec.containers{skydns} Normal Killing Killing container with docker id d1100cb0a5be: pod "kube-dns-v10-pgqig_kube-system(af674b6a-c05f-11e5-9e37-08002771c788)" container "skydns" is unhealthy, it will be killed and re-created.
13m 13m 1 {kubelet minion-1} spec.containers{skydns} Normal Killing Killing container with docker id 19e2bbda4f80: pod "kube-dns-v10-pgqig_kube-system(af674b6a-c05f-11e5-9e37-08002771c788)" container "skydns" is unhealthy, it will be killed and re-created.
12m 12m 1 {kubelet minion-1} spec.containers{skydns} Normal Killing Killing container with docker id c424c0ad713a: pod "kube-dns-v10-pgqig_kube-system(af674b6a-c05f-11e5-9e37-08002771c788)" container "skydns" is unhealthy, it will be killed and re-created.
19m 1s 29 {kubelet minion-1} spec.containers{skydns} Normal Pulled Container image "gcr.io/google_containers/skydns:2015-10-13-8c72f8c" already present on machine
12m 1s 19 {kubelet minion-1} spec.containers{skydns} Normal Killing (events with common reason combined)
14m 1s 23 {kubelet minion-1} spec.containers{skydns} Normal Created (events with common reason combined)
14m 1s 23 {kubelet minion-1} spec.containers{skydns} Normal Started (events with common reason combined)
18m 1s 30 {kubelet minion-1} spec.containers{skydns} Warning Unhealthy Liveness probe failed: HTTP probe failed with statuscode: 503
18m 1s 114 {kubelet minion-1} spec.containers{skydns} Warning Unhealthy Readiness probe failed: HTTP probe failed with statuscode: 503
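The 503 in the last two events appears to come from the healthz container, which listens on port 8080 and runs the nslookup command shown in its Args. Below is a minimal Python sketch for querying that endpoint by hand from a machine that can reach the pod IP. The /healthz path is an assumption (it is not shown in the pod description above), and healthz_status is a hypothetical helper name.

```python
import urllib.error
import urllib.request


def healthz_status(host: str, port: int = 8080, path: str = "/healthz",
                   timeout: float = 3.0) -> int:
    """Return the HTTP status code served at http://host:port/path."""
    url = f"http://{host}:{port}{path}"
    # Ignore any proxy settings from the environment; pod IPs are reached directly.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # Non-2xx answers (such as 503) are raised as HTTPError; report the code.
        return err.code
```

Calling healthz_status with the pod IP from the description returns the same status code the kubelet probe sees. A refused or timed-out connection raises urllib.error.URLError instead of returning a code.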
(skydns)
$ kubectl logs kube-dns-v10-0biid skydns --namespace=kube-system
2016/01/22 00:23:03 skydns: falling back to default configuration, could not read from etcd: 100: Key not found (/skydns) [2]
2016/01/22 00:23:03 skydns: ready for queries on cluster.local. for tcp://0.0.0.0:53 [rcache 0]
2016/01/22 00:23:03 skydns: ready for queries on cluster.local. for udp://0.0.0.0:53 [rcache 0]
2016/01/22 00:23:09 skydns: failure to forward request "read udp 10.0.2.3:53: i/o timeout"
2016/01/22 00:23:13 skydns: failure to forward request "read udp 10.0.2.3:53: i/o timeout"
2016/01/22 00:23:17 skydns: failure to forward request "read udp 10.0.2.3:53: i/o timeout"
2016/01/22 00:23:21 skydns: failure to forward request "read udp 10.0.2.3:53: i/o timeout"
2016/01/22 00:23:25 skydns: failure to forward request "read udp 10.0.2.3:53: i/o timeout"
2016/01/22 00:23:29 skydns: failure to forward request "read udp 10.0.2.3:53: i/o timeout"
2016/01/22 00:23:33 skydns: failure to forward request "read udp 10.0.2.3:53: i/o timeout"
2016/01/22 00:23:37 skydns: failure to forward request "read udp 10.0.2.3:53: i/o timeout"
2016/01/22 00:23:41 skydns: failure to forward request "read udp 10.0.2.3:53: i/o timeout"
[vagrant@kubernetes-master ~]$ kubectl logs kube-dns-v10-0biid etcd --namespace=kube-system
2016/01/21 23:28:10 etcd: listening for peers on http://localhost:2380
2016/01/21 23:28:10 etcd: listening for peers on http://localhost:7001
2016/01/21 23:28:10 etcd: listening for client requests on http://127.0.0.1:2379
2016/01/21 23:28:10 etcd: listening for client requests on http://127.0.0.1:4001
2016/01/21 23:28:10 etcdserver: datadir is valid for the 2.0.1 format
2016/01/21 23:28:10 etcdserver: name = default
2016/01/21 23:28:10 etcdserver: data dir = /var/etcd/data
2016/01/21 23:28:10 etcdserver: member dir = /var/etcd/data/member
2016/01/21 23:28:10 etcdserver: heartbeat = 100ms
2016/01/21 23:28:10 etcdserver: election = 1000ms
2016/01/21 23:28:10 etcdserver: snapshot count = 10000
2016/01/21 23:28:10 etcdserver: advertise client URLs = http://127.0.0.1:2379,http://127.0.0.1:4001
2016/01/21 23:28:10 etcdserver: initial advertise peer URLs = http://localhost:2380,http://localhost:7001
2016/01/21 23:28:10 etcdserver: initial cluster = default=http://localhost:2380,default=http://localhost:7001
2016/01/21 23:28:10 etcdserver: start member 6a5871dbdd12c17c in cluster f68652439e3f8f2a
2016/01/21 23:28:10 raft: 6a5871dbdd12c17c became follower at term 0
2016/01/21 23:28:10 raft: newRaft 6a5871dbdd12c17c [peers: [], term: 0, commit: 0, applied: 0, lastindex: 0, lastterm: 0]
2016/01/21 23:28:10 raft: 6a5871dbdd12c17c became follower at term 1
2016/01/21 23:28:10 etcdserver: added local member 6a5871dbdd12c17c [http://localhost:2380 http://localhost:7001] to cluster f68652439e3f8f2a
2016/01/21 23:28:12 raft: 6a5871dbdd12c17c is starting a new election at term 1
2016/01/21 23:28:12 raft: 6a5871dbdd12c17c became candidate at term 2
2016/01/21 23:28:12 raft: 6a5871dbdd12c17c received vote from 6a5871dbdd12c17c at term 2
2016/01/21 23:28:12 raft: 6a5871dbdd12c17c became leader at term 2
2016/01/21 23:28:12 raft.node: 6a5871dbdd12c17c elected leader 6a5871dbdd12c17c at term 2
2016/01/21 23:28:12 etcdserver: published {Name:default ClientURLs:[http://127.0.0.1:2379 http://127.0.0.1:4001]} to cluster f68652439e3f8f2a
(kube2sky)
I0121 23:28:19.352170 1 kube2sky.go:436] Etcd server found: http://127.0.0.1:4001
I0121 23:28:20.354200 1 kube2sky.go:503] Using https://10.254.0.1:443 for kubernetes master
I0121 23:28:20.354248 1 kube2sky.go:504] Using kubernetes API <nil>
(skydns)
kubectl logs kube-dns-v10-0biid skydns --namespace=kube-system
2016/01/22 00:27:43 skydns: falling back to default configuration, could not read from etcd: 100: Key not found (/skydns) [2]
2016/01/22 00:27:43 skydns: ready for queries on cluster.local. for tcp://0.0.0.0:53 [rcache 0]
2016/01/22 00:27:43 skydns: ready for queries on cluster.local. for udp://0.0.0.0:53 [rcache 0]
2016/01/22 00:27:49 skydns: failure to forward request "read udp 10.0.2.3:53: i/o timeout"
2016/01/22 00:27:53 skydns: failure to forward request "read udp 10.0.2.3:53: i/o timeout"
2016/01/22 00:27:57 skydns: failure to forward request "read udp 10.0.2.3:53: i/o timeout"
2016/01/22 00:28:01 skydns: failure to forward request "read udp 10.0.2.3:53: i/o timeout"
2016/01/22 00:28:05 skydns: failure to forward request "read udp 10.0.2.3:53: i/o timeout"
2016/01/22 00:28:09 skydns: failure to forward request "read udp 10.0.2.3:53: i/o timeout"
2016/01/22 00:28:13 skydns: failure to forward request "read udp 10.0.2.3:53: i/o timeout"
2016/01/22 00:28:17 skydns: failure to forward request "read udp 10.0.2.3:53: i/o timeout"
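The repeated timeouts above are for requests forwarded to 10.0.2.3:53. Below is a minimal Python sketch for checking whether that upstream resolver answers at all from a given host or container. It hand-builds a DNS A query. build_query and upstream_answers are hypothetical helper names, and the query id is an arbitrary fixed value.

```python
import socket
import struct

QUERY_ID = 0x1234  # arbitrary id, echoed back by the server in its reply


def build_query(name: str, query_id: int = QUERY_ID) -> bytes:
    """Build a minimal DNS query for an A record with recursion desired."""
    # Header: id, flags (RD=1), QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Question: length-prefixed labels, a zero byte, then QTYPE=A and QCLASS=IN
    labels = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    )
    return header + labels + b"\x00" + struct.pack("!HH", 1, 1)


def upstream_answers(server: str, name: str, timeout: float = 2.0) -> bool:
    """Return True if `server` sends back a UDP reply matching our query id."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_query(name), (server, 53))
            reply, _ = sock.recvfrom(512)
        except OSError:  # covers timeouts and unreachable-network errors
            return False
    return reply[:2] == struct.pack("!H", QUERY_ID)
```

For example, upstream_answers("10.0.2.3", "example.com") returns False when the request times out in the same way the skydns log shows. The name example.com is only a placeholder.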
The service endpoint IP does NOT seem to be getting set:
kubectl describe svc kube-dns --namespace=kube-system
Name: kube-dns
Namespace: kube-system
Labels: k8s-app=kube-dns,kubernetes.io/cluster-service=true,kubernetes.io/name=KubeDNS
Selector: k8s-app=kube-dns
Type: ClusterIP
IP: 10.254.0.10
Port: dns 53/UDP
Endpoints:
Port: dns-tcp 53/TCP
Endpoints:
Session Affinity: None
No events.
I have double-checked the service accounts, and they all seem to be configured correctly:
kubectl get secrets --all-namespaces
NAMESPACE NAME TYPE DATA AGE
default default-token-z71xj kubernetes.io/service-account-token 2 1h
kube-system default-token-wce74 kubernetes.io/service-account-token 2 1h
kube-system token-system-controller-manager-master Opaque 1 1h
kube-system token-system-dns Opaque 1 1h
kube-system token-system-kubectl-master Opaque 1 1h
kube-system token-system-kubelet-minion-1 Opaque 1 1h
kube-system token-system-logging Opaque 1 1h
kube-system token-system-monitoring Opaque 1 1h
kube-system token-system-proxy-minion-1 Opaque 1 1h
kube-system token-system-scheduler-master Opaque 1 1h
This is the default secret for the kube-system namespace, which matches the one the pod is using:
kubectl describe secrets default-token-wce74 --namespace=kube-system
Name: default-token-wce74
Namespace: kube-system
Labels: <none>
Annotations: kubernetes.io/service-account.name=default,kubernetes.io/service-account.uid=70da0a10-c096-11e5-aa7b-08002771c788
Type: kubernetes.io/service-account-token
Data
====
token: eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJrdWJlLXN5c3RlbSIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJkZWZhdWx0LXRva2VuLXdjZTc0Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQubmFtZSI6ImRlZmF1bHQiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC51aWQiOiI3MGRhMGExMC1jMDk2LTExZTUtYWE3Yi0wODAwMjc3MWM3ODgiLCJzdWIiOiJzeXN0ZW06c2VydmljZWFjY291bnQ6a3ViZS1zeXN0ZW06ZGVmYXVsdCJ9.sykf8qmh9ekAEHnSPAMLPz04zebvDJhb72A2YC1Y8_BXoA57U7KRAVDVyyxQHrEUSlHsSfxzqHHOcLniPQbqWZxc0bK4taV6zdBKIgndEthz0HGJQJdfZJKxurP5dhI6TOIpeLYpUE6BN6ubsVQiJksVLK_Lfq_c1posqAUi8eXD-KsqRDA98JMUZyirRGRXzZfF7-KscIqys7AiHAURHHwDibjmXIdYKBpDwc6hOIATpS3r6rLj30R1hNYy4u2GkpNsIYo83zIt515rnfCH9Yq1syT6-qho0SaPnj3us-uT8ZXF0x_7SlChV9Wx5Mo6kW3EHg6-A6q6m3R0KlsHjQ
ca.crt: 1387 bytes
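The token above is a JWT, so its middle segment can be decoded to see which namespace and service account it was issued for. Below is a minimal Python sketch, assuming only the standard three-segment JWT layout. jwt_claims is a hypothetical helper name, and it does not verify the signature.

```python
import base64
import json


def jwt_claims(token: str) -> dict:
    """Decode the payload segment of a JWT into a dict (no signature check)."""
    payload = token.split(".")[1]
    # JWT segments are base64url without padding; restore it before decoding.
    payload += "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))
```

Passing the token string from the secret to jwt_claims returns its claims as a dictionary. The "sub" claim and the kubernetes.io/serviceaccount/namespace claim are the ones to compare against the pod's namespace and service account.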
I have also used kubectl exec to get into the kube2sky container, and the ca.crt matches the one on the server.
I ran kubectl describe svc kube-dns --namespace=kube-system and it says the endpoint was 18.16.88.6, so I ran curl https://18.16.88.6 and got a connection refused. I'm guessing this means some authorization isn't set up correctly? I also attempted to validate DNS, but it gives me "nslookup: can't resolve 'kubernetes.default'". – tbs