3 votes

I've been working with a 6-node cluster for the last few weeks without issue. Earlier today we ran into an open-files issue (https://github.com/kubernetes/kubernetes/pull/12443/files), so I patched and restarted kube-proxy.

Since then, pods deployed by an RC to ALL BUT node-01 get stuck in the Pending state, and there are no log messages stating the cause.

Looking at the Docker daemon on the nodes, the containers in the pod are actually running, and deleting the RC removes them. It appears to be some sort of status-reporting issue between the state according to the kubelet and the kube-apiserver.
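
To see the mismatch directly, it's worth comparing Docker's view on the node with the apiserver's view of the same pod (a rough sketch, using the pod name from the example below):

# on the node: are the pod's containers actually running?
docker ps | grep kube-dns

# from a client: what does the apiserver think the pod is doing?
docker run --rm -it lachie83/kubectl:prod describe pod kube-dns-v8-i0yac --namespace=kube-system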

Cluster is running v1.0.3

Here's an example of the state:

docker run --rm -it lachie83/kubectl:prod get pods --namespace=kube-system -o wide
NAME                READY     STATUS    RESTARTS   AGE       NODE
kube-dns-v8-i0yac   0/4       Pending   0          4s        10.1.1.35
kube-dns-v8-jti2e   0/4       Pending   0          4s        10.1.1.34

get events

Wed, 16 Sep 2015 06:25:42 +0000   Wed, 16 Sep 2015 06:25:42 +0000   1         kube-dns-v8                       ReplicationController                                                successfulCreate   {replication-controller }   Created pod: kube-dns-v8-i0yac
Wed, 16 Sep 2015 06:25:42 +0000   Wed, 16 Sep 2015 06:25:42 +0000   1         kube-dns-v8-i0yac                 Pod                                                                  scheduled          {scheduler }                Successfully assigned kube-dns-v8-i0yac to 10.1.1.35
Wed, 16 Sep 2015 06:25:42 +0000   Wed, 16 Sep 2015 06:25:42 +0000   1         kube-dns-v8-jti2e                 Pod                                                                  scheduled          {scheduler }                Successfully assigned kube-dns-v8-jti2e to 10.1.1.34
Wed, 16 Sep 2015 06:25:42 +0000   Wed, 16 Sep 2015 06:25:42 +0000   1         kube-dns-v8                       ReplicationController                                                successfulCreate   {replication-controller }   Created pod: kube-dns-v8-jti2e

scheduler log

I0916 06:25:42.897814   10076 event.go:203] Event(api.ObjectReference{Kind:"Pod", Namespace:"kube-system", Name:"kube-dns-v8-jti2e", UID:"c1cafebe-5c3b-11e5-b3c4-020443b6797d", APIVersion:"v1", ResourceVersion:"670117", FieldPath:""}): reason: 'scheduled' Successfully assigned kube-dns-v8-jti2e to 10.1.1.34
I0916 06:25:42.904195   10076 event.go:203] Event(api.ObjectReference{Kind:"Pod", Namespace:"kube-system", Name:"kube-dns-v8-i0yac", UID:"c1cafc69-5c3b-11e5-b3c4-020443b6797d", APIVersion:"v1", ResourceVersion:"670118", FieldPath:""}): reason: 'scheduled' Successfully assigned kube-dns-v8-i0yac to 10.1.1.35

tailing the kubelet log file during pod creation

tail -f kubelet.kube-node-03.root.log.INFO.20150916-060744.10668
I0916 06:25:04.448916   10668 config.go:253] Setting pods for source file : {[] 0 file}
I0916 06:25:24.449253   10668 config.go:253] Setting pods for source file : {[] 0 file}
I0916 06:25:44.449522   10668 config.go:253] Setting pods for source file : {[] 0 file}
I0916 06:26:04.449774   10668 config.go:253] Setting pods for source file : {[] 0 file}
I0916 06:26:24.450400   10668 config.go:253] Setting pods for source file : {[] 0 file}
I0916 06:26:44.450995   10668 config.go:253] Setting pods for source file : {[] 0 file}
I0916 06:27:04.451501   10668 config.go:253] Setting pods for source file : {[] 0 file}
I0916 06:27:24.451910   10668 config.go:253] Setting pods for source file : {[] 0 file}
I0916 06:27:44.452511   10668 config.go:253] Setting pods for source file : {[] 0 file}
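
Note that only the file source ever gets updated here. If the kubelet were receiving pod assignments from the apiserver, I'd expect a matching line for the api source, so filtering out the file-source lines should show whether anything is arriving at all (the exact log text for other sources is an assumption based on the lines above):

grep "Setting pods for source" kubelet.kube-node-03.root.log.INFO.20150916-060744.10668 | grep -v "source file"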

kubelet process

root@kube-node-03:/var/log/kubernetes# ps -ef | grep kubelet
root     10668     1  1 06:07 ?        00:00:13 /opt/bin/kubelet --address=10.1.1.34 --port=10250 --hostname_override=10.1.1.34 --api_servers=https://kube-master-01.sj.lithium.com:6443 --logtostderr=false --log_dir=/var/log/kubernetes --cluster_dns=10.1.2.53 --config=/etc/kubelet/conf --cluster_domain=prod-kube-sjc1-1.internal --v=4 --tls-cert-file=/etc/kubelet/certs/kubelet.pem --tls-private-key-file=/etc/kubelet/certs/kubelet-key.pem
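
Given those flags, one quick sanity check is whether the node can reach the apiserver over TLS at all, reusing the kubelet's client cert (a sketch; -k skips server certificate verification just for this test):

curl --cert /etc/kubelet/certs/kubelet.pem --key /etc/kubelet/certs/kubelet-key.pem -k https://kube-master-01.sj.lithium.com:6443/healthz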

node list

docker run --rm -it lachie83/kubectl:prod get nodes
NAME            LABELS                                             STATUS
10.1.1.30   kubernetes.io/hostname=10.1.1.30,name=node-1   Ready
10.1.1.32   kubernetes.io/hostname=10.1.1.32,name=node-2   Ready
10.1.1.34   kubernetes.io/hostname=10.1.1.34,name=node-3   Ready
10.1.1.35   kubernetes.io/hostname=10.1.1.35,name=node-4   Ready
10.1.1.42   kubernetes.io/hostname=10.1.1.42,name=node-5   Ready
10.1.1.43   kubernetes.io/hostname=10.1.1.43,name=node-6   Ready

2 Answers

3 votes

The issue turned out to be an MTU mismatch between the node and the master. Once that was fixed, the problem was resolved.
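
For anyone hitting the same symptom, one way to confirm a path-MTU problem is to send non-fragmentable pings of increasing size from the node to the master (standard Linux tooling; the interface name and MTU value below are only examples):

# 1472 bytes of payload + 28 bytes of ICMP/IP headers = 1500; shrink -s until it gets through
ping -M do -s 1472 kube-master-01.sj.lithium.com

# if large packets are silently dropped, clamp the interface MTU to a size that works
ip link set dev eth0 mtu 1400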

0 votes

It looks like you built your cluster from scratch. Have you run the conformance test against your cluster yet? If not, could you please run it? Detailed information can be found at:

https://github.com/kubernetes/kubernetes/blob/e8009e828c864a46bf2e1d5c7dab8ef413c8bbe5/hack/conformance-test.sh

The conformance test should fail, or at least give us more information about your cluster setup. Please post the test results somewhere so that we can diagnose your problem further.
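
If it helps, this is roughly how I'd invoke it (a sketch that assumes a checkout of the kubernetes repo and that the script picks the cluster up from KUBECONFIG, as its header describes):

git clone https://github.com/kubernetes/kubernetes
cd kubernetes
KUBECONFIG=$HOME/.kube/config hack/conformance-test.sh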

The problem is most likely that your kubelet and your kube-apiserver don't agree on the node name here. I also noticed that you are using hostname_override.
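
A quick way to check whether they agree (a sketch; run the first command on the affected node):

# what the node calls itself, versus the kubelet's --hostname_override value
hostname

# what the apiserver has the nodes registered as
docker run --rm -it lachie83/kubectl:prod get nodes

If the name a pod is bound to never matches the name the kubelet identifies itself by, the kubelet never picks the pod up, which would line up with pods sitting in Pending forever.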