2
votes

I've set up a Kubernetes cluster with three nodes. All of them report status Ready, but the scheduler does not seem to see one of them. How could this happen?

[root@master1 app]# kubectl get nodes
NAME          LABELS                                         STATUS    AGE
172.16.0.44   kubernetes.io/hostname=172.16.0.44,pxc=node1   Ready     8d
172.16.0.45   kubernetes.io/hostname=172.16.0.45             Ready     8d
172.16.0.46   kubernetes.io/hostname=172.16.0.46             Ready     8d

I use a nodeSelector in my RC file like this:

  nodeSelector:
    pxc: node1
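For context, the selector has to sit under the pod template's spec, not at the top level of the RC. A minimal sketch of the full manifest, reconstructed from the names in the question (the mountPath is an assumption, MongoDB's default data directory):

```yaml
apiVersion: v1
kind: ReplicationController
metadata:
  name: mongo-controller
  namespace: kube-system
spec:
  replicas: 1
  selector:
    k8s-app: mongo
  template:
    metadata:
      labels:
        k8s-app: mongo
    spec:
      nodeSelector:
        pxc: node1            # must match the node's label exactly
      containers:
      - name: mongo
        image: mongo
        volumeMounts:
        - name: mongo-persistent-storage
          mountPath: /data/db  # assumed; adjust to your image's data dir
      volumes:
      - name: mongo-persistent-storage
        hostPath:
          path: /k8s/mongodb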

Describe the RC:

Name:       mongo-controller
Namespace:  kube-system
Image(s):   mongo
Selector:   k8s-app=mongo
Labels:     k8s-app=mongo
Replicas:   1 current / 1 desired
Pods Status:    0 Running / 1 Waiting / 0 Succeeded / 0 Failed
Volumes:
  mongo-persistent-storage:
    Type:   HostPath (bare host directory volume)
    Path:   /k8s/mongodb
Events:
  FirstSeen LastSeen    Count   From                SubobjectPath   Reason          Message
  ───────── ────────    ─────   ────                ─────────────   ──────          ───────
  25m       25m     1   {replication-controller }           SuccessfulCreate    Created pod: mongo-controller-0wpwu

The pod stays Pending:

[root@master1 app]# kubectl get pods mongo-controller-0wpwu --namespace=kube-system
NAME                     READY     STATUS    RESTARTS   AGE
mongo-controller-0wpwu   0/1       Pending   0          27m

describe pod mongo-controller-0wpwu:

[root@master1 app]# kubectl describe pod mongo-controller-0wpwu --namespace=kube-system
Name:               mongo-controller-0wpwu
Namespace:          kube-system
Image(s):           mongo
Node:               /
Labels:             k8s-app=mongo
Status:             Pending
Reason:
Message:
IP:
Replication Controllers:    mongo-controller (1/1 replicas created)
Containers:
  mongo:
    Container ID:
    Image:      mongo
    Image ID:
    QoS Tier:
      cpu:      BestEffort
      memory:       BestEffort
    State:      Waiting
    Ready:      False
    Restart Count:  0
    Environment Variables:
Volumes:
  mongo-persistent-storage:
    Type:   HostPath (bare host directory volume)
    Path:   /k8s/mongodb
  default-token-7qjcu:
    Type:   Secret (a secret that should populate this volume)
    SecretName: default-token-7qjcu
Events:
  FirstSeen LastSeen    Count   From            SubobjectPath   Reason          Message
  ───────── ────────    ─────   ────            ─────────────   ──────          ───────
  22m       37s     12  {default-scheduler }            FailedScheduling    pod (mongo-controller-0wpwu) failed to fit in any node
fit failure on node (172.16.0.46): MatchNodeSelector
fit failure on node (172.16.0.45): MatchNodeSelector

  27m   9s  67  {default-scheduler }        FailedScheduling    pod (mongo-controller-0wpwu) failed to fit in any node
fit failure on node (172.16.0.45): MatchNodeSelector
fit failure on node (172.16.0.46): MatchNodeSelector

Looking at the IP list in the events, 172.16.0.44 does not seem to be seen by the scheduler at all. How could this happen?

Describe node 172.16.0.44:

[root@master1 app]# kubectl describe nodes --namespace=kube-system
Name:           172.16.0.44
Labels:         kubernetes.io/hostname=172.16.0.44,pxc=node1
CreationTimestamp:  Wed, 30 Mar 2016 15:58:47 +0800
Phase:
Conditions:
  Type      Status      LastHeartbeatTime           LastTransitionTime          Reason          Message
  ────      ──────      ─────────────────           ──────────────────          ──────          ───────
  Ready     True        Fri, 08 Apr 2016 12:18:01 +0800     Fri, 08 Apr 2016 11:18:52 +0800     KubeletReady        kubelet is posting ready status
  OutOfDisk     Unknown     Wed, 30 Mar 2016 15:58:47 +0800     Thu, 07 Apr 2016 17:38:50 +0800     NodeStatusNeverUpdated  Kubelet never posted node status.
Addresses:  172.16.0.44,172.16.0.44
Capacity:
 cpu:       2
 memory:    7748948Ki
 pods:      40
System Info:
 Machine ID:            45461f76679f48ee96e95da6cc798cc8
 System UUID:           2B850D4F-953C-4C20-B182-66E17D5F6461
 Boot ID:           40d2cd8d-2e46-4fef-92e1-5fba60f57965
 Kernel Version:        3.10.0-123.9.3.el7.x86_64
 OS Image:          CentOS Linux 7 (Core)
 Container Runtime Version: docker://1.10.1
 Kubelet Version:       v1.2.0
 Kube-Proxy Version:        v1.2.0
ExternalID:         172.16.0.44
Non-terminated Pods:        (1 in total)
  Namespace         Name                    CPU Requests    CPU Limits  Memory Requests Memory Limits
  ─────────         ────                    ────────────    ──────────  ─────────────── ─────────────
  kube-system           kube-registry-proxy-172.16.0.44     100m (5%)   100m (5%)   50Mi (0%)   50Mi (0%)
Allocated resources:
  (Total limits may be over 100%, i.e., overcommitted. More info: http://releases.k8s.io/HEAD/docs/user-guide/compute-resources.md)
  CPU Requests  CPU Limits  Memory Requests Memory Limits
  ────────────  ──────────  ─────────────── ─────────────
  100m (5%) 100m (5%)   50Mi (0%)   50Mi (0%)
Events:
  FirstSeen LastSeen    Count   From            SubobjectPath   Reason      Message
  ───────── ────────    ─────   ────            ─────────────   ──────      ───────
  59m       59m     1   {kubelet 172.16.0.44}           Starting    Starting kubelet.

After logging into .44 via SSH, I see that disk space is free (I also removed some Docker images and containers):

[root@iZ25dqhvvd0Z ~]# df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/xvda1       40G  2.6G   35G   7% /
devtmpfs        3.9G     0  3.9G   0% /dev
tmpfs           3.7G     0  3.7G   0% /dev/shm
tmpfs           3.7G  143M  3.6G   4% /run
tmpfs           3.7G     0  3.7G   0% /sys/fs/cgroup
/dev/xvdb        40G  361M   37G   1% /k8s

Still, `docker logs` on the scheduler (v1.3.0-alpha.1) shows this:

E0408 05:28:42.679448       1 factory.go:387] Error scheduling kube-system mongo-controller-0wpwu: pod (mongo-controller-0wpwu) failed to fit in any node
fit failure on node (172.16.0.45): MatchNodeSelector
fit failure on node (172.16.0.46): MatchNodeSelector
; retrying
I0408 05:28:42.679577       1 event.go:216] Event(api.ObjectReference{Kind:"Pod", Namespace:"kube-system", Name:"mongo-controller-0wpwu", UID:"2d0f0844-fd3c-11e5-b531-00163e000727", APIVersion:"v1", ResourceVersion:"634139", FieldPath:""}): type: 'Warning' reason: 'FailedScheduling' pod (mongo-controller-0wpwu) failed to fit in any node
fit failure on node (172.16.0.45): MatchNodeSelector
fit failure on node (172.16.0.46): MatchNodeSelector
3

This solution may help you: How to restart kubernetes nodes? (CHENJIAN)

3 Answers

2
votes

Thanks for your reply, Robert. I resolved this by doing the following:

kubectl delete rc
kubectl delete node 172.16.0.44
stop kubelet on 172.16.0.44
rm -rf /k8s/*
restart kubelet
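For anyone following along, the steps above roughly correspond to the commands below (the RC name is taken from the question; the systemd unit name for the kubelet is an assumption and may differ on your install):

```shell
# On the master: remove the RC and deregister the stuck node
kubectl delete rc mongo-controller
kubectl delete node 172.16.0.44

# On 172.16.0.44 itself: stop the kubelet, clear the data dir, restart
systemctl stop kubelet      # unit name assumed
rm -rf /k8s/*
systemctl start kubelet
```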

Now the node is Ready, and the OutOfDisk condition is gone.

Name:           172.16.0.44
Labels:         kubernetes.io/hostname=172.16.0.44,pxc=node1
CreationTimestamp:  Fri, 08 Apr 2016 15:14:51 +0800
Phase:
Conditions:
  Type      Status  LastHeartbeatTime           LastTransitionTime          Reason      Message
  ────      ──────  ─────────────────           ──────────────────          ──────      ───────
  Ready     True    Fri, 08 Apr 2016 15:25:33 +0800     Fri, 08 Apr 2016 15:14:50 +0800     KubeletReady    kubelet is posting ready status
Addresses:  172.16.0.44,172.16.0.44
Capacity:
 cpu:       2
 memory:    7748948Ki
 pods:      40
System Info:
 Machine ID:            45461f76679f48ee96e95da6cc798cc8
 System UUID:           2B850D4F-953C-4C20-B182-66E17D5F6461
 Boot ID:           40d2cd8d-2e46-4fef-92e1-5fba60f57965
 Kernel Version:        3.10.0-123.9.3.el7.x86_64
 OS Image:          CentOS Linux 7 (Core)

I found this: https://github.com/kubernetes/kubernetes/issues/4135, but I still don't know why the kubelet thinks the node is out of disk when the disk space is actually free...

1
votes

The scheduler failed because it couldn't fit the pod onto any node it considered schedulable. If you look at the conditions for your node, the OutOfDisk condition is Unknown. The scheduler is probably not willing to place a pod onto a node that it thinks may not have available disk space.
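A quick way to check this on each node (this assumes a kubectl with JSONPath filter support; the OutOfDisk condition type existed in releases of this era but was later removed in favour of eviction thresholds):

```shell
# Print each node's name with its OutOfDisk condition status.
# A status of Unknown means the kubelet never reported it, and the
# scheduler may refuse to place pods on that node.
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="OutOfDisk")].status}{"\n"}{end}'
```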

0
votes

We had the same issue on AWS when they changed DNS from IP=DNS name to IP=IP in eu-central: nodes showed Ready but were not reachable via their name.