2 votes

I'm having a problem with my GKE cluster: all of the pods are stuck in ContainerCreating status. When I run kubectl get events I see this error:

Failed create pod sandbox: rpc error: code = Unknown desc = Error response from daemon: Get https://k8s.gcr.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)

Does anyone know what is happening here? I can't find a solution anywhere.
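A minimal way to test registry reachability directly from one of the nodes (a sketch using gcloud SSH; the node name and zone are taken from the node descriptions below) would be:

gcloud compute ssh gke-aditum-k8scluster--pool-nodes-dev-500ebc8b-bgb6 \
    --zone southamerica-east1-a \
    --command "curl -sS -m 10 https://k8s.gcr.io/v2/"

If the curl times out instead of returning an HTTP response, the nodes cannot reach the registry at all, which points to a network or routing problem rather than anything Kubernetes-specific.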

EDIT: I saw this post https://github.com/kubernetes/kubernetes/issues/44273 saying that GKE instances smaller than the default GKE instance type (n1-standard-1) can have network problems, so I changed my instances to the default type, but without success. Here are my node and pod descriptions:

Name:               gke-aditum-k8scluster--pool-nodes-dev-500ebc8b-bgb6
Roles:              <none>
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/fluentd-ds-ready=true
                    beta.kubernetes.io/instance-type=n1-standard-1
                    beta.kubernetes.io/os=linux
                    cloud.google.com/gke-nodepool=pool-nodes-dev
                    failure-domain.beta.kubernetes.io/region=southamerica-east1
                    failure-domain.beta.kubernetes.io/zone=southamerica-east1-a
                    kubernetes.io/hostname=gke-aditum-k8scluster--pool-nodes-dev-500ebc8b-bgb6
Annotations:        node.alpha.kubernetes.io/ttl=0
                    volumes.kubernetes.io/controller-managed-attach-detach=true
CreationTimestamp:  Thu, 27 Sep 2018 20:27:47 -0300
Taints:             <none>
Unschedulable:      false
Conditions:
  Type                          Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----                          ------  -----------------                 ------------------                ------                       -------
  KernelDeadlock                False   Fri, 28 Sep 2018 09:58:58 -0300   Thu, 27 Sep 2018 20:27:16 -0300   KernelHasNoDeadlock          kernel has no deadlock
  FrequentUnregisterNetDevice   False   Fri, 28 Sep 2018 09:58:58 -0300   Thu, 27 Sep 2018 20:32:18 -0300   UnregisterNetDevice          node is functioning properly
  NetworkUnavailable            False   Thu, 27 Sep 2018 20:27:48 -0300   Thu, 27 Sep 2018 20:27:48 -0300   RouteCreated                 NodeController create implicit route
  OutOfDisk                     False   Fri, 28 Sep 2018 09:59:03 -0300   Thu, 27 Sep 2018 20:27:47 -0300   KubeletHasSufficientDisk     kubelet has sufficient disk space available
  MemoryPressure                False   Fri, 28 Sep 2018 09:59:03 -0300   Thu, 27 Sep 2018 20:27:47 -0300   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure                  False   Fri, 28 Sep 2018 09:59:03 -0300   Thu, 27 Sep 2018 20:27:47 -0300   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure                   False   Fri, 28 Sep 2018 09:59:03 -0300   Thu, 27 Sep 2018 20:27:47 -0300   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready                         True    Fri, 28 Sep 2018 09:59:03 -0300   Thu, 27 Sep 2018 20:28:07 -0300   KubeletReady                 kubelet is posting ready status. AppArmor enabled
Addresses:
  InternalIP:  10.0.0.2
  ExternalIP:
  Hostname:    gke-aditum-k8scluster--pool-nodes-dev-500ebc8b-bgb6
Capacity:
 cpu:                1
 ephemeral-storage:  98868448Ki
 hugepages-2Mi:      0
 memory:             3787608Ki
 pods:               110
Allocatable:
 cpu:                940m
 ephemeral-storage:  47093746742
 hugepages-2Mi:      0
 memory:             2702168Ki
 pods:               110
System Info:
 Machine ID:                 1e8e0ecad8f5cc7fb5851bc64513d40c
 System UUID:                1E8E0ECA-D8F5-CC7F-B585-1BC64513D40C
 Boot ID:                    971e5088-6bc1-4151-94bf-b66c6c7ee9a3
 Kernel Version:             4.14.56+
 OS Image:                   Container-Optimized OS from Google
 Operating System:           linux
 Architecture:               amd64
 Container Runtime Version:  docker://17.3.2
 Kubelet Version:            v1.10.7-gke.2
 Kube-Proxy Version:         v1.10.7-gke.2
PodCIDR:                     10.0.32.0/24
ProviderID:                  gce://aditumpay/southamerica-east1-a/gke-aditum-k8scluster--pool-nodes-dev-500ebc8b-bgb6
Non-terminated Pods:         (11 in total)
  Namespace                  Name                                                              CPU Requests  CPU Limits  Memory Requests  Memory Limits
  ---------                  ----                                                              ------------  ----------  ---------------  -------------
  kube-system                event-exporter-v0.2.1-5f5b89fcc8-xsvmg                            0 (0%)        0 (0%)      0 (0%)           0 (0%)
  kube-system                fluentd-gcp-scaler-7c5db745fc-vttc9                               0 (0%)        0 (0%)      0 (0%)           0 (0%)
  kube-system                fluentd-gcp-v3.1.0-sz8r8                                          0 (0%)        0 (0%)      0 (0%)           0 (0%)
  kube-system                heapster-v1.5.3-75486b456f-sj7k8                                  138m (14%)    138m (14%)  301856Ki (11%)   301856Ki (11%)
  kube-system                kube-dns-788979dc8f-99xvh                                         260m (27%)    0 (0%)      110Mi (4%)       170Mi (6%)
  kube-system                kube-dns-788979dc8f-9sz2b                                         260m (27%)    0 (0%)      110Mi (4%)       170Mi (6%)
  kube-system                kube-dns-autoscaler-79b4b844b9-6s8x2                              20m (2%)      0 (0%)      10Mi (0%)        0 (0%)
  kube-system                kube-proxy-gke-aditum-k8scluster--pool-nodes-dev-500ebc8b-bgb6    100m (10%)    0 (0%)      0 (0%)           0 (0%)
  kube-system                kubernetes-dashboard-598d75cb96-6nhcd                             50m (5%)      100m (10%)  100Mi (3%)       300Mi (11%)
  kube-system                l7-default-backend-5d5b9874d5-8wk6h                               10m (1%)      10m (1%)    20Mi (0%)        20Mi (0%)
  kube-system                metrics-server-v0.2.1-7486f5bd67-fvddz                            53m (5%)      148m (15%)  154Mi (5%)       404Mi (15%)
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource  Requests        Limits
  --------  --------        ------
  cpu       891m (94%)      396m (42%)
  memory    817952Ki (30%)  1391392Ki (51%)
Events:     <none>

The other node:

Name:               gke-aditum-k8scluster--pool-nodes-dev-500ebc8b-m7bz
Roles:              <none>
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/fluentd-ds-ready=true
                    beta.kubernetes.io/instance-type=n1-standard-1
                    beta.kubernetes.io/os=linux
                    cloud.google.com/gke-nodepool=pool-nodes-dev
                    failure-domain.beta.kubernetes.io/region=southamerica-east1
                    failure-domain.beta.kubernetes.io/zone=southamerica-east1-a
                    kubernetes.io/hostname=gke-aditum-k8scluster--pool-nodes-dev-500ebc8b-m7bz
Annotations:        node.alpha.kubernetes.io/ttl=0
                    volumes.kubernetes.io/controller-managed-attach-detach=true
CreationTimestamp:  Thu, 27 Sep 2018 20:30:05 -0300
Taints:             <none>
Unschedulable:      false
Conditions:
  Type                          Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----                          ------  -----------------                 ------------------                ------                       -------
  KernelDeadlock                False   Fri, 28 Sep 2018 10:11:03 -0300   Thu, 27 Sep 2018 20:29:34 -0300   KernelHasNoDeadlock          kernel has no deadlock
  FrequentUnregisterNetDevice   False   Fri, 28 Sep 2018 10:11:03 -0300   Thu, 27 Sep 2018 20:34:36 -0300   UnregisterNetDevice          node is functioning properly
  NetworkUnavailable            False   Thu, 27 Sep 2018 20:30:06 -0300   Thu, 27 Sep 2018 20:30:06 -0300   RouteCreated                 NodeController create implicit route
  OutOfDisk                     False   Fri, 28 Sep 2018 10:11:49 -0300   Thu, 27 Sep 2018 20:30:05 -0300   KubeletHasSufficientDisk     kubelet has sufficient disk space available
  MemoryPressure                False   Fri, 28 Sep 2018 10:11:49 -0300   Thu, 27 Sep 2018 20:30:05 -0300   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure                  False   Fri, 28 Sep 2018 10:11:49 -0300   Thu, 27 Sep 2018 20:30:05 -0300   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure                   False   Fri, 28 Sep 2018 10:11:49 -0300   Thu, 27 Sep 2018 20:30:05 -0300   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready                         True    Fri, 28 Sep 2018 10:11:49 -0300   Thu, 27 Sep 2018 20:30:25 -0300   KubeletReady                 kubelet is posting ready status. AppArmor enabled
Addresses:
  InternalIP:  10.0.0.3
  ExternalIP:
  Hostname:    gke-aditum-k8scluster--pool-nodes-dev-500ebc8b-m7bz
Capacity:
 cpu:                1
 ephemeral-storage:  98868448Ki
 hugepages-2Mi:      0
 memory:             3787608Ki
 pods:               110
Allocatable:
 cpu:                940m
 ephemeral-storage:  47093746742
 hugepages-2Mi:      0
 memory:             2702168Ki
 pods:               110
System Info:
 Machine ID:                 f1d5cf2a0b2c5472cf6509778a7941a7
 System UUID:                F1D5CF2A-0B2C-5472-CF65-09778A7941A7
 Boot ID:                    f35bebb8-acd7-4a2f-95d6-76604638aef9
 Kernel Version:             4.14.56+
 OS Image:                   Container-Optimized OS from Google
 Operating System:           linux
 Architecture:               amd64
 Container Runtime Version:  docker://17.3.2
 Kubelet Version:            v1.10.7-gke.2
 Kube-Proxy Version:         v1.10.7-gke.2
PodCIDR:                     10.0.33.0/24
ProviderID:                  gce://aditumpay/southamerica-east1-a/gke-aditum-k8scluster--pool-nodes-dev-500ebc8b-m7bz
Non-terminated Pods:         (7 in total)
  Namespace                  Name                                                              CPU Requests  CPU Limits  Memory Requests  Memory Limits
  ---------                  ----                                                              ------------  ----------  ---------------  -------------
  default                    aditum-payment-7d966c494c-wpk2t                                   100m (10%)    0 (0%)      0 (0%)           0 (0%)
  default                    aditum-portal-dev-5c69d76bb6-n5d5b                                100m (10%)    0 (0%)      0 (0%)           0 (0%)
  default                    aditum-vtexapi-5c758fcfb7-rhvsn                                   100m (10%)    0 (0%)      0 (0%)           0 (0%)
  default                    admin-mongo-dev-7d9f7f7d46-rrj42                                  100m (10%)    0 (0%)      0 (0%)           0 (0%)
  default                    mongod-0                                                          200m (21%)    0 (0%)      200Mi (7%)       0 (0%)
  kube-system                fluentd-gcp-v3.1.0-pgwfx                                          0 (0%)        0 (0%)      0 (0%)           0 (0%)
  kube-system                kube-proxy-gke-aditum-k8scluster--pool-nodes-dev-500ebc8b-m7bz    100m (10%)    0 (0%)      0 (0%)           0 (0%)
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource  Requests    Limits
  --------  --------    ------
  cpu       700m (74%)  0 (0%)
  memory    200Mi (7%)  0 (0%)
Events:     <none>

All of the cluster's pods are stuck:

NAMESPACE     NAME                                                             READY     STATUS              RESTARTS   AGE
default       aditum-payment-7d966c494c-wpk2t                                  0/1       ContainerCreating   0          13h
default       aditum-portal-dev-5c69d76bb6-n5d5b                               0/1       ContainerCreating   0          13h
default       aditum-vtexapi-5c758fcfb7-rhvsn                                  0/1       ContainerCreating   0          13h
default       admin-mongo-dev-7d9f7f7d46-rrj42                                 0/1       ContainerCreating   0          13h
default       mongod-0                                                         0/1       ContainerCreating   0          13h
kube-system   event-exporter-v0.2.1-5f5b89fcc8-xsvmg                           0/2       ContainerCreating   0          13h
kube-system   fluentd-gcp-scaler-7c5db745fc-vttc9                              0/1       ContainerCreating   0          13h
kube-system   fluentd-gcp-v3.1.0-pgwfx                                         0/2       ContainerCreating   0          16h
kube-system   fluentd-gcp-v3.1.0-sz8r8                                         0/2       ContainerCreating   0          16h
kube-system   heapster-v1.5.3-75486b456f-sj7k8                                 0/3       ContainerCreating   0          13h
kube-system   kube-dns-788979dc8f-99xvh                                        0/4       ContainerCreating   0          13h
kube-system   kube-dns-788979dc8f-9sz2b                                        0/4       ContainerCreating   0          13h
kube-system   kube-dns-autoscaler-79b4b844b9-6s8x2                             0/1       ContainerCreating   0          13h
kube-system   kube-proxy-gke-aditum-k8scluster--pool-nodes-dev-500ebc8b-bgb6   0/1       ContainerCreating   0          13h
kube-system   kube-proxy-gke-aditum-k8scluster--pool-nodes-dev-500ebc8b-m7bz   0/1       ContainerCreating   0          13h
kube-system   kubernetes-dashboard-598d75cb96-6nhcd                            0/1       ContainerCreating   0          13h
kube-system   l7-default-backend-5d5b9874d5-8wk6h                              0/1       ContainerCreating   0          13h
kube-system   metrics-server-v0.2.1-7486f5bd67-fvddz                           0/2       ContainerCreating   0          13h

A stuck pod:

Name:           aditum-payment-7d966c494c-wpk2t
Namespace:      default
Node:           gke-aditum-k8scluster--pool-nodes-dev-500ebc8b-m7bz/10.0.0.3
Start Time:     Thu, 27 Sep 2018 20:30:47 -0300
Labels:         io.kompose.service=aditum-payment
                pod-template-hash=3852270507
Annotations:    kubernetes.io/limit-ranger=LimitRanger plugin set: cpu request for container aditum-payment
Status:         Pending
IP:
Controlled By:  ReplicaSet/aditum-payment-7d966c494c
Containers:
  aditum-payment:
    Container ID:
    Image:          gcr.io/aditumpay/aditumpaymentwebapi:latest
    Image ID:
    Port:           5000/TCP
    Host Port:      0/TCP
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Requests:
      cpu:  100m
    Environment:
      CONNECTIONSTRING:  <set to the key 'CONNECTIONSTRING' of config map 'aditum-payment-config'>  Optional: false
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-qsc9k (ro)
Conditions:
  Type           Status
  Initialized    True
  Ready          False
  PodScheduled   True
Volumes:
  default-token-qsc9k:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-qsc9k
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason                  Age                  From                                                          Message
  ----     ------                  ----                 ----                                                          -------
  Warning  FailedCreatePodSandBox  3m (x1737 over 13h)  kubelet, gke-aditum-k8scluster--pool-nodes-dev-500ebc8b-m7bz  Failed create pod sandbox: rpc error: code = Unknown desc = Error response from daemon: Get https://k8s.gcr.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)

Thanks!

Can you post kubectl describe nodes? Also kubectl describe pod <pod-not-creating>? – Rico
Have you enabled Private Google Access for your subnet, as described here: cloud.google.com/kubernetes-engine/docs/how-to/private-clusters? – Jukka
@Jukka yes, I did. – Artur Fernandes
Is the cluster VPC-native? I just tested creating a VPC-native private cluster in a subnetwork with Private Google Access enabled, and all kube-system pods started up just fine. – Jukka
Yes, I know. I did that. The cluster was working just fine until yesterday; this problem started out of the blue. – Artur Fernandes

2 Answers

2 votes

Sorry for taking so long to respond. It was a very silly problem. After I reached Google Cloud support, I noticed that my NAT machine was not working properly: the Private Google Access route was passing through my NAT instance. Thanks everyone for the help.
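For anyone debugging a similar setup, two checks that would have surfaced this (a sketch; the subnet name is a placeholder for your own) are whether Private Google Access is enabled on the node subnet, and which custom routes send traffic through the NAT instance:

# Is Private Google Access enabled on the node subnet? (should print True)
gcloud compute networks subnets describe my-subnet \
    --region southamerica-east1 \
    --format="get(privateIpGoogleAccess)"

# Which routes send traffic through a NAT instance or gateway?
gcloud compute routes list --format="table(name, destRange, nextHopInstance, nextHopGateway)"

If a broad route (for example 0.0.0.0/0) has the NAT instance as its next hop and that instance is unhealthy, image pulls from k8s.gcr.io will time out exactly as in the question.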

0 votes

In addition to the description of your nodes, it can depend on where you are launching them from.

As mentioned in kubernetes/minikube issue 2148 and kubernetes/minikube issue 3142, pulling images from gcr.io won't work from China.

The workaround in that case is to pull the image from another source, then tag it with the name Kubernetes expects:

# Pull the pause image from a mirror, then retag it as the gcr.io image the kubelet expects
minikube ssh "docker pull registry.cn-hangzhou.aliyuncs.com/google-containers/pause-amd64:3.0 && \
  docker tag registry.cn-hangzhou.aliyuncs.com/google-containers/pause-amd64:3.0 gcr.io/google_containers/pause-amd64:3.0"
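To confirm the workaround took effect, you can list the retagged image inside the minikube VM (a sketch):

minikube ssh "docker images gcr.io/google_containers/pause-amd64"

The pause image is what the kubelet uses to create the pod sandbox, which is why pre-seeding it locally avoids the FailedCreatePodSandBox pull.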