I have a GKE cluster that was running fine until recently. Now a whole bunch of Kubernetes workloads show as offline, with the following events on the affected pods:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 6m23s default-scheduler
Warning Failed 5m39s (x3 over 6m22s) kubelet, gke-platsol-bots-staging-default-pool-f489f2f3-rjrq Error: ErrImagePull
Normal BackOff 5m2s (x7 over 6m21s) kubelet, gke-platsol-bots-staging-default-pool-f489f2f3-rjrq Back-off pulling image "us.gcr.io/project/poc-app-bot@sha256:b99b5fb1b77407ade49d9bf42a94919e90422fee26c1a46ec6247370bd96c4d8"
Normal Pulling 4m49s (x4 over 6m22s) kubelet, gke-platsol-bots-staging-default-pool-f489f2f3-rjrq pulling image "us.gcr.io/project/poc-app-bot@sha256:b99b5fb1b77407ade49d9bf42a94919e90422fee26c1a46ec6247370bd96c4d8"
Warning Failed 81s (x22 over 6m21s) kubelet, gke-platsol-bots-staging-default-pool-f489f2f3-rjrq Error: ImagePullBackOff
Not sure what could have changed to cause this issue.
This is the output of kubectl describe pod:
Name: project-5dddbd66b5-vpw8q
Namespace: default
Priority: 0
PriorityClassName: <none>
Node: gke-platsol-bots-staging-default-pool-f489f2f3-rjrq/10.x.x.x
Start Time: Wed, 18 Sep 2019 16:48:23 +0100
Labels: app=bot
pod-template-hash=5dddbd66b5
Annotations: kubernetes.io/limit-ranger: LimitRanger plugin set: cpu request for container project
Status: Pending
IP: 10.20.1.9
Controlled By: ReplicaSet/bot-5dddbd66b5
Containers:
project:
Container ID:
Image: us.gcr.io/project/project@sha256:b99b5fb1b77407ade49d9bf42a94919e90422fee26c1a46ec6247370bd96c4d8
Image ID:
Port: 8080/TCP
Host Port: 0/TCP
State: Waiting
Reason: ImagePullBackOff
Ready: False
Restart Count: 0
Requests:
cpu: 100m
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
default-token-99cns:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-99cns
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning Failed 4m38s (x793 over 3h4m) kubelet, gke-platsol-bots-staging-default-pool-f489f2f3-rjrq Error: ImagePullBackOff
Below is what I have in my YAML definition for the deployment. I have not defined an image pull secret, as one was not required to pull the image from Google Container Registry:
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "3"
    kubectl.kubernetes.io/last-applied-configuration: |
      <redacted annotations>
  creationTimestamp: 2019-06-06T08:37:01Z
  generation: 3
  labels:
    app: project
  name: bot
  namespace: default
  resourceVersion: "68945490"
  selfLink: /apis/apps/v1/namespaces/default/deployments/bot
  uid: 412ce711-8836-11e9-905f-42010a8e016c
image: us.gcr.io/project/app-bot@sha256:b99b5fb1b77407ade49d9bf42a94919e90422fee26c1a46ec6247370bd96c4d8
imagePullPolicy: IfNotPresent
Okay, so I followed this guide to patch the service account with an image pull "secret" for pulling images from GCR: https://kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry/
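For anyone reading along, the approach of creating a registry secret and attaching it to the service account can be sketched roughly as below. This is only an illustration, not the exact commands that were run: the secret name gcr-pull-secret, the key file key.json, the e-mail address and the helper function names are all placeholders made up for this sketch. It assumes a GCP service account JSON key is used, which GCR accepts as the password for the literal user name _json_key.

```shell
#!/bin/sh
# Hypothetical placeholders, not the real values from the cluster in question.
SECRET_NAME="${SECRET_NAME:-gcr-pull-secret}"
KEY_FILE="${KEY_FILE:-key.json}"

# Build the JSON patch that adds the pull secret to a service account.
build_patch() {
  printf '{"imagePullSecrets": [{"name": "%s"}]}\n' "$1"
}

# Create the registry secret, then patch the default service account with it.
# Defined here but not called, so sourcing this file does not touch the cluster.
attach_pull_secret() {
  # GCR accepts the literal user name _json_key with the key file contents as the password.
  kubectl create secret docker-registry "$SECRET_NAME" \
    --docker-server=us.gcr.io \
    --docker-username=_json_key \
    --docker-password="$(cat "$KEY_FILE")" \
    --docker-email=user@example.com &&
  kubectl patch serviceaccount default -p "$(build_patch "$SECRET_NAME")"
}

# Show the patch that would be sent to the API server.
build_patch "$SECRET_NAME"
```

Running the script as-is only prints the patch body; calling attach_pull_secret afterwards would apply it to the default service account in the current namespace, and pods created after that point would pick up the secret.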
I SSHed onto a single node and can pull the image for one application successfully:
vinay@cloudshell:~ (project-id)$ docker pull us.gcr.io/project-id/project2-bot@sha256:9817462c743a93bb9206e4b86855322f731a768dca18e26b8bfc39b0cc886d31
sha256:9817462c743a93bb9206e4b86855322f731a768dca18e26b8bfc39b0cc886d31: Pulling from project-id/project2-bot
092586df9206: Pull complete
ef599477fae0: Pull complete
4530c6472b5d: Pull complete
d34d61487075: Pull complete
272f46008219: Pull complete
12ff6ccfe7a6: Pull complete
f26b99e1adb1: Pull complete
bb50901cd579: Pull complete
64a286652062: Pull complete
283785ced197: Pull complete
ed5a2062edd6: Pull complete
Digest: sha256:9817462c743a93bb9206e4b86855322f731a768dca18e26b8bfc39b0cc886d31
Status: Downloaded newer image for us.gcr.io/project-id/project2-bot@sha256:9817462c743a93bb9206e4b86855322f731a768dca18e26b8bfc39b0cc886d31
us.gcr.io/project-id/project2-bot@sha256:9817462c743a93bb9206e4b86855322f731a768dca18e26b8bfc39b0cc886d31
But pulling the image for this other application fails with an error:
vinay@cloudshell:~ (project-id)$ docker pull us.gcr.io/project-id/project1-plug@sha256:c53ac1c536a1187ce940f9221730cc0eae3103f4313033659e2162a70bc66c59
sha256:c53ac1c536a1187ce940f9221730cc0eae3103f4313033659e2162a70bc66c59: Pulling from project-id/project1-plug
a4d8138d0f6b: Pulling fs layer
dbdc36973392: Pulling fs layer
f59d6d019dd5: Pulling fs layer
aaef3e026258: Waiting
5e86b04a4500: Waiting
1a6643a2873a: Waiting
2ad1e30fc17c: Waiting
ddb5baaf3393: Waiting
0a7edc889b3c: Waiting
31a1f16c256b: Waiting
172a500f7b4d: Waiting
error pulling image configuration: unknown blob