3 votes

This one is a real head-scratcher, because everything had worked fine for years until yesterday. I have a google cloud account and the billing is set up correctly. I have private images in my GCR registry which I can 'docker pull' and 'docker push' from my laptop (MacBook Pro with Big Sur 11.4) with no problems.

The problem I detail here started happening yesterday, after I deleted a project in the Google Cloud console and then recreated it from scratch with the same name. The previous project had no problem pulling GCR images, but the new one couldn't pull the same images. I have since used the cloud console to create new, empty test projects with a variety of names, with new clusters using default GKE values, but the problem persists with all of them.

When I use kubectl to create a deployment on GKE that uses any of the GCR images in the same project, I get ErrImagePull errors. When I 'describe' the pod that won't load the image, the error (with project id obscured) is:

    Failed to pull image "gcr.io/test-xxxxxx/test:1.0.0": rpc error: code = Unknown desc = failed to pull and unpack image "gcr.io/test-xxxxxx/test:1.0.0": failed to resolve reference "gcr.io/test-xxxxxx/test:1.0.0": unexpected status code [manifests 1.0.0]: 401 Unauthorized.
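
For reference, this is roughly how I create the deployment and inspect the failing pod (the deployment name here is just a placeholder):

    # create a deployment from the private GCR image in the same project
    kubectl create deployment test --image=gcr.io/test-xxxxxx/test:1.0.0

    # watch the pod fail with ErrImagePull, then inspect it
    kubectl get pods
    kubectl describe pod <pod-name>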

This happens when I use kubectl from my laptop (even after wiping out .kube/config and recreating it with proper credentials), and exactly the same thing happens when I use the cloud console to set up a deployment by choosing 'Deploy to GKE' for the GCR image, with no kubectl involved.

If I ssh into a node in any of these new clusters and try to 'docker pull' a GCR image (in the same project), I get a similar error:

    Error response from daemon: unauthorized: You don't have the needed permissions to perform this operation, and you may have invalid credentials. To authenticate your request, follow the steps in: https://cloud.google.com/container-registry/docs/advanced-authentication
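
For completeness, this is roughly the sequence I run against the node (node name and zone are placeholders):

    # SSH to one of the cluster's nodes
    gcloud compute ssh gke-test-cluster-default-pool-node --zone us-central1-a

    # on the node, try pulling the same image
    docker pull gcr.io/test-xxxxxx/test:1.0.0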

My understanding from numerous articles is that no special authorization needs to be set up for GKE to pull GCR images from within the same project, and I've NEVER had this issue in the past.

I hope I'm not the only one on this deserted island. Thanks in advance for your help!

Hi there. What IAM roles does the SA of your GKE node pools have on your project? Your understanding is right, you do not need to do anything extra, but the node pool's SA needs the proper roles to access GCR. Are you using the default SA or a custom SA? – Armando Cuevas
Thanks for trying to help! I am very confused about how service accounts relate to node pools. I have never specified a custom SA. In each of the test projects I've set up, there is one SA called 'Compute Engine default service account'. When I look at 'Permissions' for it, under 'Members with access to this service account' I see 3 entries: the SA itself, 'Google APIs Service Agent', and me (the Owner). Is there somewhere else I should be looking? I would love to answer 'What IAM roles does the SA of your GKE node pools have on your project', but I don't know where to look (a way to check is sketched just after these comments). – user2344885
I was just able to confirm that the node pools for my test clusters are all using the 'default' SA. – user2344885
And my SA called 'Compute Engine default service account' has a role of Editor. – user2344885
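
For anyone checking the same thing, the node pool's service account and the roles it holds on the project can be listed with something like the following (cluster name, zone, project ID, and the SA email are placeholders):

    # which service account the node pool runs as
    gcloud container node-pools describe default-pool \
        --cluster my-cluster --zone us-central1-a \
        --format="value(config.serviceAccount)"

    # which roles that account holds on the project
    gcloud projects get-iam-policy test-xxxxxx \
        --flatten="bindings[].members" \
        --filter="bindings.members:PROJECT_NUMBER-compute@developer.gserviceaccount.com" \
        --format="table(bindings.role)"

If the pool uses the Compute Engine default SA, the first command may simply return 'default'.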

1 Answer

2 votes

I tried implementing the setup and ran into the same error, both on the GKE cluster and on the cluster's nodes. The cause is that access to the Cloud Storage API is “Disabled” on the cluster nodes, which you can verify in the node (VM instance) details under the “Cloud API access scopes” section.
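
If you prefer the command line, the scopes on one of the nodes can also be checked with something like this (node name and zone are placeholders):

    # from your workstation: show the service account and scopes attached to a node
    gcloud compute instances describe gke-test-cluster-default-pool-node \
        --zone us-central1-a --format="yaml(serviceAccounts)"

    # or from the node itself, via the metadata server
    curl -s -H "Metadata-Flavor: Google" \
        "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/scopes"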

We can rectify this by changing the “Access scopes” to “Set access for each API” and adjusting the access level for the specific APIs under Node Pools -> default-pool -> Security when creating the cluster. In this case the Storage API needs at least “Read Only” access so the nodes can reach the Cloud Storage bucket where the image is stored. See Changing the service account and access scopes for an instance for more information.
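
If you create the cluster with gcloud rather than the console, the equivalent is to pass the --scopes flag; for example (cluster and node pool names here are placeholders):

    # the gke-default scope alias includes devstorage.read_only, which covers GCR pulls
    gcloud container clusters create my-cluster \
        --zone us-central1-a --scopes gke-default

    # for an existing cluster, add a new node pool with an explicit read-only storage scope
    gcloud container node-pools create fixed-pool \
        --cluster my-cluster --zone us-central1-a \
        --scopes https://www.googleapis.com/auth/devstorage.read_only

Note that access scopes on an existing node pool cannot be changed in place; you generally have to create a new node pool (or cluster) with the desired scopes and migrate the workloads.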