I was happily deploying to Kubernetes Engine for a while, but while working on an integrated Cloud Container Builder pipeline, I started running into trouble.

I don't know what changed. I can no longer deploy to Kubernetes at all, not even in ways that worked before without Container Builder.

The pod rollout fails with an error saying the image cannot be pulled from the registry. This seems strange, because the images exist (I can pull them with the CLI) and I have granted every permission that could possibly be related to my user and to the Container Builder service account.

The pods end up in ImagePullBackOff, and I see this in the pod events:

Failed to pull image "gcr.io/my-project/backend:f4711979-eaab-4de1-afd8-d2e37eaeb988": rpc error: code = Unknown desc = unauthorized: authentication required

What's going on? Who needs authorization, and for what?

If you are using the correct [:TAG|@DIGEST] and localized hostnames, have assigned the Storage Object Viewer role (try Storage Object Admin as well) to your service account, and have imported its secret, it could be that your cluster does not have the proper scopes. – Fady
Is the GKE cluster also in my-project? If so, can you please log in to a node running the pod, run journalctl -f -u docker, and copy-paste the detailed error the Docker engine reports while pulling the image? – Ahmet Alp Balkan
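Both comments come down to checking that the image reference points where you think it does: the right registry host, the right project, and the right tag or digest. As a rough sketch, the hypothetical POSIX shell function `parse_gcr_ref` below (not part of gcloud or Docker) splits a gcr.io reference into those parts. It assumes the common `HOST/PROJECT/IMAGE[:TAG|@DIGEST]` layout and does not handle every edge case, such as domain-scoped project IDs.

```shell
# Hypothetical helper (not part of gcloud): split a gcr.io image reference into
# registry host, GCP project ID and tag/digest so each part can be checked.
parse_gcr_ref() {
  ref=$1
  case $ref in
    *@*)
      # Pinned by digest: everything after '@' is the digest
      name=${ref%%@*}
      version="@${ref#*@}"
      ;;
    *)
      # A ':' in the last path segment separates the tag
      # (a ':' earlier in the string would be a registry port)
      last=${ref##*/}
      case $last in
        *:*) name=${ref%:*}; version=":${last#*:}" ;;
        *)   name=$ref; version=":latest" ;;  # Docker's default tag
      esac
      ;;
  esac
  host=${name%%/*}      # e.g. gcr.io or eu.gcr.io
  rest=${name#*/}
  project=${rest%%/*}   # on GCR the first path segment is the project ID
  printf '%s %s %s\n' "$host" "$project" "$version"
}

parse_gcr_ref "gcr.io/my-project/backend:f4711979-eaab-4de1-afd8-d2e37eaeb988"
# prints: gcr.io my-project :f4711979-eaab-4de1-afd8-d2e37eaeb988
```

The printed project ID can then be compared with the project the cluster runs in, for example the output of `gcloud config get-value project`.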

2 Answers

In my case, the cluster's nodes didn't have the Storage read-only OAuth scope, which GKE needs in order to pull images from GCR.

My cluster didn't have the proper scopes because I created it with Terraform and didn't include the node_config.oauth_scopes block. When you create a cluster through the console, the Storage read-only scope is added by default.
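GCR keeps image layers in Cloud Storage, so a node can only pull from it if its OAuth scopes allow reading storage, for example `https://www.googleapis.com/auth/devstorage.read_only`, or the broad `https://www.googleapis.com/auth/cloud-platform` scope, which leaves access control to IAM. A minimal sketch, assuming you already have the node's scope list as a string: the hypothetical helper `has_storage_read_scope` checks for one of those scopes. The commented `gcloud` lines show one way to fetch the list; `CLUSTER` and `ZONE` are placeholders.

```shell
# Hypothetical helper: succeeds if a list of node OAuth scopes contains one
# that allows reading from Cloud Storage (where GCR stores image layers).
has_storage_read_scope() {
  case $1 in
    *devstorage.read_only*|*devstorage.read_write*|*devstorage.full_control*|*auth/cloud-platform*)
      return 0 ;;
    *)
      return 1 ;;
  esac
}

# Usage (requires gcloud; nodeConfig is the cluster's default node configuration):
# scopes=$(gcloud container clusters describe "$CLUSTER" --zone "$ZONE" \
#   --format='value(nodeConfig.oauthScopes)')
# has_storage_read_scope "$scopes" || echo "nodes cannot read from GCR"
```

In Terraform, the corresponding fix is to list the storage scope in `node_config.oauth_scopes` on the cluster or node pool resource.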

The credentials in my project somehow got messed up. I solved the problem by disabling and re-enabling a few APIs, including Kubernetes Engine, Deployment Manager and Container Builder.

The first time I tried this it didn't work, because before you can disable an API you first have to disable all the APIs that depend on it. If you do this in the Cloud Console web UI, you'll likely see a list of dependent services, not all of which can be disabled from the UI.

I learned that with the gcloud CLI you can list all the APIs in your project and disable them properly.

Things worked after that.
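The ordering rule above (disable every dependent API before the API it depends on) is a topological sort. A dry-run sketch, assuming you already know which API depends on which: the hypothetical function `plan_disable` feeds `<dependent> <dependency>` pairs to the standard `tsort` utility and prints, without running them, the `gcloud services disable` commands in a safe order. The service names in the usage line are made up.

```shell
# Hypothetical dry-run helper: reads "<dependent-api> <api-it-depends-on>" pairs
# on stdin and prints gcloud commands that disable each dependent API before
# the API it depends on. Nothing is executed; review the output first.
plan_disable() {
  tsort | while read -r api; do
    printf 'gcloud services disable %s\n' "$api"
  done
}

# svc-a depends on svc-b, which depends on svc-c (illustrative names only)
printf 'svc-a svc-b\nsvc-b svc-c\n' | plan_disable
# prints:
# gcloud services disable svc-a
# gcloud services disable svc-b
# gcloud services disable svc-c
```

`gcloud services list --enabled` shows what is currently enabled, and `gcloud services disable` also accepts a `--force` flag that disables dependent services together with the target; check `gcloud services disable --help` for your gcloud version before relying on either approach.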

I knew something was broken because I had an identical copy of the setup running as a production environment, and the problems did not occur there. The development environment had gone through many iterations and a lot of experimenting with credentials, so something got corrupted along the way.

These are some examples of useful commands:

gcloud projects get-iam-policy $PROJECT_ID

gcloud services disable container.googleapis.com --verbosity=debug

gcloud services enable container.googleapis.com

More info here, including how to restore service account credentials.
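When checking permissions, keep in mind that image pulls are made with the node's service account (by default the Compute Engine default service account), not with your user or the Container Builder service account. As a sketch for checking that account's project-level roles, the hypothetical helper `has_role` reads the tab-separated role/member lines produced by the commented `gcloud projects get-iam-policy` call and succeeds if the member holds the role. `PROJECT_ID` and `NODE_SA` are placeholders, and the check only sees project-level bindings, not roles granted on the registry's storage bucket or inherited from a folder or organization.

```shell
# Hypothetical helper: reads "role<TAB>member" lines (one member per line)
# on stdin and succeeds if MEMBER ($1) holds ROLE ($2).
has_role() {
  member=$1
  role=$2
  awk -F '\t' -v m="$member" -v r="$role" '
    $1 == r && $2 == m { found = 1 }
    END { exit !found }
  '
}

# Usage (requires gcloud and access to the project):
# gcloud projects get-iam-policy "$PROJECT_ID" \
#   --flatten='bindings[].members' \
#   --format='value(bindings.role,bindings.members)' |
#   has_role "serviceAccount:$NODE_SA" roles/storage.objectViewer && echo "role present"
```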