1
votes

I have a project A in which I have created a service account. I want to create a GKE cluster in project B.

I followed the steps for service account impersonation listed here: https://cloud.google.com/iam/docs/impersonating-service-accounts

In project A, the default service accounts of project B have roles/iam.serviceAccountTokenCreator and roles/iam.serviceAccountUser on the service account I created, which is my-service-account.

In project B, my-service-account has the Kubernetes Engine Admin role.
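Roughly, the bindings look like this in gcloud terms (the project IDs, project number, and service account e-mail below are placeholders, not my real values; I'm assuming the Kubernetes admin role here means roles/container.admin):

# In project A: let project B's default Compute Engine service account create tokens for, and act as, my-service-account
gcloud iam service-accounts add-iam-policy-binding \
    my-service-account@PROJECT_A_ID.iam.gserviceaccount.com \
    --project=PROJECT_A_ID \
    --member="serviceAccount:PROJECT_B_NUMBER-compute@developer.gserviceaccount.com" \
    --role="roles/iam.serviceAccountTokenCreator"

gcloud iam service-accounts add-iam-policy-binding \
    my-service-account@PROJECT_A_ID.iam.gserviceaccount.com \
    --project=PROJECT_A_ID \
    --member="serviceAccount:PROJECT_B_NUMBER-compute@developer.gserviceaccount.com" \
    --role="roles/iam.serviceAccountUser"

# In project B: grant my-service-account the Kubernetes Engine Admin role
gcloud projects add-iam-policy-binding PROJECT_B_ID \
    --member="serviceAccount:my-service-account@PROJECT_A_ID.iam.gserviceaccount.com" \
    --role="roles/container.admin"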

When I try to create the cluster, I end up with the following error:

Error: Error waiting for creating GKE NodePool: All cluster resources were brought up, but: only 0 nodes out of 1 have registered; cluster may be unhealthy.

I am using Terraform to create this cluster, and the service account used by Terraform has the Kubernetes Engine Admin and Service Account User roles.

This is what it shows in the console (screenshot: GKE error).

Edit:

I tried using the gcloud command line to create the GKE cluster:

gcloud beta container --project "my-project" clusters create "test-gke-sa" --zone "us-west1-a" --no-enable-basic-auth --cluster-version "1.18.16-gke.502" --release-channel "regular" --machine-type "e2-standard-16" --image-type "COS" --disk-type "pd-standard" --disk-size "100" --metadata disable-legacy-endpoints=true --scopes "https://www.googleapis.com/auth/devstorage.read_only","https://www.googleapis.com/auth/logging.write","https://www.googleapis.com/auth/monitoring","https://www.googleapis.com/auth/servicecontrol","https://www.googleapis.com/auth/service.management.readonly","https://www.googleapis.com/auth/trace.append" --num-nodes "3" --enable-stackdriver-kubernetes --enable-private-nodes --master-ipv4-cidr "192.168.0.16/28" --enable-ip-alias --network "projects/infgprj-sbo-n-hostgs-gl-01/global/networks/my-network" --subnetwork "projects/my-network/regions/us-west1/subnetworks/my-subnetwork" --cluster-secondary-range-name "gke1-pods" --services-secondary-range-name "gke1-services" --default-max-pods-per-node "110" --no-enable-master-authorized-networks --addons HorizontalPodAutoscaling,HttpLoadBalancing,GcePersistentDiskCsiDriver --enable-autoupgrade --enable-autorepair --max-surge-upgrade 1 --max-unavailable-upgrade 0 --enable-shielded-nodes --shielded-secure-boot --node-locations "us-west1-a" --service-account="[email protected]"

I got the same errors. I see that the node pool is created, but not the nodes (or at least they are not attached to the node pool?).

Here are some more screenshots of the errors:

(screenshot: VM page, node pools)

(screenshot: GKE page)

Solution: Finally, I figured out what was wrong. I had given the Token Creator role only to the default service accounts. It started working when I gave the same role to the default service agents as well. So basically:

role = "roles/iam.serviceAccountTokenCreator",
members = [
        "serviceAccount:{project-number}[email protected]",
        "serviceAccount:service-{project-number}@container-engine-robot.iam.gserviceaccount.com",
        "serviceAccount:service-{project-number}@compute-system.iam.gserviceaccount.com",
      ]
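
In gcloud terms, the extra grants for the service agents look roughly like this (the project number, project ID, and service account e-mail are placeholders):

gcloud iam service-accounts add-iam-policy-binding \
    my-service-account@PROJECT_A_ID.iam.gserviceaccount.com \
    --project=PROJECT_A_ID \
    --member="serviceAccount:service-PROJECT_B_NUMBER@container-engine-robot.iam.gserviceaccount.com" \
    --role="roles/iam.serviceAccountTokenCreator"

gcloud iam service-accounts add-iam-policy-binding \
    my-service-account@PROJECT_A_ID.iam.gserviceaccount.com \
    --project=PROJECT_A_ID \
    --member="serviceAccount:service-PROJECT_B_NUMBER@compute-system.iam.gserviceaccount.com" \
    --role="roles/iam.serviceAccountTokenCreator"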
2
are you sure that is a service account issue? - guillaume blaquiere
I think so, because if I use the default service account, it gets created without an error. - Chandan G

2 Answers

0
votes

Just to confirm that it's a service account error and not something involving Terraform, I recommend that you:

A. impersonate Project A's service account and confirm that you are who you're trying to be with this command - gcloud auth list (the active account is the one with the star next to it), and then

B. try creating a cluster in Project B with gcloud container clusters create - here are the reference docs but you can also:

  1. go to Console > Kubernetes Engine
  2. click on "Create,"
  3. scroll down to the bottom of the form and click on the "COMMAND LINE" link to launch a modal that generates the syntax of the CLI command you'd want to run
  4. copy, paste, and tweak it so it creates only one node (plus whatever other basic settings you want to change); make sure it specifies --project=project-B (see the sketch after this list)
  5. run the command
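
For example, a minimal version of that command, run with impersonation, might look something like this (the cluster name, zone, and service account e-mail are placeholders for whatever your setup uses):

# confirm which account is currently active
gcloud auth list

# create a small test cluster in Project B while impersonating Project A's service account
gcloud container clusters create test-cluster \
    --project=project-B \
    --zone=us-west1-a \
    --num-nodes=1 \
    --impersonate-service-account=my-service-account@project-A.iam.gserviceaccount.com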

That will likely give you a more helpful error message. Or at least a different one, so, hurray?

0
votes

The above error is usually caused by one of the following reasons:

1] If you are using a Shared VPC, verify that the IAM permissions are correct.

2] Verify that the auto-generated ingress firewall rules were created.

  • Usually three firewall rules are created:

    • gke-${cluster_name}-${random_char}-all : firewall rule for pod-to-pod communication

    • gke-${cluster_name}-${random_char}-master : rule for the master to talk to the nodes

    • gke-${cluster_name}-${random_char}-vms : node-to-node communication

  (${random_char} is a random character suffix.)
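
A quick way to verify they exist is to list the rules matching the cluster's prefix, for example (the project ID and cluster name are placeholders):

gcloud compute firewall-rules list \
    --project=PROJECT_ID \
    --filter="name~^gke-CLUSTER_NAME"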

3] Check firewall rules for denial of egress.

By default, GCP creates a firewall rule that allows all egress. If you delete that rule or deny all egress, then you must configure a firewall rule that allows egress to the master CIDR block on TCP ports 443 and 10250. The Private Cluster Firewall Rules documentation describes how to obtain the master CIDR block.

If you enable other GKE add-ons, you may need to add additional egress firewall rules.
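
If that egress rule is missing, re-creating it would look roughly like this (the rule name, project, network, and CIDR are placeholders; substitute your cluster's master CIDR block):

gcloud compute firewall-rules create allow-egress-to-gke-master \
    --project=PROJECT_ID \
    --network=NETWORK_NAME \
    --direction=EGRESS \
    --action=ALLOW \
    --rules=tcp:443,tcp:10250 \
    --destination-ranges=MASTER_CIDR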

4] Check the DNS configuration for communication with Google APIs.

Check the kubelet logs for any failed curl requests, e.g. "Unable to resolve host" or "Connection timeout" during kubelet installation. There is a chance that the DNS configuration is incorrect (e.g. resolving private Google APIs vs. hitting public Google APIs). A dig command, or looking at /etc/resolv.conf for the DNS servers, should confirm where requests are being routed.
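
For example, from an affected node (e.g. over SSH), something like the following shows which DNS servers are configured and where a Google API hostname resolves (the hostname is just an example):

cat /etc/resolv.conf
dig www.googleapis.com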