I am trying to set up node auto-provisioning on Google Kubernetes Engine (GKE). I created a cluster with both autoscaling and auto-provisioning enabled, like so:

gcloud beta container clusters create "some-name" --zone "us-central1-a" \
  --no-enable-basic-auth --cluster-version "1.13.11-gke.14" \
  --machine-type "n1-standard-1" --image-type "COS" \
  --disk-type "pd-standard" --disk-size "100" \
  --metadata disable-legacy-endpoints=true \
  --scopes "https://www.googleapis.com/auth/devstorage.read_only","https://www.googleapis.com/auth/logging.write","https://www.googleapis.com/auth/monitoring","https://www.googleapis.com/auth/servicecontrol","https://www.googleapis.com/auth/service.management.readonly","https://www.googleapis.com/auth/trace.append" \
  --num-nodes "1" --enable-stackdriver-kubernetes --enable-ip-alias \
  --network "projects/default-project/global/networks/default" \
  --subnetwork "projects/default-project/regions/us-central1/subnetworks/default" \
  --default-max-pods-per-node "110" \
  --enable-autoscaling --min-nodes "0" --max-nodes "8" \
  --addons HorizontalPodAutoscaling,KubernetesDashboard \
  --enable-autoupgrade --enable-autorepair \
  --enable-autoprovisioning --min-cpu 1 --max-cpu 8 --min-memory 1 --max-memory 16

The cluster has one node pool with a single node that has 1 vCPU. I tried running a deployment that requests 4 vCPUs, which clearly cannot be satisfied by the current node pool:

kubectl run say-lol --image ubuntu:18.04 --requests cpu=4 -- bash -c 'echo lolol'

Here is what I want to happen: the autoscaler should fail to accommodate the new deployment, since the existing node pool doesn't have enough CPU, and the auto-provisioner should then create a new node pool with a 4-vCPU node to run the new deployment.

Here is what is happening: the autoscaler fails as expected, but the auto-provisioner does nothing. The pod remains Pending indefinitely and no new node pool gets created.

$ kubectl get events
LAST SEEN   TYPE      REASON              KIND         MESSAGE
50s         Warning   FailedScheduling    Pod          0/1 nodes are available: 1 Insufficient cpu.
4m7s        Normal    NotTriggerScaleUp   Pod          pod didn't trigger scale-up (it wouldn't fit if a new node is added): 1 Insufficient cpu
9m17s       Normal    SuccessfulCreate    ReplicaSet   Created pod: say-lol-5598b4f6dc-vz58k
9m17s       Normal    ScalingReplicaSet   Deployment   Scaled up replica set say-lol-5598b4f6dc to 1

$ kubectl get pod
NAME                       READY   STATUS    RESTARTS   AGE
say-lol-5598b4f6dc-vz58k   0/1     Pending   0          9m14s

$ kubectl get nodes
NAME                                       STATUS   ROLES    AGE   VERSION
gke-some-name-default-pool-4ec86782-bv5t   Ready    <none>   31m   v1.13.11-gke.14

Why isn't a new node pool getting created to run the new deployment?

EDIT: It seems cpu=4 is the problematic part. If I change it to cpu=1.5, it works: a new node pool is created and the pod starts running. However, I specified --max-cpu 8, so it should clearly be able to handle a request for 4 vCPUs.
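
(A quick way to check whether an auto-provisioned pool appeared is to list the cluster's node pools; the cluster and zone names below are the ones from the create command above.)

$ gcloud container node-pools list --cluster "some-name" --zone "us-central1-a"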


1 Answer


The issue is likely related to allocatable CPU. Check the machine type that was created and how much CPU is actually allocatable on it; part of each node's capacity is reserved for system components, so a node's allocatable CPU is a bit less than its vCPU count.
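
For example, capacity and allocatable CPU can be compared directly (the node name below is the one from your kubectl get nodes output):

$ kubectl get nodes -o custom-columns='NAME:.metadata.name,CPU_CAPACITY:.status.capacity.cpu,CPU_ALLOCATABLE:.status.allocatable.cpu'
$ kubectl describe node gke-some-name-default-pool-4ec86782-bv5t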

Specifying --max-cpu 8 does not mean that a new node will have 8 cores; it specifies the maximum total number of CPU cores across the whole cluster. With the existing 1-vCPU node already counted against that limit, and a 4-vCPU machine not having a full 4 CPUs allocatable, the auto-provisioner would likely need a machine larger than 4 vCPUs, which the 8-core cluster-wide limit doesn't leave room for.

Changing it to --max-cpu 40 should give better results, as it allows a bigger machine type to be created.
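
As a sketch, assuming the cluster name and zone from your create command, the limits can also be raised on the existing cluster without recreating it (the 40/160 values here are just an example):

gcloud container clusters update "some-name" --zone "us-central1-a" \
  --enable-autoprovisioning \
  --min-cpu 1 --max-cpu 40 \
  --min-memory 1 --max-memory 160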