
Prerequisites:

  1. GKE 1.14.x or 1.15.x (latest stable)

  2. Labeled node pools, created by Deployment Manager

  3. An application which requires a persistent volume in RWO mode (see the PVC sketch after this list)

  4. Each application deployment is different, all of them run at the same time, and each runs as one pod per node.

  5. Each pod has no replicas and should support rolling updates (via Helm).
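As a rough illustration of item 3, a claim for such a volume could look like this (names and size are purely illustrative; 'standard' is GKE's default persistent-disk storage class):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data            # hypothetical claim name
spec:
  accessModes:
  - ReadWriteOnce           # RWO: mountable read-write by a single node
  resources:
    requests:
      storage: 10Gi
  storageClassName: standard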

Design:

A Deployment Manager template defines the cluster and node pools.

Node pools are labeled; each node carries the same label (after initial creation).

Each new app is deployed into a new namespace, which allows each to have a unique service address.

Each new release can be a 'new install' or an 'update of an existing one', based on the node label (node labels can be changed with kubectl during install or update of the app, as sketched below).
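A sketch of that label flip with kubectl (the node name is a placeholder; the label key node/nodeisbusy is the one used in the affinity shown further down):

# Inspect the current value of the label on every node.
kubectl get nodes -L node/nodeisbusy

# During install/update of an app, mark the chosen node as busy...
kubectl label nodes <node-name> node/nodeisbusy=busy --overwrite

# ...and clear it again when the app is removed.
kubectl label nodes <node-name> node/nodeisbusy-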

Problem:

This works normally if the cluster is created from the browser console. If the cluster was created by GCP Deployment Manager, the error is (tested with the nginx template from the Kubernetes docs with node affinity, even without a volume attached):

Warning  FailedScheduling   17s (x2 over 17s)  default-scheduler   0/2 nodes are available: 2 node(s) didn't match node selector.
  Normal   NotTriggerScaleUp  14s                cluster-autoscaler  pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 node(s) didn't match node selector    

What is the problem? Does Deployment Manager create bad labels?

Affinity used:

    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
          - matchExpressions:
            - key: node/nodeisbusy
              operator: NotIn
              values:
              - busy
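For completeness, here is a minimal sketch of how that affinity block sits inside a test Deployment similar to the nginx example from the Kubernetes docs (all names apart from the label key are illustrative):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-affinity-test
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx-affinity-test
  template:
    metadata:
      labels:
        app: nginx-affinity-test
    spec:
      # Schedule only onto nodes whose node/nodeisbusy label is not 'busy'.
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: node/nodeisbusy
                operator: NotIn
                values:
                - busy
      containers:
      - name: nginx
        image: nginx:1.17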
Comments:
Please note that the StackOverflow community is not a support site for your favorite provider; you risk getting your questions removed from the community. Your issue is very specific to a GCP product and not related to coding. What you can do is file a public issue with Google: cloud.google.com/support/docs/issue-trackers#trackers-list – Ernesto U
Can you also share how you created your GKE cluster via Deployment Manager? This is most likely because your node pools do not have any labels. – Dean Christian Armada

2 Answers

1 vote

GCP gives two ways to control deployments by restricting them to a node pool or a set of nodes:

  1. Taints & Tolerations
  2. Node Affinity

I am explaining the #1 approach below: a combination of nodeSelector and tolerations to achieve restrictions on deployments along with auto-scaling.

Here is an example:

Let us say a cluster cluster-x is available and contains two node pools:

  1. project-a-node-pool - Configured to autoscale from 1 to 2 nodes.
  2. project-b-node-pool - Configured to autoscale from 1 to 3 nodes (see the gcloud sketch after this list).
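The autoscaling ranges above roughly map to the following gcloud flags, which can be combined with the taint flags shown further down (a sketch, not the exact commands used):

gcloud container node-pools create project-a-node-pool --cluster cluster-x \
  --enable-autoscaling --min-nodes 1 --max-nodes 2

gcloud container node-pools create project-b-node-pool --cluster cluster-x \
  --enable-autoscaling --min-nodes 1 --max-nodes 3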

Node Pool Labels

Each of the nodes in project-a-node-pool carries the following label by default:

cloud.google.com/gke-nodepool: project-a-node-pool

Each of the nodes in project-b-node-pool carries the following label by default:

cloud.google.com/gke-nodepool: project-b-node-pool
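Because those labels are present by default, a deployment can already be pinned to one pool with a plain nodeSelector. A fragment of the pod template, as a sketch:

spec:
  template:
    spec:
      nodeSelector:
        cloud.google.com/gke-nodepool: project-a-node-pool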

Node Pool Taints

Add taints to each of the node pools. Example commands:

gcloud container node-pools create project-a-node-pool --cluster cluster-x \
  --node-taints project=a:NoExecute

gcloud container node-pools create project-b-node-pool --cluster cluster-x \
  --node-taints project=b:NoExecute

Snapshot of Taints configured for project-a-node-pool
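As a sketch, the configured taints can also be verified from the command line:

# List the taints on all nodes of the pool (the pool label is set by GKE by default).
kubectl describe nodes -l cloud.google.com/gke-nodepool=project-a-node-pool | grep -A3 Taints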

Deployment Tolerations

Add the tolerations matching the taint to the deployment YAML file:

tolerations:
- key: "project"
  operator: "Equal"
  value: "a"         # or "b" for project-b-node-pool
  effect: "NoExecute"

Test with deployments

Try new deployments and check whether each deployment lands as expected per the taint/toleration pair. Deployments with toleration value a should go to project-a-node-pool; deployments with toleration value b should go to project-b-node-pool. Once enough memory/CPU is requested in either node pool, newer deployments should trigger auto-scaling within that pool.
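Putting the label and the toleration together, a deployment pinned to project-a-node-pool could look roughly like this (a sketch; the app name and image are placeholders):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: project-a-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: project-a-app
  template:
    metadata:
      labels:
        app: project-a-app
    spec:
      # Pin the pod to the pool via the default GKE pool label...
      nodeSelector:
        cloud.google.com/gke-nodepool: project-a-node-pool
      # ...and tolerate the NoExecute taint set on that pool.
      tolerations:
      - key: "project"
        operator: "Equal"
        value: "a"
        effect: "NoExecute"
      containers:
      - name: app
        image: nginx:1.17   # placeholder image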

0 votes

Thanks for the link to the Google support bug tracker; it will be better to ask them directly.

About the node labels: sure, they have them. The issue is that the label doesn't work for the scheduler if the cluster was created by Deployment Manager.
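One way to check what the scheduler actually sees is to print the custom label next to the pool label created by Deployment Manager (a sketch; the label key is the one from the affinity above):

kubectl get nodes -L node/nodeisbusy -L cloud.google.com/gke-nodepool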