I am trying to set up a Horizontal Pod Autoscaler to automatically scale my API server pods up and down based on CPU usage.
I currently have 12 pods running for my API, but they are using ~0% CPU.
kubectl get pods
NAME                                       READY   STATUS    RESTARTS   AGE
api-server-deployment-578f8d8649-4cbtc     2/2     Running   2          12h
api-server-deployment-578f8d8649-8cv77     2/2     Running   2          12h
api-server-deployment-578f8d8649-c8tv2     2/2     Running   1          12h
api-server-deployment-578f8d8649-d8c6r     2/2     Running   2          12h
api-server-deployment-578f8d8649-lvbgn     2/2     Running   1          12h
api-server-deployment-578f8d8649-lzjmj     2/2     Running   2          12h
api-server-deployment-578f8d8649-nztck     2/2     Running   1          12h
api-server-deployment-578f8d8649-q25xb     2/2     Running   2          12h
api-server-deployment-578f8d8649-tx75t     2/2     Running   1          12h
api-server-deployment-578f8d8649-wbzzh     2/2     Running   2          12h
api-server-deployment-578f8d8649-wtddv     2/2     Running   1          12h
api-server-deployment-578f8d8649-x95gq     2/2     Running   2          12h
model-server-deployment-76d466dffc-4g2nd   1/1     Running   0          23h
model-server-deployment-76d466dffc-9pqw5   1/1     Running   0          23h
model-server-deployment-76d466dffc-d29fx   1/1     Running   0          23h
model-server-deployment-76d466dffc-frrgn   1/1     Running   0          23h
model-server-deployment-76d466dffc-sfh45   1/1     Running   0          23h
model-server-deployment-76d466dffc-w2hqj   1/1     Running   0          23h
My api_hpa.yaml looks like:
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server-deployment
  minReplicas: 4
  maxReplicas: 12
  targetCPUUtilizationPercentage: 50
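For reference, my understanding of the HPA's core scaling rule (simplified; the real controller also applies a tolerance and a scale-down stabilization window) is desired = ceil(currentReplicas × currentUtilization / targetUtilization), clamped to [minReplicas, maxReplicas]. Plugging in my numbers:

```shell
# Values mirror my spec above: 12 replicas at ~0% CPU, target 50%, min 4, max 12
current=12; util=0; target=50; min=4; max=12
desired=$(( (current * util + target - 1) / target ))  # integer ceiling division
if [ "$desired" -lt "$min" ]; then desired=$min; fi
if [ "$desired" -gt "$max" ]; then desired=$max; fi
echo "$desired"  # prints 4, so the HPA should want minReplicas here
```

So with zero CPU usage I would expect the HPA to settle at 4 replicas, as soon as it can actually read the CPU metric.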
It has now been 24h and the HPA has still not scaled my pods down to 4, even though they have seen no CPU usage.
When I look at the GKE Deployment details dashboard I see the warning "Unable to read all metrics".
Is this preventing the autoscaler from scaling down my pods?
And how do I fix it?
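For what it's worth, these are the commands I would run to see what the HPA itself reports (standard kubectl commands; `api-hpa` is the name from my manifest above):

```
# TARGETS column; "<unknown>/50%" would confirm the HPA cannot read CPU metrics
kubectl get hpa api-hpa

# Conditions and Events (e.g. FailedGetResourceMetric) explain why scaling is blocked
kubectl describe hpa api-hpa
```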
It is my understanding that GKE runs a metrics server automatically:
kubectl get deployment --namespace=kube-system
NAME                                        READY   UP-TO-DATE   AVAILABLE   AGE
event-exporter-gke                          1/1     1            1           18d
kube-dns                                    2/2     2            2           18d
kube-dns-autoscaler                         1/1     1            1           18d
l7-default-backend                          1/1     1            1           18d
metrics-server-v0.3.6                       1/1     1            1           18d
stackdriver-metadata-agent-cluster-level    1/1     1            1           18d
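If it helps, this is how I would verify that the metrics pipeline is actually serving data (standard kubectl commands; they need the live cluster):

```
# Should list per-pod CPU/memory if metrics-server is healthy
kubectl top pods

# The resource-metrics APIService should report Available=True
kubectl get apiservice v1beta1.metrics.k8s.io
```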
Here is the configuration of that metrics server:
Name:                   metrics-server-v0.3.6
Namespace:              kube-system
CreationTimestamp:      Sun, 21 Feb 2021 11:20:55 -0800
Labels:                 addonmanager.kubernetes.io/mode=Reconcile
                        k8s-app=metrics-server
                        kubernetes.io/cluster-service=true
                        version=v0.3.6
Annotations:            deployment.kubernetes.io/revision: 14
Selector:               k8s-app=metrics-server,version=v0.3.6
Replicas:               1 desired | 1 updated | 1 total | 1 available | 0 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  25% max unavailable, 25% max surge
Pod Template:
  Labels:           k8s-app=metrics-server
                    version=v0.3.6
  Annotations:      seccomp.security.alpha.kubernetes.io/pod: docker/default
  Service Account:  metrics-server
  Containers:
   metrics-server:
    Image:      k8s.gcr.io/metrics-server-amd64:v0.3.6
    Port:       443/TCP
    Host Port:  0/TCP
    Command:
      /metrics-server
      --metric-resolution=30s
      --kubelet-port=10255
      --deprecated-kubelet-completely-insecure=true
      --kubelet-preferred-address-types=InternalIP,Hostname,InternalDNS,ExternalDNS,ExternalIP
    Limits:
      cpu:     48m
      memory:  95Mi
    Requests:
      cpu:        48m
      memory:     95Mi
    Environment:  <none>
    Mounts:       <none>
   metrics-server-nanny:
    Image:      gke.gcr.io/addon-resizer:1.8.10-gke.0
    Port:       <none>
    Host Port:  <none>
    Command:
      /pod_nanny
      --config-dir=/etc/config
      --cpu=40m
      --extra-cpu=0.5m
      --memory=35Mi
      --extra-memory=4Mi
      --threshold=5
      --deployment=metrics-server-v0.3.6
      --container=metrics-server
      --poll-period=300000
      --estimator=exponential
      --scale-down-delay=24h
      --minClusterSize=5
      --use-metrics=true
    Limits:
      cpu:     100m
      memory:  300Mi
    Requests:
      cpu:     5m
      memory:  50Mi
    Environment:
      MY_POD_NAME:       (v1:metadata.name)
      MY_POD_NAMESPACE:  (v1:metadata.namespace)
    Mounts:
      /etc/config from metrics-server-config-volume (rw)
  Volumes:
   metrics-server-config-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      metrics-server-config
    Optional:  false
  Priority Class Name:  system-cluster-critical
Conditions:
  Type         Status  Reason
  ----         ------  ------
  Available    True    MinimumReplicasAvailable
  Progressing  True    NewReplicaSetAvailable
OldReplicaSets:  <none>
NewReplicaSet:   metrics-server-v0.3.6-787886f769 (1/1 replicas created)
Events:
  Type    Reason             Age                    From                   Message
  ----    ------             ----                   ----                   -------
  Normal  ScalingReplicaSet  3m10s (x2 over 5m39s)  deployment-controller  Scaled up replica set metrics-server-v0.3.6-7c9d64c44 to 1
  Normal  ScalingReplicaSet  2m54s (x2 over 5m23s)  deployment-controller  Scaled down replica set metrics-server-v0.3.6-787886f769 to 0
  Normal  ScalingReplicaSet  2m50s (x2 over 4m49s)  deployment-controller  Scaled up replica set metrics-server-v0.3.6-787886f769 to 1
  Normal  ScalingReplicaSet  2m33s (x2 over 4m34s)  deployment-controller  Scaled down replica set metrics-server-v0.3.6-7c9d64c44 to 0
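Given the replica-set churn in the Events above, I would also check the metrics-server container's own logs for scrape errors (standard kubectl; the container name is from the pod template above):

```
kubectl logs -n kube-system deployment/metrics-server-v0.3.6 -c metrics-server --tail=50
```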
Edit: 2021-03-13
This is the configuration for the API server deployment:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-server-deployment
spec:
  replicas: 12
  selector:
    matchLabels:
      app: api-server
  template:
    metadata:
      labels:
        app: api-server
    spec:
      serviceAccountName: api-kubernetes-service-account
      nodeSelector:
        #<labelname>:value
        cloud.google.com/gke-nodepool: api-nodepool
      containers:
      - name: api-server
        image: gcr.io/questions-279902/taskserver:latest
        imagePullPolicy: "Always"
        ports:
        - containerPort: 80
        #- containerPort: 443
        args:
        - --disable_https
        - --db_ip_address=127.0.0.1
        - --modelserver_address=http://10.128.0.18:8501 # kubectl get service model-service --output yaml
        resources:
          # You must specify requests for CPU to autoscale
          # based on CPU utilization
          requests:
            cpu: "250m"
      - name: cloud-sql-proxy
        ...
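One thing I am unsure about: the pods run two containers (READY 2/2), and as I understand it the HPA computes CPU utilization against the requests of every container in the pod. If the cloud-sql-proxy sidecar (elided above) has no CPU request, that alone might explain "Unable to read all metrics". A hypothetical sketch of what I think the sidecar would need (image tag and request value are placeholders, not from my actual manifest):

```yaml
      - name: cloud-sql-proxy
        image: gcr.io/cloudsql-docker/gce-proxy:1.19.1  # placeholder tag
        resources:
          requests:
            cpu: "100m"  # placeholder; an explicit request on every container lets HPA compute utilization
```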