I have a GKE cluster (n1-standard-1, master version 1.13.6-gke.13) with 3 nodes on which I have 7 deployments, each running a Spring Boot application. A default Horizontal Pod Autoscaler was created for each deployment, with target CPU 80% and min 1 / max 5 replicas.
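For reference, each HPA looks roughly like the sketch below. Only the 80% CPU target and the min 1 / max 5 replica bounds come from my setup; the object and deployment names are placeholders, and I am assuming the `autoscaling/v1` API here.

```yaml
# Sketch of one of the seven HPAs; names are hypothetical.
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: my-service            # placeholder name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-service          # placeholder: the deployment being scaled
  minReplicas: 1
  maxReplicas: 5
  targetCPUUtilizationPercentage: 80   # percent of the pod's CPU request
```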
During normal operation, there is typically 1 pod per deployment, with CPU usage at 1-5%. But when an application starts, e.g. after a rolling update, CPU usage spikes and the HPA scales up to the maximum number of replicas, reporting CPU usage of 500% or more.
When multiple deployments start at the same time, e.g. after a cluster upgrade, various pods often become unschedulable because the cluster is out of CPU, and some pods end up in the "Preempting" state.
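As far as I can tell, the documented HPA scaling rule accounts for the jump straight to the maximum. A rough worked example, assuming the deployment starts from 1 replica and the reported utilization is measured against the pod's CPU request (100m by default in my case):

```latex
\[
\text{desiredReplicas}
  = \left\lceil \text{currentReplicas} \times
      \frac{\text{currentUtilization}}{\text{targetUtilization}} \right\rceil
  = \left\lceil 1 \times \frac{500\%}{80\%} \right\rceil
  = \lceil 6.25 \rceil
  = 7
\]
% 7 exceeds maxReplicas, so the HPA settles at maxReplicas = 5.
% In absolute terms, 500% of a 100m request is only 500m of CPU.
```

So a startup burst of about half a vCPU is enough to push a deployment to 5 replicas, which is a lot on an n1-standard-1 node with a single vCPU.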
I have changed the HPAs to max 2 replicas since currently that's enough. But I will be adding more deployments in the future and it would be nice to know how to handle this correctly. I'm quite new to Kubernetes and GCP so I'm not sure how to approach this.
Here is the CPU chart for one of the containers after a cluster upgrade earlier today:
Everything runs in the default namespace and I haven't touched the default LimitRange with 100m default CPU request. Should I modify this and set limits? Given that the initialization is resource demanding, what would the proper limits be? Or do I need to upgrade the machine type with more CPU?
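To make the question concrete, this is roughly what I understand explicit requests and limits would look like on one of the containers. It is only a sketch: the container name, image, and every number are placeholders I made up, not values I have tested.

```yaml
# Fragment of a Deployment's pod template; all values are placeholders.
spec:
  template:
    spec:
      containers:
        - name: my-service                           # hypothetical
          image: gcr.io/my-project/my-service:1.0    # hypothetical
          resources:
            requests:
              cpu: 250m        # replaces the 100m LimitRange default
              memory: 512Mi
            limits:
              cpu: "1"         # upper bound during the startup burst
              memory: 512Mi
```

Since the HPA percentage is relative to the request, raising the request would also lower the reported utilization for the same startup load.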
maxSurge: 1 and minReadySeconds: 60 so that it will only surge one pod per 60 seconds? This way, while you have slower rollouts, you can do them without needing a handful of spare nodes. – eamon1234
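A minimal sketch of what this comment seems to suggest, written as a Deployment fragment. The deployment name is a placeholder, and `maxUnavailable: 0` is an added assumption that is not part of the comment; the selector and pod template are omitted.

```yaml
# Fragment of a Deployment spec; only maxSurge and minReadySeconds
# come from the comment above.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-service             # hypothetical
spec:
  minReadySeconds: 60          # a new pod must stay ready for 60s before it counts as available
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1              # at most one extra pod above the desired count
      maxUnavailable: 0        # assumption: keep existing pods serving until the new one is available
```

This only throttles rollouts within a single deployment; it would not by itself stop several deployments from starting at once after a cluster upgrade.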