I put all services on virtual machines with two CPUs and scale by CPU usage. At the busiest times there are two virtual machines, but most of the time there is only one.
First, if you have any availability requirements, I would recommend always having at least two nodes. If you have only one node and it crashes (e.g. hardware failure or kernel panic), it will take some minutes before this is detected and a few more minutes before a new node is up.
The CPU request of the inactive services is set to 100m because they do not work well with less than 100m when they are busy.
I think the problem is that although these services require 100m of CPU to work properly, they are mostly idle.
The CPU request is a guaranteed, reserved resource amount. Here you reserve too many resources for your almost-idle services. Set the CPU request lower, maybe as low as 20m or even 5m. But since these services will need more resources during busy periods, set a higher limit so that the container can "burst", and also use the Horizontal Pod Autoscaler for them. When using the Horizontal Pod Autoscaler, more replicas will be created and the traffic will be load balanced across all replicas. Also see Managing Resources for Containers.
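For illustration, a Deployment along these lines could be used; the name, image, and exact values are only assumptions for the sketch, not taken from your setup:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: mostly-idle-service        # hypothetical service name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mostly-idle-service
  template:
    metadata:
      labels:
        app: mostly-idle-service
    spec:
      containers:
      - name: app
        image: example/app:latest  # placeholder image
        resources:
          requests:
            cpu: 20m               # reserve only a little CPU while mostly idle
            memory: 64Mi
          limits:
            cpu: 500m              # allow the container to burst during busy periods
            memory: 128Mi

With a low request the scheduler can pack more of these pods onto a node, and the higher limit still allows short bursts of CPU when a service becomes busy.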
This is also true for your "busy services": reserve less CPU and use Horizontal Pod Autoscaling more actively so that the traffic is spread across more nodes during high load, but can scale down and save cost when the traffic is low.
I really hope that all services can autoscale; I think this is the benefit of Kubernetes, which can help me assign pods more flexibly. Is my idea wrong?
Yes, I agree with you.
Shouldn't I set a CPU request for an inactive service?
It is a good practice to always set some value for request and limit, at least for a production environment. The scheduling and autoscaling will not work well without resource requests.
If I have more active services, the total CPU requests will exceed 2000m even in off-peak hours. Is there any solution?
In general, try to use lower resource requests and use Horizontal Pod Autoscaling more actively. This is true for both your "busy services" and your "inactive services".
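As an illustration, a Horizontal Pod Autoscaler targeting CPU utilization might look roughly like this; the Deployment name, replica bounds, and utilization threshold are assumptions for the sketch:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: mostly-idle-service
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: mostly-idle-service      # hypothetical Deployment to scale
  minReplicas: 1
  maxReplicas: 5
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70     # add replicas when average usage exceeds 70% of the request

Note that the utilization target is relative to the CPU request, which is another reason to keep the request realistic rather than padded.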
I find that Kubernetes clusters more often have more than two nodes.
Yes, there are two aspects of this.
First, cost. If you use only two nodes, your environment is probably small, and the Kubernetes control plane probably consists of more nodes and makes up the majority of the cost. For very small environments, Kubernetes may be expensive, and it may be more attractive to use a serverless alternative such as Google Cloud Run.
Second, availability. It is good to have at least two nodes in case of an abrupt crash, e.g. a hardware failure or a kernel panic, so that your service is still available while the node autoscaler scales up a new node. The same is true for the number of replicas of a Deployment: if availability is important, use at least two replicas. When you drain a node, e.g. for maintenance or a node upgrade, the pods on it will be evicted, but they are not created on a different node first. The control plane will detect that the Deployment (technically the ReplicaSet) has fewer than the desired number of replicas and will create a new Pod. But when a new Pod is created on a new node, the container image must first be pulled before the Pod is running. To avoid downtime during these events, use at least two replicas for your Deployment and Pod Topology Spread Constraints to make sure that those two replicas run on different nodes.
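A rough sketch of a Deployment with two replicas spread across nodes could look like this; the labels and names are illustrative:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-service                 # hypothetical name
spec:
  replicas: 2                      # at least two replicas for availability
  selector:
    matchLabels:
      app: my-service
  template:
    metadata:
      labels:
        app: my-service
    spec:
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: kubernetes.io/hostname   # spread the replicas across different nodes
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app: my-service
      containers:
      - name: app
        image: example/app:latest  # placeholder image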
Note: You might run into the same problem as described in "How to use K8S HPA and autoscaler when Pods normally need low CPU but periodically scale"; that should be mitigated by an upcoming Kubernetes feature: KEP - Trimaran: Real Load Aware Scheduling.
Run kubectl top pods -A. Now you know how much they are consuming. 1 core = 1000m. So you can set requests a bit higher than the kubectl top pods output shows (but lower than 100m in some cases) and set high limits values, so pod resourcing will vary within the requests (guaranteed) to limits (hard limit) range. Based on the output, you know where you can give less than 100m if a pod is not CPU-intensive at some moments of the day. You may be able to overbook limits if you planned requests and limits well. – laimison