I'm experiencing a strange issue with CPU requests/limits in Kubernetes. Before setting any CPU requests/limits at all, all my services performed very well. I recently started setting resource requests and limits to avoid future resource starvation. The values were based on the actual usage of each service, but to my surprise, after they were added, some services' response times increased drastically. My first guess was that I might have set the wrong requests/limits, but the metrics show that none of the affected services is even close to those values; in fact, some of them sit closer to their requests than to their limits.
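For reference, this is roughly how I applied the values (a sketch: my-app is a placeholder for the real deployment name; the CPU values are the ones from the screenshots below):

```
# Illustrative only; "my-app" stands in for the real deployment name.
# The values match the pod shown in the screenshots below.
kubectl set resources deployment my-app \
  --requests=cpu=50m \
  --limits=cpu=250m
```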
Then I started looking at CPU throttling metrics and found that all my pods are being throttled. I increased the limit on one of the services to 1000m (from 250m) and saw less throttling in that pod, but I don't understand why I should have to set a higher limit when the pod was never reaching its old limit of 250m.
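In case it helps to reproduce the check, this is roughly how the throttling can be inspected (a sketch: the pod name is a placeholder, and the cgroup path assumes cgroup v1; on cgroup v2 the file is /sys/fs/cgroup/cpu.stat):

```
# Read the CFS throttling counters straight from the container's cgroup;
# the output lists nr_periods, nr_throttled and throttled_time.
kubectl exec my-app-pod -- cat /sys/fs/cgroup/cpu/cpu.stat

# The same data as exposed by cAdvisor, if you scrape it with Prometheus:
#   rate(container_cpu_cfs_throttled_periods_total[5m])
#     / rate(container_cpu_cfs_periods_total[5m])
```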
So my question is: if my pods aren't reaching their CPU limits, why are they being throttled? And why is response time increasing if the pods aren't using their full capacity?
Here are some screenshots of my metrics (CPU request: 50m, CPU limit: 250m):
CPU usage (here we can see this pod's CPU never reached its limit of 250m):
After raising this pod's limit to 1000m, we can observe less throttling:
Output of kubectl top:
P.S.: Before setting these requests/limits there was no throttling at all (as expected).
P.S. 2: None of my nodes is under high load; in fact, none of them uses more than 50% CPU at any time.
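(The pod and node usage figures above come from the standard kubectl top commands, which assume metrics-server is installed:)

```
# Current CPU/memory usage per pod and per node (requires metrics-server)
kubectl top pods
kubectl top nodes
```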
Thanks in advance!