
I run a custom Kubernetes v1.9.2 setup and scrape various metrics with Prometheus v2.1.0, including the kubelet and cAdvisor metrics.

I want to answer the question: "How much of the CPU resources defined by requests and limits in my deployment are actually used by a pod (and its containers) in terms of (milli)cores?"

Plenty of metrics are scraped, but none of them reports this directly. Perhaps it could be derived from the CPU usage time in seconds, but I don't know how.

I had assumed it wasn't possible until a friend told me she runs Heapster in her cluster. Its built-in Grafana has a graph that shows exactly that: the individual CPU usage of a pod and its containers in (milli)cores.

Since Heapster also uses kubelet and cAdvisor metrics, I wonder: how can I calculate the same thing? Heapster stores the metric in InfluxDB as cpu/usage_rate, but even after reading Heapster's code, I couldn't figure out how it is calculated.

Any help is appreciated, thanks!


We're using the container_cpu_usage_seconds_total metric to calculate Pod CPU usage. This metric contains the cumulative CPU time, in seconds, consumed by each container, per core. This is important, as a Pod may consist of multiple containers, each of which can be scheduled across multiple cores; however, the metric has a pod_name label that we can use for aggregation. Of special interest is the rate of change of that metric, which can be calculated with PromQL's rate() function. If it increases by 1 within one second, the Pod consumes 1 CPU core (or 1000 millicores) in that second.
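To make the arithmetic concrete, here is a minimal Python sketch of the idea: take a cumulative CPU-seconds counter, divide its increase by the elapsed wall-clock time to get cores, and multiply by 1000 for millicores. The sample values are hypothetical, and the helper names (cpu_rate, to_millicores) are made up for illustration. It is a simplification of PromQL's rate(), which additionally handles counter resets and extrapolates to the edges of the range window.

```python
# Minimal sketch: approximate PromQL's rate() over a cumulative CPU-seconds
# counter and convert the result to millicores.
# Simplification (assumption): no counter resets and no extrapolation to the
# window edges, both of which Prometheus' rate() handles.

def cpu_rate(samples):
    """Per-second increase of a cumulative counter, in cores.

    samples: list of (timestamp_seconds, cpu_seconds_total) pairs, oldest first.
    """
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("need at least two samples with increasing timestamps")
    return (v1 - v0) / (t1 - t0)


def to_millicores(cores):
    """1 CPU second consumed per wall-clock second = 1 core = 1000 millicores."""
    return cores * 1000


# Hypothetical scrape samples, 15 s apart, for one container over 60 s.
samples = [(0, 100.0), (15, 103.75), (30, 107.5), (45, 111.25), (60, 115.0)]
cores = cpu_rate(samples)  # 15 CPU seconds consumed over 60 s of wall-clock time
print(f"{cores:.2f} cores = {to_millicores(cores):.0f}m")  # prints "0.25 cores = 250m"
```

A container that burns 15 CPU seconds in a 60-second window is therefore using a quarter of a core on average, i.e. 250 millicores.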

The following PromQL query does just that: it computes the CPU usage of every Pod, averaged over five minutes, by adding up the per-container rates with sum(...) by (pod_name). The result is in cores; multiply it by 1000 to get millicores:

sum(rate(container_cpu_usage_seconds_total[5m])) by (pod_name)
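The sum(...) by (pod_name) step and the comparison the question asks about (usage versus requests) can be sketched in Python as follows. Everything here is hypothetical: the per-container rates stand in for what rate(container_cpu_usage_seconds_total[5m]) might return, the request values stand in for numbers copied from a Deployment spec, and usage_by_pod is an illustrative helper, not part of any library. It assumes each input series belongs to exactly one container.

```python
from collections import defaultdict

# Hypothetical per-container rates in cores, keyed by (pod_name, container_name),
# standing in for the output of rate(container_cpu_usage_seconds_total[5m]).
container_rates = {
    ("web-1", "nginx"): 0.125,
    ("web-1", "sidecar"): 0.025,
    ("db-0", "postgres"): 0.5,
}

# Hypothetical CPU requests in millicores, e.g. taken from the Deployment spec.
requests_m = {"web-1": 200, "db-0": 1000}


def usage_by_pod(rates):
    """Mimic sum(...) by (pod_name): add up the rates of all containers per pod."""
    totals = defaultdict(float)
    for (pod, _container), cores in rates.items():
        totals[pod] += cores
    return dict(totals)


for pod, cores in sorted(usage_by_pod(container_rates).items()):
    used_m = cores * 1000  # cores -> millicores
    print(f"{pod}: {used_m:.0f}m of {requests_m[pod]}m requested "
          f"({used_m / requests_m[pod]:.0%})")
```

With these made-up numbers the loop reports db-0 at 500m of 1000m requested (50%) and web-1 at 150m of 200m requested (75%); the same ratio against limits works the same way.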