34
votes

I want to calculate the CPU usage of all pods in a Kubernetes cluster. I found two Prometheus metrics that may be useful:

container_cpu_usage_seconds_total: Cumulative cpu time consumed per cpu in seconds.
process_cpu_seconds_total: Total user and system CPU time spent in seconds.

CPU usage of all pods = (per-second increase of sum(container_cpu_usage_seconds_total{id="/"})) / (per-second increase of sum(process_cpu_seconds_total))

However, I found that the per-second increase of container_cpu_usage_seconds_total{id="/"} is larger than the per-second increase of sum(process_cpu_seconds_total), so the computed usage can be larger than 1.

3 Answers

60
votes

This is what I use to get CPU usage at the cluster level:

sum (rate (container_cpu_usage_seconds_total{id="/"}[1m])) / sum (machine_cpu_cores) * 100
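As a rough illustration of the arithmetic behind this query, here is a minimal Python sketch. It assumes two samples of the root cgroup counter per node, taken 60 seconds apart; the function names, node names, and numbers are hypothetical. Prometheus's real rate() also handles counter resets and extrapolates to the window boundaries, which this sketch ignores.

```python
def per_second_rate(first, last, window_seconds):
    """Average per-second increase of a counter between two samples."""
    return (last - first) / window_seconds


def cluster_cpu_percent(samples, cores, window_seconds=60.0):
    """Approximate sum(rate(counter[1m])) / sum(machine_cpu_cores) * 100.

    samples: {node: (first, last)} counter values in CPU-seconds (hypothetical data).
    cores:   {node: number of CPU cores on that node}.
    """
    # Each per-node rate is "CPU-seconds per second", i.e. cores in use.
    used_cores = sum(
        per_second_rate(first, last, window_seconds)
        for first, last in samples.values()
    )
    total_cores = sum(cores.values())
    return used_cores / total_cores * 100


# Hypothetical two-node cluster, counters sampled 60 seconds apart.
samples = {"node-a": (1000.0, 1120.0), "node-b": (500.0, 560.0)}
cores = {"node-a": 4, "node-b": 4}
print(cluster_cpu_percent(samples, cores))  # 37.5
```

Here node-a uses 2 cores and node-b uses 1 core on average, so 3 of 8 cores are busy. Because the denominator is the number of cores, the result stays between 0 and 100.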

I also track the CPU usage for each pod.

sum (rate (container_cpu_usage_seconds_total{image!=""}[1m])) by (pod_name)
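The grouping in this query can be sketched in Python as follows. This is an illustration only: the series, label values, and the function name are made up, and the rate is the simplified two-sample version rather than Prometheus's actual rate(). The image!="" matcher drops series that carry no container image; as far as I understand, those are typically cgroup-level aggregates that would otherwise be counted twice. Newer Kubernetes versions expose the pod_name label as pod.

```python
from collections import defaultdict


def cpu_cores_by_pod(series, window_seconds=60.0):
    """Approximate sum(rate(counter{image!=""}[1m])) by (pod_name).

    series: iterable of (labels, first, last), where first/last are counter
            samples in CPU-seconds taken window_seconds apart (hypothetical data).
    """
    totals = defaultdict(float)
    for labels, first, last in series:
        # Equivalent of the image!="" matcher: skip series without an image.
        if labels.get("image", "") == "":
            continue
        totals[labels["pod_name"]] += (last - first) / window_seconds
    return dict(totals)


# Hypothetical samples: two containers in web-1, one in db-1, one aggregate series.
series = [
    ({"pod_name": "web-1", "image": "nginx"}, 10.0, 40.0),
    ({"pod_name": "web-1", "image": "sidecar"}, 5.0, 20.0),
    ({"pod_name": "db-1", "image": "postgres"}, 100.0, 160.0),
    ({"pod_name": "web-1", "image": ""}, 15.0, 60.0),  # skipped by the filter
]
print(cpu_cores_by_pod(series))  # {'web-1': 0.75, 'db-1': 1.0}
```

The values are in cores, so 0.75 means the pod's containers used three quarters of one core on average over the window.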

I have a complete kubernetes-prometheus solution on GitHub that may help you with more metrics: https://github.com/camilb/prometheus-kubernetes

7
votes

I created my own Prometheus exporter (https://github.com/google-cloud-tools/kube-eagle), primarily to get a better overview of my resource utilization on a per-node basis. It also offers a more intuitive way of monitoring your CPU and RAM resources. The query to get the cluster-wide CPU usage looks like this:

sum(eagle_pod_container_resource_usage_cpu_cores)

You can also easily get the CPU usage by namespace, node, or node pool.

0
votes

You can also use the query below:

avg (rate (container_cpu_usage_seconds_total{id="/"}[1m]))
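Note that avg returns the mean rate across the matched series, in cores, rather than a percentage of total capacity. A minimal sketch of that arithmetic, assuming one id="/" series per node and hypothetical sample values (the function name is made up, and counter resets are ignored):

```python
def avg_rate(samples, window_seconds=60.0):
    """Approximate avg(rate(counter[1m])) over a list of (first, last) samples."""
    rates = [(last - first) / window_seconds for first, last in samples]
    return sum(rates) / len(rates)


# Hypothetical counters for two nodes, sampled 60 seconds apart.
samples = [(1000.0, 1120.0), (500.0, 560.0)]
print(avg_rate(samples))  # 1.5
```

With these numbers the two nodes use 2 and 1 cores, so the average is 1.5 cores per node.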