
I have several instances exposing a Prometheus counter and would like to aggregate all of their values over a certain period of time. I have tried many different things but can't get it working.

Let's assume my metric name is request_total. This metric has labels for path and status_code. My goal is to get an overall sum of this counter, without filtering by any of its labels. If I run sum by (instance) (request_total), I get the following graph from Prometheus:

[Graph: sum by (instance) (request_total)]

As we can see, my counter seems to be correct for each instance. However, if I try to sum all those values with sum (request_total), I get the following result:

[Graph: sum (request_total)]

I'm pretty new to Prometheus, but I was expecting that the summed counter would not reset and would instead keep accumulating. Could you please help me and tell me what I am missing here?

Thanks in advance


1 Answer


Yes, sum(request_total) should work and show the total across all instances, and according to your graphs that is exactly what it does:

Until ~8:30am, two instances report 4 and 11 requests, for a total of 15, which you can see in the second graph.

From ~8:33am to 8:42am, only one instance reports one request; then another instance starts reporting one request as well, which shows up as the total going from 1 to 2 in the second (summed) graph.
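
To make the arithmetic concrete, here is a minimal Python sketch of what sum(request_total) does at each point in time: it adds up whichever series have a value at that timestamp, and nothing more. The instance names and exact timestamps are made up for illustration (the graphs only give approximate times), and I'm assuming the later series are separate from the earlier ones; they could just as well be the same instances after a restart.

```python
# Hypothetical samples: instance name -> {timestamp: counter value}.
# The values (4, 11, 1, 1) are the ones read off the graphs above;
# the instance names and timestamps are assumptions for illustration.
samples = {
    "instance-a": {"08:25": 4},
    "instance-b": {"08:25": 11},
    "instance-c": {"08:35": 1, "08:45": 1},
    "instance-d": {"08:45": 1},
}


def sum_across_instances(samples):
    """Mimic sum(request_total): at each timestamp, add up the values
    of the series that report a sample at that timestamp."""
    totals = {}
    for series in samples.values():
        for ts, value in series.items():
            totals[ts] = totals.get(ts, 0) + value
    return dict(sorted(totals.items()))


totals = sum_across_instances(samples)
print(totals)  # {'08:25': 15, '08:35': 1, '08:45': 2}
```

The drop from 15 to 1 is not the sum being reset. The series that contributed 4 and 11 simply stop reporting, so the sum only sees the series that are still there.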