5
votes

I'm running Prometheus in a Kubernetes cluster. Everything is running fine and my UI pods are counting visitors.

enter image description here

Please ignore the title; what you see here is the query at the bottom of the image. It's a counter. The gaps in the graph are due to pods restarting. I have two pods running simultaneously!

Now suppose I would like to count the total number of visitors, so I need to sum over all the pods.

enter image description here

This is what I expect considering the first image, right?

However, I don't want the graph to drop when a pod restarts. I would like to have something cumulative over a specified amount of time (somehow ignoring pod restarts). I hope this makes sense. Any suggestions?

UPDATE

The answer below suggests the following:

enter image description here

It's a bit hard to see because I've plotted everything there, but the suggested answer sum(rate(NumberOfVisitors[1h])) * 3600 is the continuous green line. What I don't understand is why it has a value of 3. Also, why does the value increase after 21:55, given that I can see some values before that?

The approach seems to be OK: I noticed that the actual increase is 3, going from 1 to 4. In the graph below I've used just one time series to reduce noise.

enter image description here


1 Answer

2
votes

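Take the rate per pod, then sum across pods, then multiply by the time range in seconds, i.e. sum(rate(NumberOfVisitors[1h])) * 3600. Because rate() treats any drop in a counter as a reset, this handles pod restarts and counter rollovers too.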
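
To see why this ignores restarts, here is a rough Python sketch of the reset handling. It is a simplification, not Prometheus's actual implementation: the real rate() also extrapolates to the edges of the time window, which this ignores. The pod names and sample values are made up for illustration.

```python
from typing import Dict, List


def increase_with_resets(samples: List[float]) -> float:
    """Total increase of one counter series.

    Any drop between consecutive samples is treated as a restart from
    zero, so the new value is counted in full instead of as a negative
    delta. (Simplified: no extrapolation to the window boundaries.)
    """
    total = 0.0
    for prev, curr in zip(samples, samples[1:]):
        if curr >= prev:
            total += curr - prev  # normal counter growth
        else:
            total += curr  # counter was reset; it restarted from 0
    return total


def total_increase(series_by_pod: Dict[str, List[float]]) -> float:
    """Sum the per-pod increases, mirroring 'rate per series, then sum'."""
    return sum(increase_with_resets(s) for s in series_by_pod.values())


# Hypothetical samples within one time window, one list per pod.
pods = {
    "ui-pod-a": [1, 2, 4, 0, 1, 3],  # restarted after reaching 4
    "ui-pod-b": [5, 6, 6, 7],
}

print(increase_with_resets([1, 4]))  # a counter going from 1 to 4
print(total_increase(pods))
```

The important part is the order: the resets are resolved per series first, and only then are the pods added together. If you sum the raw counters first, a restart of one pod shows up as a drop in the total, which is what the second graph in the question shows.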