I have a PromQL query that looks at the max latency per quantile and displays the data in Grafana, but it shows data from a pod that was redeployed and no longer exists. The pod's data is younger than the default retention period of 15 days.
Here's the query: max(latency{quantile="..."})
The max latency it finds comes from the period when the old pod was throttling. Shortly afterwards the pod was redeployed and latency went back to normal, so now I only want to look at the max latency of what is currently live.
Everything I have found so far about staleness says that stale series should be filtered out behind the scenes, but that doesn't seem to be happening in the current setup, and I cannot figure out what I should change.
Manually adding the specific instance ID to the query works well, but the ID will change on the next redeploy: max(latency{quantile="...", exported_instance="ID"})
Here is a long list of similar questions I found; some are unanswered and some ask about something different. The ideas I did find that are somewhat relevant, but that don't solve the problem in a sustainable way, are:
Suggestions from the links below that were not helpful
- change staleness period, won't work because it affects the whole system
- restart Prometheus, won't work because it can't be done every time a pod is redeployed
- list each graph per machine, won't work with a max query
Links to similar questions
- How do I deal with old collected metrics in Prometheus? (Switch prom->elk: log based monitoring)
- Get data from prometheus only from last scrape iteration (Staleness is a relevant concept, in Singlestat it shows how to use only current value)
- Grafana dashboard showing deleted information from prometheus (Default retention is 15 days, hide machines with a checkbox)
- How can I delete old Jobs from Prometheus? (Manual query/restart)
- grafana variable still catch old metrics info (Update prometheus targets)
- Clear old data in Grafana (Delete with prometheus settings)
- https://community.grafana.com/t/prometheus-push-gateway/18835 (Not answered)
- https://www.robustperception.io/staleness-and-promql (Explains how new staleness works without examples)
The end goal is to display the max latency across all sources that are live now, dropping data from sources that no longer exist.
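
To make the expected behavior concrete, here is a minimal Python sketch of the semantics I am after. It is only a model of the desired result, not of how Prometheus works internally. The pod names, the sample values, and the helper `max_live_latency` are all hypothetical, and the 5-minute window is an assumption chosen to match Prometheus's default lookback delta.

```python
from typing import Dict, List, Optional, Tuple

# Assumed window; chosen to match Prometheus's default lookback delta (5m).
LOOKBACK_SECONDS = 5 * 60

# Hypothetical data shape: series name -> list of (unix_timestamp, latency_seconds).
Samples = Dict[str, List[Tuple[float, float]]]


def max_live_latency(samples: Samples, now: float,
                     lookback: float = LOOKBACK_SECONDS) -> Optional[float]:
    """Max over the latest value of every series that reported recently.

    A series with no sample in (now - lookback, now] is treated as gone
    and is ignored. Returns None when no series is live.
    """
    latest_values = []
    for points in samples.values():
        recent = [(ts, v) for ts, v in points if now - lookback < ts <= now]
        if recent:
            # Tuples compare by timestamp first, so max() picks the newest sample.
            latest_values.append(max(recent)[1])
    return max(latest_values) if latest_values else None


now = 10_000.0
samples = {
    # Throttled pod that was redeployed an hour ago (illustrative values).
    "old-pod": [(now - 3600, 9.5)],
    # Pod that is currently live.
    "new-pod": [(now - 30, 0.2), (now - 15, 0.3)],
}
print(max_live_latency(samples, now))  # 0.3
```

In this model the 9.5 from the old pod is ignored because its last sample is older than the window, which is the result I would like the Grafana panel to show.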