I have a PromQL query that looks at the max latency per quantile and displays the data in Grafana, but it shows data from a pod that was redeployed and no longer exists. The pod is younger than the staleness period of 15 days.

Here's the query: max(latency{quantile="..."})

The max latency it finds is from the time the old pod was throttling; shortly afterwards the pod was redeployed and latency went back to normal. Now I want to look only at the max latency of what is currently live.

All the info I have found so far about staleness says that old series should be filtered out behind the scenes, but that does not seem to be happening in my setup, and I cannot figure out what I should change.

When I manually add the specific instance ID to the query it works well, but the ID will change once the pod is redeployed again: max(latency{quantile="...", exported_instance="ID"})

I found a long list of similar questions; some are unanswered and some are not asking the same thing. The ideas that are somewhat relevant but don't solve the problem in a sustainable way are listed below.

Suggestions from the links below that were not helpful

  • Change the staleness period: won't work because it affects the whole system.
  • Restart Prometheus: won't work because it can't be done every time a pod is redeployed.
  • List each graph per machine: won't work with a max query.

Links to similar questions

The end goal

Display the max latency across all sources that are live now, dropping data from sources that no longer exist.

This question seems to be confusing retention and staleness. Can you give example time series, and what output you want? – brian-brazil

1 Answer


You can use the auto-generated metric named up to isolate your required metrics from the others. The up metric makes it easy to determine which metric sources are offline.

up{job="<job-name>", instance="<instance-id>"}: 1 if the instance is healthy, i.e. reachable, or 0 if the scrape failed.
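
For example, up == 0 lists the targets whose last scrape failed. To restrict your original query to targets that are live right now, one option is to join the latency metric with up using the and operator. This is only a sketch: it assumes latency and up share the same instance label, while your metric carries exported_instance, so the matching label (or a label_replace) may need adjusting:

max(latency{quantile="..."} and on(instance) (up == 1))

With the and join, latency series that have no matching up == 1 series are dropped, so pods that are no longer being scraped stop contributing to the max.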