I wanted to build a Grafana dashboard to analyze failed canary releases in Flagger. Flagger provides a metric, flagger_canary_status, that shows the status of a canary. The status is encoded as follows:

Value  Meaning
0      Canary currently running
1      Canary succeeded
2      Canary failed

So I would like to populate a Grafana variable with the names of the apps that currently have a failed canary, and potentially later with those that had one within the currently shown time range.

The query label_values(flagger_canary_status, name) returns all label values for the metric, so I get a list of all the canary apps, not only the failed ones. But when I query label_values(flagger_canary_status == 2, name), it fails with the error "Error updating options: 1:23: parse error: unexpected <op:==>", even though flagger_canary_status == 2 on its own is a valid Prometheus query.

1 Answer

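I am now using a recording rule that records a metric only when the canary status has the value 2. Because the == 2 filter keeps only samples equal to 2, avg_over_time over the one-hour window returns 2 as long as at least one sample in that window was 2, so the recorded series stays present for up to 1h after the canary was fixed (meaning the status went back to 1).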

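  groups:
    - name: "flagger-recording-rules"
      rules:
        # Present only for canaries that reported status 2 (failed)
        # at some point during the last hour.
        - record: flagger_canary_with_problems_last_hour
          # Subquery: evaluate the filter every 10m over the past 1h.
          expr: avg_over_time( (flagger_canary_status == 2) [1h:10m])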

This solution is not as flexible as having the query directly in Grafana, but it works as intended.
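
With the rule in place, the Grafana variable can presumably point at the recorded metric instead of the filtered expression. This is only a minimal sketch: it assumes the recorded series keeps the name label of flagger_canary_status, which should hold here because the rule expression does not aggregate any labels away.

```
label_values(flagger_canary_with_problems_last_hour, name)
```

Since the recorded series exists only for apps whose canary failed within the last hour, this should list just those apps. It takes the same plain metric-name form as the label_values(flagger_canary_status, name) query that already worked in the question.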