
I'm having issues with Prometheus alerting rules. I have various cAdvisor-specific alerts set up, for example:

- alert: ContainerCpuUsage
  expr: (sum(rate(container_cpu_usage_seconds_total[3m])) BY (instance, name) * 100) > 80
  for: 2m
  labels:
    severity: warning
  annotations:
    title: 'Container CPU usage (instance {{ $labels.instance }})'
    description: 'Container CPU usage is above 80%\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}'

When the condition is met, I can see the alert in the "Alerts" tab in Prometheus; however, some labels are missing, which prevents Alertmanager from sending a notification via Slack. Specifically, I attach a custom "env" label to each target:

 {
  "targets": [
   "localhost:8080"
  ],
  "labels": {
   "job": "cadvisor",
   "env": "production",
   "__metrics_path__": "/metrics"
  }
 }

But when an alert based on cAdvisor metrics fires, its only labels are alertname, instance and severity: no job label and no env label. All the other alerts from other exporters (e.g. node-exporter) work just fine, and the env label is present on them.


This is due to the sum function you use: it takes all the matching time series and adds them up, grouping BY (instance, name). If you run the same query in Prometheus, you'll see that sum keeps only the grouping labels:

{instance="foo", name="bar"}    135.38819037447163
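To make the mechanism concrete, here is a minimal Python sketch that models `sum by (instance, name)` on a few made-up series. The `sum_by` helper and the sample labels and values are hypothetical illustrations, not Prometheus code, and it assumes the rate() step has already produced one value per series. Every input series carries `job` and `env`, yet each output keeps only the grouping labels. The `* 100` and `> 80` steps in the alert act on this already-aggregated result, so they cannot bring the dropped labels back.

```python
from collections import defaultdict

# Hypothetical rate() output: (label_set, value) pairs for two containers on
# the same cAdvisor target. Label values (including "cpu") are illustrative.
series = [
    ({"instance": "localhost:8080", "job": "cadvisor", "env": "production",
      "name": "web", "cpu": "cpu00"}, 0.6),
    ({"instance": "localhost:8080", "job": "cadvisor", "env": "production",
      "name": "web", "cpu": "cpu01"}, 0.3),
    ({"instance": "localhost:8080", "job": "cadvisor", "env": "production",
      "name": "db", "cpu": "cpu00"}, 0.2),
]


def sum_by(series, by):
    """Model of PromQL `sum by (...)`: group on the `by` labels, add up the
    values, and keep only the `by` labels on each resulting series."""
    groups = defaultdict(float)
    for labels, value in series:
        key = tuple((name, labels.get(name, "")) for name in by)
        groups[key] += value
    # Empty label values are dropped, mirroring how absent labels behave.
    return [({k: v for k, v in key if v}, total) for key, total in groups.items()]


result = sum_by(series, ["instance", "name"])
for labels, value in result:
    # job, env and cpu are gone; only instance and name remain.
    print(labels, value)
```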

Other aggregation operators such as avg, max and min work the same way. To bring the label back, simply add env to the grouping list: by (instance, name, env). Add job as well if you need that label too.
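A similar sketch, again with a hypothetical `aggregate_by` helper and invented data, checks both the fix and the point about other operators: once `env` (and `job`) are in the grouping list, those labels survive whichever aggregation is applied. Python's `sum`, `max`, `min` and `statistics.mean` stand in for PromQL's `sum`, `max`, `min` and `avg`; this models the grouping semantics only, not Prometheus itself.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical series (label_set, value); the values are invented.
SERIES = [
    ({"instance": "localhost:8080", "job": "cadvisor", "env": "production",
      "name": "web", "cpu": "cpu00"}, 0.6),
    ({"instance": "localhost:8080", "job": "cadvisor", "env": "production",
      "name": "web", "cpu": "cpu01"}, 0.3),
]


def aggregate_by(series, by, op):
    """Group series on the `by` labels and reduce each group with `op`.
    Only the `by` labels are kept, as with PromQL `<op> by (...)`."""
    groups = defaultdict(list)
    for labels, value in series:
        key = tuple((name, labels.get(name, "")) for name in by)
        groups[key].append(value)
    return [({k: v for k, v in key if v}, op(values)) for key, values in groups.items()]


for op in (sum, max, min, mean):
    narrow = aggregate_by(SERIES, ["instance", "name"], op)
    wide = aggregate_by(SERIES, ["instance", "name", "env", "job"], op)
    # narrow has no env/job; wide keeps them because they are grouping labels.
    print(op.__name__, narrow[0][0], wide[0][0])
```

In the alert rule, the equivalent change is to write `BY (instance, name, env, job)` in place of `BY (instance, name)`.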