
I'm trying to monitor the availability of my Flink jobs using Prometheus alerts.

I have tried the flink_jobmanager_job_uptime/downtime metrics, but they don't fit because they simply stop being emitted once the job has failed or finished. I have already been pointed to the numRunningJobs metric as a way to alert on a missing job. I don't want to use that solution, since I would have to update my Prometheus config every time I deploy a new job.

Has anyone managed to create an alert for a failed Flink job using Prometheus?


1 Answer


Prometheus has an absent() function that returns 1 when no time series match the selector passed to it, i.e. when the metric doesn't exist. So you can set the alert expression to something like:

absent(flink_jobmanager_job_uptime) == 1
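
As a rough sketch, that expression could be wrapped in an alerting rule along these lines. The group name, alert names, `for` duration, severity label and the 10m window are placeholders I made up, and the `job_name` label is an assumption about how the Flink Prometheus reporter tags the metric, so check it against your own series. Note that absent() without a label matcher only fires when no job at all reports the metric. The second rule is a different technique (comparing against the past with `offset` and `unless`) that might cover the per-job case without listing jobs in the config; it only fires while the offset window still contains the old series.

```yaml
groups:
  - name: flink-job-availability          # hypothetical group name
    rules:
      # Fires when no series of the metric exists at all,
      # i.e. no Flink job is currently reporting uptime.
      - alert: FlinkJobUptimeAbsent       # hypothetical alert name
        expr: absent(flink_jobmanager_job_uptime)
        for: 1m                           # assumed grace period
        labels:
          severity: critical              # assumed label
        annotations:
          summary: "No Flink job is reporting flink_jobmanager_job_uptime"

      # Sketch of a per-job variant: series that existed 10 minutes ago
      # but have no counterpart now. Assumes a job_name label exists.
      - alert: FlinkJobUptimeDisappeared  # hypothetical alert name
        expr: >
          flink_jobmanager_job_uptime offset 10m
          unless on (job_name) flink_jobmanager_job_uptime
        labels:
          severity: critical              # assumed label
        annotations:
          summary: "Flink job {{ $labels.job_name }} stopped reporting uptime"
```

The rule file would then be listed under `rule_files` in the Prometheus config; new jobs need no config change because neither expression names a specific job.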