3
votes

Is there a mechanism in Flink to send alerts/notifications when a job has failed?

I was thinking that if a restart strategy is applied, the job might be aware that it is being restarted, and client code could then send a notification to some sink. However, I couldn't find any relevant job context info.

1 Answer

1
votes

I'm not aware of a super-easy way to do this. A couple of ideas:

(1) The JobManager is aware of failed jobs. You could poll its REST API, for example /joboverview/completed, looking for newly failed jobs. /jobs/<jobid>/exceptions can then be used to get more info (docs).
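
A rough sketch of the polling idea, using only the JDK (Java 11+). Several things here are assumptions rather than anything from the Flink docs: the base URL http://localhost:8081, the 30 second interval, the class and method names, and above all the response shape (job objects that list "jid" before "state", with no nested object in between). A real JSON parser would be more robust than the regex used here.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FailedJobPoller {
    // ASSUMPTION: each job object lists "jid" before "state" and has no
    // nested object between the two. [^{}] keeps a match inside one object.
    private static final Pattern FAILED_JOB = Pattern.compile(
            "\"jid\"\\s*:\\s*\"([^\"]+)\"[^{}]*?\"state\"\\s*:\\s*\"FAILED\"");

    // Job ids that have already triggered an alert.
    private final Set<String> alreadyReported = new HashSet<>();

    // Extracts the ids of all jobs whose state is FAILED.
    static List<String> failedJobIds(String json) {
        List<String> ids = new ArrayList<>();
        Matcher m = FAILED_JOB.matcher(json);
        while (m.find()) {
            ids.add(m.group(1));
        }
        return ids;
    }

    // Returns only the failed job ids not seen in an earlier poll.
    List<String> newlyFailed(String json) {
        List<String> fresh = new ArrayList<>();
        for (String id : failedJobIds(json)) {
            if (alreadyReported.add(id)) {
                fresh.add(id);
            }
        }
        return fresh;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical JobManager address; pass the real one as the first argument.
        String base = args.length > 0 ? args[0] : "http://localhost:8081";
        HttpClient client = HttpClient.newHttpClient();
        FailedJobPoller poller = new FailedJobPoller();
        while (true) {
            HttpRequest request = HttpRequest
                    .newBuilder(URI.create(base + "/joboverview/completed"))
                    .GET()
                    .build();
            String body = client.send(request, HttpResponse.BodyHandlers.ofString()).body();
            for (String id : poller.newlyFailed(body)) {
                // Replace with e-mail, chat webhook, pager, etc.
                System.err.println("ALERT: job " + id + " failed, see "
                        + base + "/jobs/" + id + "/exceptions");
            }
            Thread.sleep(30_000); // arbitrary polling interval
        }
    }
}
```

Because the set of reported ids lives in memory, restarting the poller would alert again for every failed job still listed; persisting the set would avoid that.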

(2) The CheckpointedFunction interface has an initializeState() method that is passed a context object that responds to an isRestored() method (docs). This is more-or-less the relevant job context you were looking for.
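
A minimal sketch of that second idea, assuming checkpointing is enabled for the job. The class name RestartAwareMap, the state name "restart-marker" and the notifyRestart() helper are made up for illustration. The small marker state is only there so that the function always has something to restore; whether isRestored() would also return true for a function with no state at all is something I have not verified.

```java
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.common.state.ListState;
import org.apache.flink.api.common.state.ListStateDescriptor;
import org.apache.flink.runtime.state.FunctionInitializationContext;
import org.apache.flink.runtime.state.FunctionSnapshotContext;
import org.apache.flink.streaming.api.checkpoint.CheckpointedFunction;

// Pass-through map function that reports when it starts from restored state.
public class RestartAwareMap implements MapFunction<String, String>, CheckpointedFunction {

    private transient ListState<Boolean> marker;

    @Override
    public void initializeState(FunctionInitializationContext context) throws Exception {
        // Hypothetical marker state, registered so each checkpoint contains state for this function.
        marker = context.getOperatorStateStore()
                .getListState(new ListStateDescriptor<>("restart-marker", Boolean.class));

        // True when the function is initialized from a previous snapshot,
        // which covers recovery after a failure but also a manual start from a savepoint.
        if (context.isRestored()) {
            notifyRestart();
        }
    }

    @Override
    public void snapshotState(FunctionSnapshotContext context) throws Exception {
        marker.clear();
        marker.add(Boolean.TRUE);
    }

    @Override
    public String map(String value) {
        return value; // records pass through unchanged
    }

    // Hypothetical hook: replace with e-mail, chat webhook, etc.
    private void notifyRestart() {
        System.err.println("ALERT: operator restored from state, the job was probably restarted");
    }
}
```

Note that initializeState() runs once per parallel subtask, so this sends one notification per subtask unless you deduplicate them. It also only fires once the job has come back up, so a job that fails permanently would not be reported this way.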