0
votes

What happens when the application code in the job jar throws an exception to the Task Manager while processing an event?

a) Will the Flink Job Manager kill the existing Task Manager and create a new one?

b) Will the Task Manager itself recover from the failed execution and restart processing using the local state saved in RocksDB?

java.lang.IllegalArgumentException: "Application error-stack trace"

I suspect that if the same kind of erroneous event is processed by each of the available Task Managers, they all get killed and the entire Flink job goes down.

I have noticed that when an application error occurs, the entire job eventually goes down.

I have not figured out the exact reason yet.


1 Answer

1
votes

In general, an exception in the job should not bring down the whole Task Manager. We are talking about "normal" exceptions here. In that case the job itself fails, and Flink (the restart is coordinated by the Job Manager, not by the Task Manager) will either restart it or not, depending on the configured restart strategy.
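For illustration, a restart strategy can be set in the cluster configuration. The fragment below is a sketch for `flink-conf.yaml` as used by Flink 1.x; the values 3 and 10 s are arbitrary assumptions, and newer Flink releases may expect slightly different key names, so check the documentation for your version.

```yaml
# Restart the job up to 3 times, waiting 10 seconds between attempts.
# After the third failed restart the job goes to the FAILED state.
restart-strategy: fixed-delay
restart-strategy.fixed-delay.attempts: 3
restart-strategy.fixed-delay.delay: 10 s
```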

Of course, if your Task Manager dies for some reason, for example because of timeouts, it will not be restarted automatically unless you use a resource manager or orchestration tool such as YARN or Kubernetes. In that case the job should be restarted once slots become available.

As for the behaviour you describe, where the job itself is "going down", I assume the job is simply going to the FAILED state. Different restart strategies have different thresholds for the maximum number of retries, and if the job still does not work after the specified number of restarts, it goes to the FAILED state.
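To make the retry threshold concrete, here is a minimal plain-Java sketch that does not use any Flink API. It only mimics the idea: each restart replays the same input, so an event that always throws exhausts the allowed restarts and the job ends up FAILED. The class `RestartSimulation`, the `process` method and the "bad" event prefix are all hypothetical names made up for this illustration.

```java
import java.util.List;

// Toy model of a bounded restart strategy; not Flink code.
class RestartSimulation {
    enum JobState { FINISHED, FAILED }

    // Hypothetical stand-in for user code: throws on a malformed event.
    static void process(String event) {
        if (event.startsWith("bad")) {
            throw new IllegalArgumentException("Application error: " + event);
        }
    }

    // Runs the "job" and restarts it at most maxRestarts times.
    // Every attempt replays the full input, roughly like a restart
    // from the last checkpoint that re-reads the failing event.
    static JobState run(List<String> events, int maxRestarts) {
        int restarts = 0;
        while (true) {
            try {
                for (String event : events) {
                    process(event);
                }
                return JobState.FINISHED;
            } catch (IllegalArgumentException ex) {
                if (restarts >= maxRestarts) {
                    return JobState.FAILED; // restart budget exhausted
                }
                restarts++;
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("a", "b"), 3));          // FINISHED
        System.out.println(run(List.of("a", "bad-1", "b"), 3)); // FAILED
    }
}
```

Under this model, a deterministic application error (the same event failing on every attempt) is not cured by restarting, which would match a job that eventually goes down after a few retries.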