We have a Flink cluster managed by a different team. The cluster is shared between multiple jobs, so at any given time a task manager may have slots running operations from different jobs. I have a few questions:
- Is it advisable to share a cluster in production with other jobs?
- If one job fails, will it also kill the task manager that is running threads of another job?
- If we have no other option and must use a shared cluster, what is the best way to handle exception scenarios so that another job is not killed when a task manager terminates itself with a FATAL error?