1
votes

The execution of Flinks programs has to be triggered, e.g. with execute(). Otherwise Flink only creates a new execution plan, right? My Question is: Which components of Flink are being activated when processing a lazy operation without triggering the execution?

According the dokumentation there is an optimizer responsible for building a dataflow graph. Are there more processes involved?

And is there a way to find out the id of the optimizer process in order to monitor it?

1

1 Answers

3
votes

Flink DataSet programs are optimized when the execution is triggered. Before, the program is only constructed by appending operators and data sinks to other operators and data sources.

The optimization happens within the client process before the program is submitted to the JobManager process. That means, there is no dedicated optimizer process that could be monitored.

The program translation is done in multiple steps:

  1. Program construction using the DataSet API
  2. Translation into the generic API
  3. Program optimization
  4. JobGraph generation

The JobGraph is the data flow representation that is scheduled by the JobManager for execution.