0
votes

I've read many docs about this issue but I'm still confused by the two concepts: slot and task.

Let's see the example of WordCount.

enter image description here

As my understanding, each yellow circle is an operator, and Flink can do some optimization, meaning that it can merge more than one operator into an operator chain. In this example, Source and map() can be merged so it becomes as below:

enter image description here

The whole stream becomes three tasks: Source + map(), KeyBy()/window()/apply() and Slink.

If I'm right,one slot is one thread in the TaskManager of Flink, so I'm confused now. In this example, we have three tasks, so does it mean that we must have three slots (each task has its own thread), or does it mean that we must create a TaskManager with three slots for this example? What if the TaskManager has only one or two slots? If we have less than three slots, some exception will be thrown?

1

1 Answers

0
votes

One Slot is not one thread. One slot can have multiple threads.

A Task can have multiple parallel instances which are called Sub-tasks. Each sub-task is ran in a separate thread.

Multiple sub-tasks from different tasks can come together and share a slot. This group of sub-tasks is called a slot-sharing group. Please note that two sub-tasks of the same task (parallel instances of the same task) can not share a slot together.

The number of slots in a Task Manager represents the maximum parallelism it can support. For example, if your job has a parallelism of one for each operator. It can be ran in a Task Manager with one slot. The reason is that all the sub-tasks share the same slot and belong to a slot-sharing group.

Let's consider another example. Assume that you have a Task Manager with one slot and the word count job with a parallelism of one for all operators except KeyBy()/window()/apply() which has 3. When you submit this job, it will fail because you had only one slot. One sub-task of KeyBy()/window()/apply() will be sharing a slot with sub-tasks of Source + Map and Sink. But the other two sub-tasks won't find a slot (because two subtasks of the same task cannot share a slot as mentioned earlier)