I'm trying to understand slot sharing and parallelism in Flink using the WordCount example.
Say I need to run a word count job with Flink, with only one data source and only one sink.
In this case, can I design the job as in the image above? That is, I set two subtasks for Source + map() and two subtasks for keyBy()/window()/apply(). In other words, I have two pipelines, A --- B --- Sink and C --- D --- Sink, so that I get better performance.
For example, suppose the following data stream arrives: aaa, bbb, aaa. With the design above, I might get this situation: aaa and bbb go into A --- B, and the other aaa goes into C --- D. Finally, I get the result aaa: 2, bbb: 1 at the Sink. Am I right so far?
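To make the scenario concrete, here is a plain-Java sketch (not Flink code) of the behavior I am assuming. The class and method names are made up for illustration, and the split of records between the two pipelines is hard-coded to match the example. Whether Flink actually distributes records this way is part of my question.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only: models the assumed data flow, not Flink's runtime.
public class TwoPipelineSketch {

    // Stands in for one pipeline (A --- B or C --- D): counts the words it receives.
    static Map<String, Integer> countWords(List<String> words) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String word : words) {
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    // Stands in for the single Sink: adds up the partial counts from both pipelines.
    static Map<String, Integer> mergeAtSink(Map<String, Integer> first,
                                            Map<String, Integer> second) {
        Map<String, Integer> total = new TreeMap<>(first);
        second.forEach((word, count) -> total.merge(word, count, Integer::sum));
        return total;
    }

    public static void main(String[] args) {
        // Assumed split: aaa and bbb go to A --- B, the other aaa goes to C --- D.
        Map<String, Integer> fromAB = countWords(List.of("aaa", "bbb"));
        Map<String, Integer> fromCD = countWords(List.of("aaa"));
        System.out.println(mergeAtSink(fromAB, fromCD)); // prints {aaa=2, bbb=1}
    }
}
```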
If so: I know that subtasks of the same task cannot share a slot. Does that mean A and C can't share a slot, and B and D can't share a slot? How should I assign the slots? Should I put A + B + Sink into one slot and C + D into another?