We have a pipeline whose operators fall into two workloads. Source -> Transform is CPU-intensive, so both operators are placed in the same slot sharing group, let's say source. Sink is RAM-intensive, because it uses bulk upload and holds a large amount of data in memory, so it is assigned to the sink slot sharing group.
Additionally, the two workloads run at different parallelism, because the first one is limited by the source parallelism. For example, Source -> Transform has a parallelism of 50, while Sink has a parallelism of 78. We have 8 TMs, each with 16 cores (and therefore 16 slots).
In this case, the ideal slot allocation strategy for us seems to be to allocate 6-7 slots on each TM to Source -> Transform and the rest to Sink, so that the CPU and RAM workloads are distributed roughly evenly across all TMs.
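
To make the target concrete, here is a small sketch of the arithmetic behind that ideal layout. It is plain Java and does not touch any Flink API; the class and method names (SlotSpread, spread, complement) are made up for illustration, and the numbers are the ones from the example above.

```java
import java.util.Arrays;

final class SlotSpread {
    /** Splits `parallelism` subtasks over `taskManagers` so per-TM counts differ by at most one. */
    static int[] spread(int parallelism, int taskManagers) {
        int[] perTm = new int[taskManagers];
        int base = parallelism / taskManagers;   // every TM gets at least this many
        int extra = parallelism % taskManagers;  // the first `extra` TMs get one more
        for (int i = 0; i < taskManagers; i++) {
            perTm[i] = base + (i < extra ? 1 : 0);
        }
        return perTm;
    }

    /** Slots left on each TM once the given slots are taken. */
    static int[] complement(int[] taken, int slotsPerTm) {
        int[] rest = new int[taken.length];
        for (int i = 0; i < taken.length; i++) {
            rest[i] = slotsPerTm - taken[i];
        }
        return rest;
    }

    public static void main(String[] args) {
        int taskManagers = 8;
        int slotsPerTm = 16;
        int[] source = spread(50, taskManagers);      // Source -> Transform group
        int[] sink = complement(source, slotsPerTm);  // remaining slots for the Sink group
        System.out.println("source per TM: " + Arrays.toString(source));
        System.out.println("sink per TM:   " + Arrays.toString(sink));
        System.out.println("sink total:    " + Arrays.stream(sink).sum());
    }
}
```

With 50 + 78 = 128 subtasks and 8 x 16 = 128 slots, every slot is used: two TMs would hold 7 source and 9 sink slots, and the other six would hold 6 source and 10 sink slots. This only describes the layout I am hoping for, not what Flink's scheduler actually does.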
So I wonder whether there is a config setting that tells Flink to distribute slot sharing groups evenly.
I only found the cluster.evenly-spread-out-slots config parameter, but I'm not sure whether it evenly distributes slot sharing groups rather than just slots. For example, I get TMs with 10 Source -> Transform tasks, while I would expect 6 or 7.
So the question is: is it possible to tell Flink to distribute slot sharing groups evenly across the cluster? Or is there some other way to achieve this?
The question "Distribute a Flink operator evenly across taskmanagers" seems similar to mine, but I'm mostly asking about the distribution of slot sharing groups. That topic also contains only the suggestion to use cluster.evenly-spread-out-slots, but something may have changed since then.