Assume my Flink application has 3 components: Source, Map, and Sink. For some reason (e.g. calling an API with very high latency), the sink needs very high parallelism (e.g. 20). Also assume that Source and Map take very little CPU/IO. We know that the number of available slots must be at least as large as the maximum parallelism of the application, in this case 20. There are 2 ways to deploy this application:
- If I already have a Flink cluster, deploying this application will take up 20 slots. However, my Source and Map don't need many resources, so these 20 slots will be idle most of the time (waiting on the sink's high-latency API calls). In this case I am wasting resources.
- I can set up a per-job cluster for this application and set the number of slots per TaskManager very high, to reduce the resources per slot. In this case, I would also need to set the parallelism of my Map to a high value in order to get enough CPU capacity. However, because Map is CPU-bound, high parallelism will lead to performance degradation (thread context switching).
So my question is, what's the best practice in this case?
Previously I used Apache Storm. For a Storm application, I need to specify the number of workers (slots) and the parallelism of each operator. However, the number of available slots doesn't need to be at least as large as the maximum parallelism of the application, so for this application I can set 2 workers, parallelism 2 for Source and Map, and parallelism 20 for Sink, and it will end up taking only 2 slots, with each slot holding 1 Source, 1 Map, and 10 Sink bolts. I think this not only satisfies the need for a highly parallel sink, but also makes good use of resources (only 2 Map instances). Why do people design Flink's parallelism this way? Or is my understanding wrong?
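
For reference, here is a minimal sketch (not the actual application) of how per-operator parallelism is expressed in Flink's DataStream API; the sequence source, the trivial map, and the sleeping sink are placeholders for the real Source, Map, and slow API sink:

```java
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.sink.SinkFunction;

public class ParallelismSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        env.fromSequence(0, 1_000)                         // placeholder for the real Source
           .setParallelism(2)                              // Source: cheap, 2 subtasks
           .map(new MapFunction<Long, Long>() {            // placeholder for the CPU-light Map
               @Override
               public Long map(Long value) {
                   return value * 2;
               }
           })
           .setParallelism(2)                              // Map: 2 subtasks
           .addSink(new SinkFunction<Long>() {             // placeholder for the slow API sink
               @Override
               public void invoke(Long value, Context context) throws Exception {
                   Thread.sleep(50);                       // simulate the high-latency API call
               }
           })
           .setParallelism(20);                            // Sink: 20 subtasks

        env.execute("per-operator parallelism sketch");
    }
}
```

Note that with the default slot sharing group, this job needs 20 slots in total and each slot holds one Sink subtask; the 2 Source and 2 Map subtasks are co-located inside 2 of those slots rather than occupying 20 separate slots of their own.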
You could add a DiscardingSink, so Flink is satisfied you have a complete workflow, but all of the interesting tuning/parallelism happens via an AsyncIO function prior to that sink. – kkrugler
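
To illustrate that comment, here is a minimal sketch assuming a hypothetical ApiAsyncFunction that wraps the slow API call in a CompletableFuture; the concurrency comes from the async operator's capacity (up to 20 in-flight requests per subtask below) instead of from 20 sink subtasks, and a DiscardingSink completes the workflow:

```java
import java.util.Collections;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

import org.apache.flink.streaming.api.datastream.AsyncDataStream;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.async.ResultFuture;
import org.apache.flink.streaming.api.functions.async.RichAsyncFunction;
import org.apache.flink.streaming.api.functions.sink.DiscardingSink;

public class AsyncSinkSketch {

    // Hypothetical async wrapper around the slow external API.
    static class ApiAsyncFunction extends RichAsyncFunction<Long, String> {
        @Override
        public void asyncInvoke(Long input, ResultFuture<String> resultFuture) {
            CompletableFuture
                .supplyAsync(() -> callApi(input))         // hand the call off to a thread pool
                .thenAccept(response -> resultFuture.complete(Collections.singleton(response)));
        }

        private String callApi(Long input) {
            return "response-" + input;                    // placeholder for the real API call
        }
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Placeholder for Source -> Map; the real Map step is elided here.
        DataStream<Long> input = env.fromSequence(0, 1_000).setParallelism(2);

        AsyncDataStream
            .unorderedWait(input, new ApiAsyncFunction(), 30, TimeUnit.SECONDS, 20)  // 20 in-flight calls per subtask
            .addSink(new DiscardingSink<>());              // completes the workflow, emits nothing

        env.execute("async I/O instead of a highly parallel sink");
    }
}
```

With this shape, the job can keep a low overall parallelism (and therefore few slots) while still having many API calls outstanding at once.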