I am new to Flink. How to know what can be the production cluster requirements for flink. And how to decide the job memory, task memory and task slots for each job execution in yarn cluster mode. For ex- I have to process around 600-700 million records each day using datastream as it's a real time data.
0
votes
1 Answers
0
votes
There's no one-size-fits-all answer to these questions; it depends. It depends on the sort of processing you are doing with these events, whether or not you need to access external resources/services in order to process them, how much state you need to keep and the access and update patterns for that state, how frequently you will checkpoint, which state backend you choose, etc, etc. You'll need to do some experiments, and measure.
See How To Size Your Apache FlinkĀ® Cluster: A Back-of-the-Envelope Calculation for an in-depth introduction to this topic. https://www.youtube.com/watch?v=8l8dCKMMWkw is also helpful.