Please help: I have an Apache Flink cluster (2 Job Managers, 3 Task Managers), but I don't know which values to set for the following parameters in flink-conf.yaml:
jobmanager.heap.size
taskmanager.heap.size
taskmanager.numberOfTaskSlots
parallelism.default
Job Manager machine: 8 CPUs, 32 GB RAM
Task Manager machine: 8 CPUs, 32 GB RAM
I plan to run 15 to 20 Apache Flink jobs on this cluster. Due to an internal policy I can't post the Java code here, so I'll describe the pipeline in words:
- 1) Read data from Apache Kafka broker №1 (JSON messages)
- 2) Deserialize the byte array into a POJO
- 3) Apply a FilterFunction that checks some fields of the POJO event
- 4) Apply the keyBy operator on the id field
- 5) Apply a KeyedProcessFunction with state (ValueState or MapState) and a timer (I am using the RocksDB state backend with HDFS)
- 6) Serialize the POJO to a byte array and send it to Apache Kafka broker №2
More than 50 million events are expected per day. All jobs will read from a single data source.
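To put these numbers in perspective, here is a rough back-of-the-envelope sketch in Java. It rests on several assumptions that are not measured facts: exactly 50 million events per day spread evenly over 24 hours (the figure above is a lower bound and real traffic will have peaks), every job running at the same parallelism, and each job needing as many slots as its parallelism (Flink's default slot sharing). The class and method names are hypothetical and only for illustration.

```java
public class ClusterSizingEstimate {

    // Average events per second for a daily volume, assuming a uniform rate (no peaks).
    public static double averageEventsPerSecond(long eventsPerDay) {
        final long secondsPerDay = 24L * 60 * 60; // 86,400 seconds
        return (double) eventsPerDay / secondsPerDay;
    }

    // Smallest taskmanager.numberOfTaskSlots that fits all jobs, assuming each job
    // needs exactly `parallelismPerJob` slots and slots are spread over all Task Managers.
    public static int minSlotsPerTaskManager(int jobs, int parallelismPerJob, int taskManagers) {
        int requiredSlots = jobs * parallelismPerJob;
        // Integer ceiling division: round up so that no job is left without a slot.
        return (requiredSlots + taskManagers - 1) / taskManagers;
    }

    public static void main(String[] args) {
        long eventsPerDay = 50_000_000L; // lower bound stated above
        int taskManagers = 3;            // cluster size stated above
        int jobs = 20;                   // upper end of the planned 15 to 20 jobs

        System.out.printf("Average rate: ~%.0f events/s%n", averageEventsPerSecond(eventsPerDay));

        // Illustrative parallelism values only, not recommendations.
        for (int parallelism = 1; parallelism <= 3; parallelism++) {
            System.out.printf("parallelism=%d -> at least %d slots per Task Manager%n",
                    parallelism, minSlotsPerTaskManager(jobs, parallelism, taskManagers));
        }
    }
}
```

Under these assumptions the average input rate is roughly 579 events per second, and 20 jobs at parallelism 1 would already need at least 7 slots on each of the 3 Task Managers.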