
Please help: I have an Apache Flink cluster (2 Job Managers, 3 Task Managers), but I don't know which values to set for the following parameters in flink-conf.yaml:

jobmanager.heap.size

taskmanager.heap.size

taskmanager.numberOfTaskSlots

parallelism.default

Each Job Manager machine has 8 CPUs and 32 GB RAM.
Each Task Manager machine has 8 CPUs and 32 GB RAM.

I plan to run 15 to 20 Apache Flink jobs on this cluster. Due to an internal policy I can't post the Java code here, so I'll describe it in words.

  • 1) Read data from Apache Kafka broker №1 (JSON messages)
  • 2) Deserialize the byte array into a POJO
  • 3) Apply a FilterFunction that checks some fields of the POJO Event
  • 4) Apply the keyBy operator on the id field
  • 5) Apply a KeyedProcessFunction with state (ValueState or MapState) and a timer (I am using the RocksDB state backend on HDFS)
  • 6) Serialize the POJO to a byte array and send it to Apache Kafka broker №2

More than 50 million events are expected per day. All jobs will read from one data source.
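
As a rough sanity check on that volume, the daily figure can be converted into an average per-second rate: 50 million events per day is roughly 579 events per second on average. The sketch below only does this arithmetic. The class name and the peak factor are hypothetical and chosen for illustration, since the real traffic profile is not known.

```java
public class EventRateEstimate {
    // Seconds in one day: 24 h * 60 min * 60 s = 86,400.
    static final long SECONDS_PER_DAY = 24L * 60 * 60;

    /** Average events per second for a given daily event volume. */
    static double averageEventsPerSecond(long eventsPerDay) {
        return (double) eventsPerDay / SECONDS_PER_DAY;
    }

    public static void main(String[] args) {
        long eventsPerDay = 50_000_000L; // lower bound stated in the question
        double average = averageEventsPerSecond(eventsPerDay);

        // Assumption for illustration only: peaks are 5x the daily average.
        double assumedPeakFactor = 5.0;

        System.out.println("average events/s: " + average);
        System.out.println("assumed peak events/s: " + average * assumedPeakFactor);
    }
}
```

The average says nothing about bursts, so the peak rate should be measured on the real Kafka topic before sizing anything from it.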


1 Answer


I would consider using a resource manager such as YARN, Mesos, or Kubernetes in order to get high availability. In a nutshell, this is what they do for you:

When deploying a Flink application, Flink automatically identifies the required resources based on the application’s configured parallelism and requests them from the resource manager. In case of a failure, Flink replaces the failed container by requesting new resources. All communication to submit or control an application happens via REST calls. This eases the integration of Flink in many environments.

In other words, they can offer cluster resources on demand to the Flink engine, and you will have less trouble configuring the parameters you are asking about.
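
To illustrate how the required resources follow from the configured parallelism, here is a minimal sketch of the slot arithmetic. It assumes Flink's default slot sharing, where a job needs as many task slots as its highest operator parallelism. The class and method names are hypothetical, and the parallelism of 1 per job is an assumption, not a recommendation.

```java
public class SlotBudget {
    /**
     * Total slots needed by a set of jobs, assuming default slot sharing:
     * each job needs as many slots as its highest operator parallelism.
     */
    static int requiredSlots(int[] maxParallelismPerJob) {
        int total = 0;
        for (int parallelism : maxParallelismPerJob) {
            total += parallelism;
        }
        return total;
    }

    /** Smallest taskmanager.numberOfTaskSlots that fits the jobs (ceiling division). */
    static int minSlotsPerTaskManager(int requiredSlots, int taskManagers) {
        return (requiredSlots + taskManagers - 1) / taskManagers;
    }

    public static void main(String[] args) {
        int taskManagers = 3; // from the question
        int jobs = 20;        // upper bound from the question

        // Assumption for illustration only: every job runs with parallelism 1.
        int[] maxParallelismPerJob = new int[jobs];
        java.util.Arrays.fill(maxParallelismPerJob, 1);

        int needed = requiredSlots(maxParallelismPerJob);
        int perTaskManager = minSlotsPerTaskManager(needed, taskManagers);

        System.out.println("slots needed: " + needed);
        System.out.println("min slots per Task Manager: " + perTaskManager);
    }
}
```

Under these assumptions, 20 jobs need 20 slots, so 3 Task Managers would each need at least 7 slots. Any job with a higher parallelism raises that number, which is the bookkeeping a resource manager takes over when it allocates containers on demand.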