
First of all, I am running Flink in standalone mode.

I have been trying to find a configuration that limits the number of CPU cores each TaskManager uses, but I haven't found anything about this.

In Spark there are configuration options that let you limit the number of CPUs used on each worker:

  • SPARK_WORKER_CORES (worker configuration)
  • spark.executor.cores (cluster configuration)
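
For comparison, these Spark knobs look roughly like this (values and the application name are illustrative, not from the original question):

```
# spark-env.sh on each worker: cap the cores the worker offers to Spark
export SPARK_WORKER_CORES=4

# Per application: cap the cores each executor may use
spark-submit --conf spark.executor.cores=2 my_app.py
```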

But in Flink you can only set the maximum memory to use and the number of task slots (which simply divides the memory), as stated in the official documentation:

  • taskmanager.numberOfTaskSlots: The number of parallel operator or user function instances that a single TaskManager can run (DEFAULT: 1). If this value is larger than 1, a single TaskManager takes multiple instances of a function or operator. That way, the TaskManager can utilize multiple CPU cores, but at the same time, the available memory is divided between the different operator or function instances. This value is typically proportional to the number of physical CPU cores that the TaskManager’s machine has (e.g., equal to the number of cores, or half the number of cores).

And here, a part more focused on my question:

Each task slot represents a fixed subset of resources of the TaskManager. A TaskManager with three slots, for example, will dedicate 1/3 of its managed memory to each slot. Slotting the resources means that a subtask will not compete with subtasks from other jobs for managed memory, but instead has a certain amount of reserved managed memory. Note that no CPU isolation happens here; currently slots only separate the managed memory of tasks.
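
To make the quote concrete, a minimal flink-conf.yaml sketch (values are illustrative; `taskmanager.heap.size` is the memory key in older Flink versions, newer releases use `taskmanager.memory.*` options instead):

```
# flink-conf.yaml (illustrative values)
# Total memory for the TaskManager process
taskmanager.heap.size: 4096m

# Four slots: each slot is reserved ~1/4 of the managed memory,
# but CPU is NOT isolated between slots
taskmanager.numberOfTaskSlots: 4
```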

Thanks!!

Just set taskmanager.numberOfTaskSlots to the number of cores you want the TM to have. This parameter sets the maximum number of slots available for each TM. By default, each slot takes one core. - BrightFlow

2 Answers


I was looking into the same question. As far as I understand, there is no configuration that sets the number of CPUs per slot. Setting the number of slots only divides the memory among them, reducing the memory available per slot. My best guess is to set the number of slots to 1 and limit the CPUs available to the TaskManager process by running it in a container (e.g., Docker). You can then achieve the same parallelism by increasing the number of TaskManagers.
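
A sketch of that approach, assuming the official `flink` Docker image and Docker's `--cpus`/`--memory` resource flags (names and values are illustrative):

```
# One single-slot TaskManager, capped at 2 CPUs and 4 GB of RAM.
# Scale parallelism by starting more such containers, not by adding slots.
docker run -d --name flink-tm-1 \
  --cpus 2 --memory 4g \
  flink:latest taskmanager
```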


I think this is covered in the Flink configuration documentation: https://ci.apache.org/projects/flink/flink-docs-stable/ops/config.html#yarn

yarn.containers.vcores (default: -1)

The number of virtual cores (vcores) per YARN container. By default, the number of vcores is set to the number of slots per TaskManager, if set, or to 1, otherwise. In order for this parameter to be used your cluster must have CPU scheduling enabled. You can do this by setting the org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.
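
So on YARN, a flink-conf.yaml sketch could look like this (illustrative values; requires CPU scheduling to be enabled in your YARN cluster):

```
# flink-conf.yaml (illustrative values)
taskmanager.numberOfTaskSlots: 4

# Leave at -1 so vcores defaults to the slot count above,
# or set it explicitly to request a different number of vcores per container
yarn.containers.vcores: 4
```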

David is correct above, but the reasoning comes down to this setting, and I think it more closely answers the OP's question. So if you leave the default value, adjusting the number of task slots will adjust the number of cores requested per container.