I have to schedule jobs on a very busy GPU cluster. I don't really care about nodes, only about GPUs: the way my code is structured, each task can only use a single GPU at a time, and the tasks then communicate with each other to use multiple GPUs. The way we generally schedule something like this is with `gpus_per_task=1`, `ntasks_per_node=8`, and `nodes=<number of GPUs you want / 8>`, since each node has 8 GPUs.
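Roughly, a 32-GPU submission currently looks like this (a sketch in plain sbatch syntax; the script name and srun command are placeholders, not my actual job):

```bash
#!/bin/bash
# Current approach: request whole nodes, 8 tasks (one GPU each) per node.
#SBATCH --nodes=4              # 32 GPUs / 8 GPUs per node
#SBATCH --ntasks-per-node=8
#SBATCH --gpus-per-task=1

srun python train.py           # placeholder for the actual per-task command
```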
Since not everyone needs 8 GPUs, there are often nodes with a few (<8) GPUs lying around which, with my parameters, aren't schedulable. Since I don't care about nodes, is there a way to tell Slurm that I want 32 tasks and I don't care how many nodes it uses to provide them? For example, it could give me 2 tasks on one machine that has 2 GPUs left and split the remaining 30 across completely free nodes, or anything else that's feasible, to make better use of the cluster.
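In other words, what I'd like to be able to write is something like the following, assuming Slurm would then pack the tasks onto whatever GPUs happen to be free (which is exactly the behaviour I'm unsure about):

```bash
#!/bin/bash
# Desired: 32 tasks with 1 GPU each, spread over however many nodes Slurm likes.
#SBATCH --ntasks=32
#SBATCH --gpus-per-task=1
# (no --nodes / --ntasks-per-node constraint)

srun python train.py           # placeholder for the actual per-task command
```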
I know there's an `ntasks` parameter which may do this, but the documentation is kind of confusing about it. It states:

> The default is one task per node, but note that the --cpus-per-task option will change this default.
What does `cpus_per_task` have to do with this?
I also saw:

> If used with the --ntasks option, the --ntasks option will take precedence and the --ntasks-per-node will be treated as a maximum count of tasks per node
But I'm also confused about this interaction. Does this mean that if I ask for `--ntasks=32 --ntasks-per-node=8`, Slurm will put at most 8 tasks on a single machine but could put fewer if it decides to? (That's basically what I want.)
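Concretely, would the following be interpreted as "exactly 32 tasks, with at most 8 per node, possibly fewer on partially used nodes"? (Again just a sketch; the script contents are placeholders.)

```bash
#!/bin/bash
# Is --ntasks-per-node only an upper bound here when combined with --ntasks?
#SBATCH --ntasks=32
#SBATCH --ntasks-per-node=8    # hopefully a maximum, not an exact per-node count
#SBATCH --gpus-per-task=1

srun python train.py           # placeholder
```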