1 vote

I have to schedule jobs on a very busy GPU cluster. I don't really care about nodes, only about GPUs. The way my code is structured, each task uses a single GPU at a time, and the tasks then communicate with each other to make use of multiple GPUs. The way we generally schedule something like this is gpus_per_task=1, ntasks_per_node=8, nodes=<number of GPUs you want>/8, since each node has 8 GPUs.
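For concreteness, this is roughly what the submission looks like today for 32 GPUs (the job script contents are just a placeholder):

    #!/bin/bash
    # Ask for whole nodes in units of 8 GPUs: this needs 4 fully free nodes.
    #SBATCH --nodes=4
    #SBATCH --ntasks-per-node=8
    #SBATCH --gpus-per-task=1

    srun ./my_job   # placeholder for the actual single-GPU worker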

Since not everyone needs 8 GPUs, there are often nodes with a few (<8) idle GPUs lying around, which my parameters can't make use of. Since I don't care about nodes, is there a way to tell slurm I want 32 tasks and I don't care how many nodes it uses to schedule them?

For example, it could give me 2 tasks on a machine with 2 GPUs left and split the remaining 30 across completely free nodes, or whatever else is feasible, to make better use of the cluster.

I know there's an ntasks parameter which may do this, but the documentation is kind of confusing about it. It states:

The default is one task per node, but note that the --cpus-per-task option will change this default.

What does cpus_per_task have to do with this?

I also saw

If used with the --ntasks option, the --ntasks option will take precedence and the --ntasks-per-node will be treated as a maximum count of tasks per node

but I'm also confused about this interaction. Does this mean that if I ask for --ntasks=32 --ntasks-per-node=8, it will put at most 8 tasks on a single machine but could place fewer if it decides to? (That's basically what I want.)
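Concretely, I'm wondering whether something like this would do what I want (numbers picked to match my example above):

    # Would slurm place anywhere from 1 to 8 tasks per node here,
    # as long as the total adds up to 32?
    sbatch --ntasks=32 --ntasks-per-node=8 --gpus-per-task=1 job.sh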


1 Answer

2 votes

Try --gpus-per-task=1 and --ntasks=32, with no tasks per node or number of nodes specified. This allows slurm to distribute the tasks across the nodes however it wants and to use leftover GPUs on nodes that are not fully utilized. It also won't place more than 8 tasks on a single node, as there are no more than 8 GPUs available per node.
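As a batch script, that would look something like this (the job script contents are a placeholder):

    #!/bin/bash
    # Only fix the task count and GPUs per task; let slurm pick the nodes.
    #SBATCH --ntasks=32
    #SBATCH --gpus-per-task=1

    srun ./my_job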

Regarding ntasks vs cpus-per-task: this should not matter in your case. By default a task gets one CPU. If you use --cpus-per-task=x, it is guaranteed that the x CPUs are on one node. That is not the case if you just use --ntasks, where the tasks can be spread however slurm decides. There is an example of this in the documentation.
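A minimal sketch of the difference (the counts here are made up):

    # 6 tasks with 1 CPU each: slurm may spread them over up to 6 nodes.
    sbatch --ntasks=6 job.sh

    # 3 tasks with 2 CPUs each: each task's 2 CPUs are guaranteed to be
    # on the same node, though the 3 tasks may land on different nodes.
    sbatch --ntasks=3 --cpus-per-task=2 job.sh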

Caveat: this requires slurm >= 19.05, as the --gpus-* options were added in that release.