
I'm setting up a SLURM cluster with two 'physical' nodes. Each of the two nodes has two GPUs.

I would like to give the option to use only one of the GPUs (and keep the other GPU available for computation). I managed to set up something with gres, but I later realized that even if only one of the GPUs is used, the whole node is marked as occupied and the other GPU cannot be used.
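
For reference, my current configuration looks roughly like this (simplified; the node names, CPU counts and memory figures are just examples):

    # slurm.conf (simplified)
    GresTypes=gpu
    NodeName=node[01-02] Gres=gpu:2 CPUs=16 RealMemory=64000 State=UNKNOWN
    PartitionName=gpu Nodes=node[01-02] Default=YES State=UP

    # gres.conf on each node
    Name=gpu File=/dev/nvidia0
    Name=gpu File=/dev/nvidia1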

Is there a way to make the GPUs the consumable resource, effectively having two 'nodes' within a single physical node? And to assign a limited number of CPUs and amount of memory to each?

There is this paper: ieeexplore.ieee.org/document/6970680 – DGIB
I'm not the sysadmin and, unfortunately, I cannot tell you how to do this. But I can tell you that it is possible. On the cluster where I usually work, we have a bunch of nodes with 4 GPUs each, and you can request as many as you need (without blocking the entire node). It is done via gres. – Poshi

1 Answer


I've had the same problem, and I managed to make it work by allowing oversubscription.

Here's the documentation about it: https://slurm.schedmd.com/cons_res_share.html

Not sure if what I did was exactly right, but I set SelectType=select/cons_tres and SelectTypeParameters=CR_Core, and put OverSubscribe=FORCE on my partition. Now I can launch several GPU jobs on the same node.
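
In case it helps, the relevant lines of my slurm.conf look roughly like this (the partition and node names are placeholders for whatever your cluster uses):

    SelectType=select/cons_tres
    SelectTypeParameters=CR_Core
    PartitionName=gpu Nodes=node[01-02] OverSubscribe=FORCE Default=YES State=UP

With that, a job can request a single GPU plus a share of the CPUs and memory, and the rest of the node stays available for a second job, for example:

    sbatch --gres=gpu:1 --cpus-per-task=8 --mem=32G my_job.sh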