2 votes

Is there a way to oversubscribe GPUs on Slurm, i.e. run multiple jobs/job steps that share one GPU? We've only found ways to oversubscribe CPUs and memory, but not GPUs.

We want to run multiple job steps on the same GPU in parallel and optionally specify the GPU memory used for each step.
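Roughly what we have in mind is a batch script like the sketch below (the script names, memory figures, and exact srun options are just placeholders for the idea; --overlap needs a reasonably recent Slurm):

    #!/bin/bash
    #SBATCH --gres=gpu:1        # one physical GPU allocated to the whole job
    #SBATCH --ntasks=2

    # Two job steps that should run in parallel on the single allocated GPU;
    # ideally each step could also be capped at a given amount of GPU memory
    # (we have not found a Slurm option for that part).
    srun --ntasks=1 --overlap ./train_a.sh &   # e.g. ~4 GB of GPU memory
    srun --ntasks=1 --overlap ./train_b.sh &   # e.g. ~8 GB of GPU memory
    wait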

1
Check the answer given in: stackoverflow.com/questions/45200926/… – Bub Espinja
@BubEspinja That answer seems to go in the opposite direction of what we're aiming for. We don't want to use the GPUs from several nodes for a single job, but a single GPU for multiple jobs. Or are you referring to the GPU virtualization mentioned in the cited paper and saying that that might be the only real solution? – klicperajo
As far as I know, in Slurm you cannot share a GPU among several different jobs. For this reason, GPU virtualization is presented as a solution. – Bub Espinja

1 Answer

2 votes

The easiest way of doing that is to define the GPU as a feature rather than as a GRES. Slurm will then not manage the GPUs at all; it will just make sure that jobs which need a GPU land on nodes that offer one.
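A minimal sketch of what that could look like, assuming a hypothetical node name and resource values (the point is that the node line has a Feature= entry and no Gres= entry, so Slurm does not track the GPU itself):

    # slurm.conf: advertise the GPU as a plain node feature instead of a GRES
    NodeName=gpunode01 CPUs=32 RealMemory=128000 Feature=gpu

    # Job submission: request the feature so the job lands on a GPU node.
    # Several such jobs can run on the node concurrently (subject to the
    # usual CPU/memory limits), all sharing the physical GPU.
    sbatch --constraint=gpu --cpus-per-task=4 --mem=16G my_gpu_job.sh

The trade-off is that Slurm no longer does any GPU accounting or isolation, so the jobs themselves are responsible for sharing the device (and for any per-process GPU memory limits).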