
I'm trying to run the Pytorch UNet from the following link on 2 or more GPUs:

Pytorch-UNet github

The changes I have made so far are:

1. from:

net = UNet(n_channels=3, n_classes=1, bilinear=True)
logging.info(f'Network:\n'
             f'\t{net.n_channels} input channels\n'
             f'\t{net.n_classes} output channels (classes)\n'
             f'\t{"Bilinear" if net.bilinear else "Transposed conv"} upscaling')

to:

net = UNet(n_channels=3, n_classes=1, bilinear=True)
net = nn.DataParallel(net)
logging.info(f'Network:\n'
             f'\t{net.module.n_channels} input channels\n'
             f'\t{net.module.n_classes} output channels (classes)\n'
             f'\t{"Bilinear" if net.module.bilinear else "Transposed conv"} upscaling')

2. In each place where there was:

net.<something>

I replaced it with:

net.module.<something>
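Putting both changes together, here is a minimal sketch of the wrapping (the device variable is my assumption, not code from the repository; note that nn.DataParallel expects the model's parameters to live on the first GPU, and it does not forward custom attributes of the wrapped module, which is why .module is needed):

import torch
import torch.nn as nn

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

net = UNet(n_channels=3, n_classes=1, bilinear=True)
net = nn.DataParallel(net)
net.to(device=device)  # parameters live on device_ids[0]; replicas are created per forward pass

# The wrapper does not forward custom attributes of the wrapped module:
# net.n_channels        -> AttributeError
net.module.n_channels   # -> 3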

I know that PyTorch sees more than 1 GPU, because torch.cuda.device_count() returns 2.
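A common guard (my own sketch, not code from the repository) wraps the model only when more than one GPU is actually visible:

import torch
import torch.nn as nn

if torch.cuda.device_count() > 1:
    net = nn.DataParallel(net)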

But as soon as I try to run training that needs more memory than the first GPU has, I get:

RuntimeError: CUDA out of memory. Tried to allocate 512.00 MiB (GPU 0; 11.91 GiB total capacity; 10.51 GiB already allocated; 82.56 MiB free; 818.92 MiB cached)

I control the memory required for training by changing the batch size. Any help is welcome.

EDIT

I see that training runs twice as fast with 2 GPUs, but the maximum batch size for a run with a single GPU is the same as for two GPUs. Is there any way to use the memory of both GPUs together during a single training run?

Lower the batch size until it fits into the memory of the GPU that has the smallest amount. – Rika
@Rika, I want to run on 2 GPUs; with a small batch I have no problem running on one. – ChaosPredictor
Your GPUs must have the same amount of memory; if they have different amounts, then PyTorch will use the smaller amount as the available VRAM on both GPUs. So let's say you have an 8 GiB and a 12 GiB GPU: when you want to train, your batch size should be such that it does not exceed 8 GiB. If it does, you cannot train, and it will fail with the error message you got. – Rika
Thank you @Rika, I do see the difference between running with 1 and 2 GPUs; I mean that with 2 it runs twice as fast. But I can't understand why it's not using the memory of the second GPU. What should I do to use the memory of the second one as well, even the 16 GiB total from your example? – ChaosPredictor

1 Answer


My mistake was changing output = net(input) (the variable is commonly named model) to:

output = net.module(input)
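Calling net.module(input) bypasses the DataParallel wrapper, so the whole batch runs on GPU 0 and only its memory is used. The fix is to call the wrapper itself, since its forward pass is what scatters the batch along dimension 0 across the GPUs:

# Correct: call the DataParallel wrapper, not the wrapped module
output = net(input)

With that, each GPU only holds its share of the batch, so the maximum total batch size can roughly double with two GPUs.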

You can find more information here.