
Can anybody help me understand the meaning of this common error in PyTorch?

Model: EfficientDet-D4

GPU: RTX 2080Ti

Batch size: 2

CUDA out of memory. Tried to allocate 14.00 MiB (GPU 0; 11.00 GiB total capacity; 8.32 GiB already allocated; 2.59 MiB free; 8.37 GiB reserved in total by PyTorch)

Anyway, I don't think the model or the GPU is the important part here, and I know the usual fixes: reduce the batch size, turn off gradients while validating, and so on. What I really want to know is what the 8.32 GiB means: I have 11 GiB of capacity, so why can it not allocate 14.00 MiB more? (A sketch for inspecting these numbers is included below.)

Addition: I watched nvidia-smi while training with batch size = 1; it showed about 9.5 GiB of GPU memory in use.
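
For reference, here is a minimal sketch (assuming PyTorch built with CUDA support; device index 0 matches the error message) of how to read the "already allocated" and "reserved in total by PyTorch" figures that appear in the error:

    import torch

    device = torch.device("cuda:0")  # device index 0, as in the error message

    # Memory currently held by live tensors ("already allocated" in the error).
    allocated_gib = torch.cuda.memory_allocated(device) / 1024**3

    # Memory held by PyTorch's caching allocator, including cached blocks not
    # currently backing any tensor ("reserved in total by PyTorch" in the error).
    reserved_gib = torch.cuda.memory_reserved(device) / 1024**3

    print(f"allocated: {allocated_gib:.2f} GiB, reserved: {reserved_gib:.2f} GiB")

    # Detailed per-device breakdown, useful for spotting fragmentation.
    print(torch.cuda.memory_summary(device))

Note that these counters only track PyTorch's own caching allocator; the CUDA context and any other processes also consume GPU memory, which is why nvidia-smi usually shows a higher number.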

Can you check which processes are using your GPU memory? Is there a leftover ghost process still occupying memory on the GPU device? – Anurag Reddy
I restart the kernel and kill all processes after each attempt, so I am sure nothing else is running (I even tried restarting my computer). – CuCaRot
You can see the GPU usage with nvidia-smi. – Tomer
@Tomer, I know that, but it just rose to about 10 GiB and then released everything after the error appeared. I will edit the question. – CuCaRot
Try torch.cuda.empty_cache() and tell us how it goes. – Rika

1 Answer


I got the answer from @ptrblck in the PyTorch forums, where I described my problem in more detail than in this question.

Please check the answer here.
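
As a general illustration of one of the mitigations mentioned in the question (running validation without gradients), and not a reproduction of the linked answer, here is a sketch; model, loader, and the classification-style loop are placeholders rather than EfficientDet's actual validation code:

    import torch

    def validate(model, loader, device):
        """Run a validation pass without building the autograd graph."""
        model.eval()
        correct, total = 0, 0
        # torch.no_grad() stops PyTorch from saving activations for backprop,
        # which substantially lowers GPU memory use during validation.
        with torch.no_grad():
            for images, targets in loader:
                images = images.to(device)
                targets = targets.to(device)
                preds = model(images).argmax(dim=1)
                correct += (preds == targets).sum().item()
                total += targets.size(0)
        return correct / total

Wrapping the loop in torch.no_grad() prevents activations from being stored for backpropagation, which is typically the largest share of the memory a validation pass would otherwise consume.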