
Can anybody help me understand the meaning of this common error in PyTorch?

Model: EfficientDet-D4

GPU: RTX 2080Ti

Batch size: 2

CUDA out of memory. Tried to allocate 14.00 MiB (GPU 0; 11.00 GiB total capacity; 8.32 GiB already allocated; 2.59 MiB free; 8.37 GiB reserved in total by PyTorch)

Anyway, I don't think the model or the GPU is the important part here, and I know the usual fixes: reduce the batch size, turn off gradients while validating, and so on. What I really want to know is what the 8.32 GiB means: I have 11 GiB of capacity, so why can it not allocate 14.00 MiB more? (A sketch for inspecting these numbers is included below.)

Addition: I watched nvidia-smi while training with batch size = 1; it showed about 9.5 GiB of GPU memory in use.
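
For reference, here is a minimal sketch (assuming PyTorch built with CUDA support; device index 0 matches the error message) of how to read the "already allocated" and "reserved in total by PyTorch" figures that appear in the error:

    import torch

    device = torch.device("cuda:0")  # device index 0, as in the error message

    # Memory currently held by live tensors ("already allocated" in the error).
    allocated_gib = torch.cuda.memory_allocated(device) / 1024**3

    # Memory held by PyTorch's caching allocator, including cached blocks not
    # currently backing any tensor ("reserved in total by PyTorch" in the error).
    reserved_gib = torch.cuda.memory_reserved(device) / 1024**3

    print(f"allocated: {allocated_gib:.2f} GiB, reserved: {reserved_gib:.2f} GiB")

    # Detailed per-device breakdown, useful for spotting fragmentation.
    print(torch.cuda.memory_summary(device))

Note that these counters only track PyTorch's own caching allocator; the CUDA context and any other processes also consume GPU memory, which is why nvidia-smi usually shows a higher number.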

Can you check which processes are using your GPU memory? Is there a leftover ghost process still occupying memory on the GPU device? – Anurag Reddy
I restart the kernel and kill all processes after each attempt, so I am sure nothing else is running (I even tried restarting my computer). – CuCaRot
You can see the GPU usage with nvidia-smi. – Tomer
@Tomer, I know that, but it just rose to about 10 GiB and then released everything after the error appeared. I will edit the question. – CuCaRot
Try torch.cuda.empty_cache() and tell us how it goes. – Rika

1 Answer


I got the answer from @ptrblck in the PyTorch forums, where I described my problem in more detail than in this question.

Please check the answer here.
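
As a general illustration of one of the mitigations mentioned in the question (running validation without gradients), and not a reproduction of the linked answer, here is a sketch; model, loader, and the classification-style loop are placeholders rather than EfficientDet's actual validation code:

    import torch

    def validate(model, loader, device):
        """Run a validation pass without building the autograd graph."""
        model.eval()
        correct, total = 0, 0
        # torch.no_grad() stops PyTorch from saving activations for backprop,
        # which substantially lowers GPU memory use during validation.
        with torch.no_grad():
            for images, targets in loader:
                images = images.to(device)
                targets = targets.to(device)
                preds = model(images).argmax(dim=1)
                correct += (preds == targets).sum().item()
                total += targets.size(0)
        return correct / total

Wrapping the loop in torch.no_grad() prevents activations from being stored for backpropagation, which is typically the largest share of the memory a validation pass would otherwise consume.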