
Edit: SOLVED. The problem was caused by the number of workers; lowering it fixed the error.

I am using a 24 GB Titan RTX to train a U-Net for image segmentation with PyTorch.

It always throws a CUDA out-of-memory error, whatever the batch size. The error reports far more free memory than the allocation it says it needs, and lowering the batch size INCREASES the amount of memory it tries to allocate, which doesn't make any sense.

Here is what I tried:

Image size = 448, batch size = 8

  • "RuntimeError: CUDA error: out of memory"

Image size = 448, batch size = 6

  • "RuntimeError: CUDA out of memory. Tried to allocate 3.12 GiB (GPU 0; 24.00 GiB total capacity; 2.06 GiB already allocated; 19.66 GiB free; 2.31 GiB reserved in total by PyTorch)"

It says it tried to allocate 3.12 GiB while 19.66 GiB is free, and it still throws an error??

Image size = 224, batch size = 8

  • "RuntimeError: CUDA out of memory. Tried to allocate 28.00 MiB (GPU 0; 24.00 GiB total capacity; 2.78 GiB already allocated; 19.15 GiB free; 2.82 GiB reserved in total by PyTorch)"

Image size = 224, batch size = 6

  • "RuntimeError: CUDA out of memory. Tried to allocate 344.00 MiB (GPU 0; 24.00 GiB total capacity; 2.30 GiB already allocated; 19.38 GiB free; 2.59 GiB reserved in total by PyTorch)"

I reduced the batch size, but it tried to allocate more???

Image size = 224, batch size = 4

  • "RuntimeError: CUDA out of memory. Tried to allocate 482.00 MiB (GPU 0; 24.00 GiB total capacity; 2.21 GiB already allocated; 19.48 GiB free; 2.50 GiB reserved in total by PyTorch)"

Image size = 224, batch size = 2

  • "RuntimeError: CUDA out of memory. Tried to allocate 1.12 GiB (GPU 0; 24.00 GiB total capacity; 1.44 GiB already allocated; 19.88 GiB free; 2.10 GiB reserved in total by PyTorch)"

Image size = 224, batch size = 1

  • "RuntimeError: CUDA out of memory. Tried to allocate 1.91 GiB (GPU 0; 24.00 GiB total capacity; 894.36 MiB already allocated; 20.94 GiB free; 1.03 GiB reserved in total by PyTorch)"

Even with stupidly low image sizes and batch sizes...
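
To make the mismatch in the messages above easier to see, here is a minimal sketch that pulls the "tried to allocate" and "free" figures out of an error string and compares them. The helper name `parse_oom` and the regular expression are my own (hypothetical), written against the message format quoted in this post; other PyTorch versions may word the message differently.

```python
import re

# Conversion factors to MiB for the units that appear in the messages.
_TO_MIB = {"KiB": 1 / 1024, "MiB": 1.0, "GiB": 1024.0}

# Matches "Tried to allocate X <unit> ... Y <unit> free" in one message.
_PATTERN = re.compile(
    r"Tried to allocate (?P<req>[\d.]+) (?P<req_unit>[KMG]iB).*?"
    r"(?P<free>[\d.]+) (?P<free_unit>[KMG]iB) free"
)


def parse_oom(message):
    """Return (requested_mib, free_mib), or None if the figures are absent."""
    match = _PATTERN.search(message)
    if match is None:
        return None
    requested = float(match["req"]) * _TO_MIB[match["req_unit"]]
    free = float(match["free"]) * _TO_MIB[match["free_unit"]]
    return requested, free


if __name__ == "__main__":
    # One of the messages quoted above (image size 448, batch size 6).
    message = (
        "RuntimeError: CUDA out of memory. Tried to allocate 3.12 GiB "
        "(GPU 0; 24.00 GiB total capacity; 2.06 GiB already allocated; "
        "19.66 GiB free; 2.31 GiB reserved in total by PyTorch)"
    )
    requested, free = parse_oom(message)
    print(f"requested {requested:.0f} MiB, reported free {free:.0f} MiB")
    print("request fits in reported free memory:", requested < free)
```

For every message above that includes these figures, the requested amount is smaller than the reported free memory, which is what made the errors so confusing.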

Comment (iacob): You might want to consider adding your solution as an answer.

1 Answer


SOLVED. The problem was caused by the number of workers; lowering it fixed the error.
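
For anyone landing here later, a minimal sketch of what lowering the workers can look like. This assumes "workers" means the `num_workers` argument of `torch.utils.data.DataLoader`, which the post does not state explicitly. The helper `make_loader` and the random stand-in tensors are hypothetical, not the original training code.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


def make_loader(dataset, batch_size=2, num_workers=0):
    """Build a DataLoader with an explicit, deliberately small worker count."""
    return DataLoader(
        dataset,
        batch_size=batch_size,
        shuffle=True,
        num_workers=num_workers,  # 0 loads batches in the main process
    )


if __name__ == "__main__":
    # Hypothetical stand-in data: 8 random "images" and binary "masks".
    images = torch.randn(8, 3, 224, 224)
    masks = torch.randint(0, 2, (8, 1, 224, 224))
    dataset = TensorDataset(images, masks)

    # Start from 0 workers; raise the count step by step only if training
    # stays stable.
    loader = make_loader(dataset, batch_size=2, num_workers=0)
    for batch_images, batch_masks in loader:
        print(batch_images.shape, batch_masks.shape)
```

Starting at `num_workers=0` and increasing gradually is one way to find the highest value that still runs without the error.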