Getting CUDA out of memory

Question

I'm trying to train a network, but I get a CUDA out-of-memory error. I set my batch size to 300 and got this error, and I still get it even after reducing the batch size to 100. More frustratingly, running 10 epochs on ~1200 images takes about 40 minutes. Any suggestions on what is going wrong and how I can speed up the process? Any tips would be extremely helpful. Thanks in advance.

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-31-3b43ff4eea72> in <module>()
      5         labels = Variable(labels).cuda()
      6 
----> 7         optimizer.zero_grad()
      8         outputs = cnn(images)
      9         loss = criterion(outputs, labels)

/usr/local/lib/python3.5/dist-packages/torch/optim/optimizer.py in zero_grad(self)
    114                 if p.grad is not None:
    115                     if p.grad.volatile:
--> 116                         p.grad.data.zero_()
    117                     else:
    118                         data = p.grad.data

RuntimeError: cuda runtime error (2) : out of memory at /pytorch/torch/lib/THC/generic/THCTensorMath.cu:35

Even though my GPUs are free:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.111                Driver Version: 384.111                   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 108...  Off  | 00000000:05:00.0 Off |                  N/A |
| 23%   18C    P8    15W / 250W |  10864MiB / 11172MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 108...  Off  | 00000000:08:00.0 Off |                  N/A |
| 23%   20C    P8    15W / 250W |     10MiB / 11172MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

Since you do not provide your code, people have to guess what is wrong. Why not try to provide a MCVE? — jdhao
As I don't know your code, the only suggestion I can give to you is to try reducing your batch size. — Rishabh Agrahari

prosti · Accepted Answer · 2019-03-01T13:35:27

This is a fairly general question. Here is how I would approach the problem.

Try setting the batch size (the number of samples per batch) to 1. If that fixes the problem, you can then increase it step by step to find a batch size that works.
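One way to automate that search is to halve the batch size until a training step succeeds. The sketch below is only an illustration: find_max_batch_size and run_step are hypothetical names, and run_step stands in for whatever function runs one forward/backward pass in your own code. It assumes the failure surfaces as a RuntimeError whose message contains "out of memory", as in the traceback in the question.

```python
def find_max_batch_size(run_step, start=300, minimum=1):
    """Halve the batch size until run_step(batch_size) stops raising an OOM error.

    run_step is a hypothetical callable that performs one forward/backward
    pass with the given batch size and raises RuntimeError on failure.
    """
    batch_size = start
    while batch_size >= minimum:
        try:
            run_step(batch_size)
            return batch_size          # this size fits in memory
        except RuntimeError as err:
            if "out of memory" not in str(err):
                raise                  # unrelated error: do not hide it
            batch_size //= 2           # try a smaller batch
    return None                        # even the minimum size does not fit


# Stand-in for a real training step: pretend anything above 64 samples fails.
def fake_step(batch_size):
    if batch_size > 64:
        raise RuntimeError("cuda runtime error (2) : out of memory")


print(find_max_batch_size(fake_step))  # 300 -> 150 -> 75 -> 37, prints 37
```

In a real training script you would probably also want to call torch.cuda.empty_cache() between attempts so that memory cached by a failed attempt is released before the next one.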

If you get "RuntimeError: cuda runtime error (2) : out of memory" even with a batch size of 1:

Do not use linear layers that are too large. A linear layer nn.Linear(m, n) uses O(nm) memory: that is to say, the memory requirements of the weights scale quadratically with the number of features, and the gradients need the same amount again.
Do not accumulate history across your training loop. If you sum the loss tensor itself over 10,000 or more iterations, every sum keeps its computation graph alive, so the history held for back-propagation becomes huge and takes a lot of memory.
Explicitly delete tensors you no longer need with del.
If you suspect that some other Python process is using your GPU memory, run ps -elf | grep python to list the Python processes and stop the offending one with kill -9 [pid].
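To get a feel for the point about large linear layers, here is a rough back-of-the-envelope estimate. linear_layer_bytes is a hypothetical helper, not a PyTorch function, and it assumes float32 values (4 bytes each). It counts only the weights, the bias, and their gradients, so optimizer state and activations come on top of this.

```python
def linear_layer_bytes(in_features, out_features, bytes_per_value=4, with_grad=True):
    """Rough memory needed by nn.Linear(in_features, out_features).

    Counts the weight matrix (out_features x in_features) plus the bias,
    doubled when gradients are stored. Optimizer state and activations
    are not included.
    """
    n_params = in_features * out_features + out_features
    copies = 2 if with_grad else 1     # parameters + gradients
    return n_params * bytes_per_value * copies


# Example: a flattened feature map of 50,000 values feeding 4,096 hidden units.
size = linear_layer_bytes(50000, 4096)
print("{:.0f} MiB".format(size / 1024 ** 2))
```

A single layer of that shape already needs roughly 1.5 GiB, which is a large share of an 11 GiB card before any activations are stored.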
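The points about accumulating history and deleting tensors can be sketched in a small training loop. This is a minimal example under a few assumptions: it uses PyTorch 0.4 or later (where loss.item() is available; the Variable-based code in the question would use loss.data[0] instead), it runs on the CPU, and the tiny nn.Linear model and random data are placeholders for the real network and DataLoader.

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 2)                 # small stand-in for the real network
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

total_loss = 0.0
for _ in range(5):                       # stand-in for iterating over a DataLoader
    inputs = torch.randn(8, 10)
    labels = torch.randint(0, 2, (8,))

    optimizer.zero_grad()
    loss = criterion(model(inputs), labels)
    loss.backward()
    optimizer.step()

    # loss.item() returns a plain Python float, so no graph is kept alive.
    # Writing `total_loss += loss` would instead keep every iteration's graph.
    total_loss += loss.item()

    del inputs, labels, loss             # drop references that are no longer needed

print(type(total_loss))
```

Because total_loss stays a plain float, the graph from each iteration can be freed as soon as the iteration ends.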
