16
votes

I am training deep neural networks on a GPU. If I make the samples too large, the batches too large, or the network too deep, I get an out-of-memory error. In those cases it is sometimes possible to reduce the batch size and still train.

Is it possible to calculate the GPU memory required for training, and so determine the batch size, beforehand?

UPDATE

If I print the network summary, it displays the number of "trainable parameters". Can't I estimate the memory from this value? For example, take it, multiply by the batch size, double it for gradients, etc.?
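A back-of-envelope version of that idea can be sketched as follows (a rough lower bound only; note that parameter memory does not scale with batch size, while activation memory does):

```python
def estimate_param_memory_gb(n_params, bytes_per_param=4, multiplier=3):
    # multiplier=3 roughly covers weights + gradients + one optimizer-state
    # buffer (e.g. SGD with momentum); Adam keeps two state buffers per
    # parameter, so multiplier=4 would be closer there.
    return n_params * bytes_per_param * multiplier / 1024**3

# e.g. a model with ~268M float32 parameters:
print(estimate_param_memory_gb(1024**3 // 4))  # 3.0 GB, before activations
```

This ignores activations, framework overhead, and CUDA context memory, so the true requirement is always higher.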

3
From the link Salvador gave, you also need to account for the intermediate memory that holds the image and its transforms. The total is 3 * 4 * (intermediate memory * num_images + trainable parameters) / 1024**3 GB. – user3226167
Ideally, I think it should be possible. Each layer could report the amount of memory it will occupy, given its input size and whether it is in training mode, and the model class could then compute the total by summing over its individual layers. These frameworks (PyTorch/TensorFlow) would just need to add this functionality. – saurabheights
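user3226167's formula above can be written out as a small sketch (all inputs are counts of float32 elements; 3 roughly covers values, gradients, and optimizer state, and 4 is bytes per float32):

```python
def estimate_total_memory_gb(intermediate_elems, num_images, trainable_params):
    # 3 * 4 * (intermediate memory * num_images + trainable parameters) / 1024**3
    return 3 * 4 * (intermediate_elems * num_images + trainable_params) / 1024**3

# e.g. 1M intermediate elements per image, a batch of 256 images,
# ignoring parameters for the moment:
print(estimate_total_memory_gb(1024**2, 256, 0))  # 3.0 GB
```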

3 Answers

5
votes

No, it is not possible to do this automatically, so if you want your batch to be as large as possible you need to go through some trial and error to find an appropriate size.
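The trial and error can at least be automated: binary-search for the largest batch size that does not run out of memory. A sketch, where the `fits` callable is a placeholder for one forward/backward pass wrapped in a try/except around the CUDA out-of-memory error:

```python
def max_batch_size(fits, lo=1, hi=65536):
    """Binary search for the largest batch size for which fits(bs) is True.

    `fits` is a placeholder: in practice it would run one training step at
    the given batch size and return False if an OOM error is raised.
    """
    best = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        if fits(mid):
            best = mid      # mid fits; try something larger
            lo = mid + 1
        else:
            hi = mid - 1    # mid is too big; try something smaller
    return best

print(max_batch_size(lambda b: b <= 100))  # 100
```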

Stanford's CNN class provides some guidance on how to estimate the memory size, but all the suggestions there are specific to CNNs (I'm not sure what you are training).
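In the spirit of that class's sizing exercise, the per-image activation memory of a conv stack can be tallied by hand. A sketch (the layer plan is made up for illustration; it assumes 3x3, stride-1, pad-1 convolutions, which preserve spatial size):

```python
def conv_out(size, kernel, stride=1, pad=0):
    # Standard convolution output-size formula.
    return (size + 2 * pad - kernel) // stride + 1

def activation_bytes(in_hw, channel_plan, bytes_per_elem=4):
    """Sum float32 activation memory per image for a stack of 3x3,
    stride-1, pad-1 conv layers."""
    h, w = in_hw
    total = 0
    for channels in channel_plan:
        h = conv_out(h, 3, stride=1, pad=1)
        w = conv_out(w, 3, stride=1, pad=1)
        total += h * w * channels * bytes_per_elem
    return total

# e.g. a single 64-channel conv on a 224x224 input:
print(activation_bytes((224, 224), [64]))  # 12845056 bytes, ~12 MB per image
```

Multiply by the batch size (and roughly double for the backward pass) and it becomes clear why activations, not parameters, usually dominate.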

5
votes

PyTorch Lightning recently added a feature called "auto batch size" for exactly this! It computes the largest batch size that fits into the memory of your GPU :)

More info can be found in the PyTorch Lightning documentation.

Original PR: https://github.com/PyTorchLightning/pytorch-lightning/pull/1638
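At the time of that PR the feature was enabled roughly like this (a sketch, not runnable as-is: `MyLightningModule` is a placeholder for your own LightningModule subclass, and the `auto_scale_batch_size`/`tune` API reflects Lightning versions of that era and has since been moved to a separate Tuner class):

```python
import pytorch_lightning as pl

model = MyLightningModule()  # placeholder: your LightningModule subclass

# "binsearch" binary-searches for the largest batch size that fits;
# the found value is written back to the model's batch_size attribute.
trainer = pl.Trainer(auto_scale_batch_size="binsearch")
trainer.tune(model)
```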

3
votes

I think Salvador here means that it is not possible to analytically compute the best-suited batch size. However, as with all things in ML, it is just another hyperparameter that can be added to your grid search and tuned automatically. Simply evaluate your model's loss or accuracy (however you measure performance) for several batch sizes, say some powers of 2 such as 64, 256, 1024, etc., and keep the one that gives the best and most stable (least variable) result. Then use that batch size.

Note that the best batch size can depend on your model's architecture, your machine's hardware, etc. For example, if you move your modeling from a local PC to some cloud compute engine (GCP, AWS, Azure, ...), then a batch size that was too large for your PC's RAM easily becomes suitable for practically limitless cloud RAM/CPU/GPU (mind the costs).
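That search can be sketched in a few lines (the `evaluate` callable is a placeholder that would train briefly at the given batch size and return a validation loss):

```python
def pick_batch_size(evaluate, candidates=(64, 256, 1024)):
    """Return the candidate batch size with the lowest validation loss.

    `evaluate` is a placeholder: in practice it trains the model for a few
    epochs at the given batch size and returns the validation loss.
    """
    return min(candidates, key=evaluate)

# e.g. with a toy loss curve that bottoms out at 256:
print(pick_batch_size(lambda b: abs(b - 256)))  # 256
```

For a stability criterion, `evaluate` could instead return the variance of the loss over the last few epochs.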