Finetuning Caffe Deep Learning Check failed: error == cudaSuccess out of memory

Question

I am relatively new in Deep learning and its framework. Currently, I am experimenting with Caffe framework and trying to fine tune the Vgg16_places_365.

I am using the Amazone EC2 instance g2.8xlarge with 4 GPUs (each has 4 GB of RAM). However, when I try to train my model (using a single GPU), I got this error:

Check failed: error == cudaSuccess (2 vs. 0) out of memory

After I did some research, I found that one of the ways to solve this out of memory problem is by reducing the batch size in my train.prototxt

Caffe | Check failed: error == cudaSuccess (2 vs. 0) out of memory.

Initially, I set the batch size into 50, and iteratively reduced it until 10 (since it worked when batch_size = 10). Now, the model is being trained and I am pretty sure it will take quite long time. However, as a newcomer in this domain, I am curious about the relation between this batch size and another parameter such as the learning rate, stepsize and even the max iteration that we specify in the solver.prototxt.

How significant the size of the batch will affect the quality of the model (like accuracy may be). How the other parameters can be used to leverage the quality. Also, instead of reducing the batch size or scale up my machine, is there another way to fix this problem?

David Stutz David Stutz · Accepted Answer · 2016-09-02T15:37:18

To answer your first question regarding the relationship between parameters such as batch size, learning rate and maximum number of iterations, you are best of reading about the mathematical background. A good place to start might be this stats.stackexchange question: How large should the batch size be for stochastic gradient descent?. The answer will briefly discuss the relation between batch size and learning rate (from your question I assume learning rate = stepsize) and also provide some references for further reading.

To answer your last question, with the dataset you are finetuning on and the model (i.e. the VGG16) being fixed (i.e. the input data of fixed size, and the model of fixed size), you will have a hard time avoiding the out of memory problem for large batch sizes. However, if you are willing to reduce the input size or the model size you might be able to use larger batch sizes. Depending on how (and what) exactly you are finetuning, reducing the model size may already be achieved by discarding learned layers or reducing the number/size of fully connected layers.

The remaining questions, i.e. how significant the batchsize influences quality/accuracy and how other parameters influence quality/accuracy, are hard to answer without knowing the concrete problem you are trying to solve. The influence of e.g. the batchsize on the achieved accuracy might depend on various factors such as the noise in your dataset, the dimensionality of your dataset, the size of your dataset as well as other parameters such as learning rate (=stepsize) or momentum parameter. For these sort of questions, I recommend the textbook by Goodfellow et al., e.g. chapter 11 may provide some general guidelines on choosing these hyperparmeters (i.e. batchsize, learning rate etc.).

Finetuning Caffe Deep Learning Check failed: error == cudaSuccess out of memory

2 Answers