When training a deep learning model, I found that the GPU is not fully utilized if I set the train and validation (test) batch sizes to the same value, say 32, 64, ..., 512.
I then checked the NVIDIA Titan X specifications:
- NVIDIA CUDA® Cores: 3584
- Memory: 12 GB GDDR5X
To reduce the test time of my CNN model, I want to make the number of samples per batch as large as possible. I tried the following (a rough sketch of this trial process is shown after the list):
- setting the number of samples per batch to 3584: CUDA out of memory error.
- setting the number of samples per batch to 2048: CUDA out of memory error.
- setting the number of samples per batch to 1024: it works, but I am not sure whether the GPU is fully utilized or not.
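
For reference, this is roughly what my trial-and-error looks like as code (a minimal sketch, assuming PyTorch; the function name `find_max_batch_size`, the `sample_shape` argument, and the starting value of 4096 are just placeholders for illustration):

```python
import torch

def find_max_batch_size(model, sample_shape, start=4096, device="cuda"):
    """Halve the candidate batch size until a forward pass fits in GPU memory."""
    model = model.to(device).eval()
    batch_size = start
    while batch_size >= 1:
        try:
            with torch.no_grad():  # inference only, no gradient buffers
                dummy = torch.randn(batch_size, *sample_shape, device=device)
                model(dummy)
            return batch_size  # this batch size fits in memory
        except RuntimeError as e:
            if "out of memory" in str(e):
                torch.cuda.empty_cache()  # release the failed allocation
                batch_size //= 2          # retry with a smaller batch
            else:
                raise
    return 1
```

This only tells me the largest batch that fits in memory; it does not tell me whether the GPU compute units are actually saturated.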
Question:
How can I easily pick the number of samples per batch so that the GPU is fully utilized for the forward pass of a deep model?