I am surprised that increasing the batch size barely increases the total processing speed (images per second) on a GPU. My measurements:
- batch_size=1: 0.33 sec/step
- batch_size=2: 0.6 sec/step
- batch_size=3: 0.8 sec/step
- batch_size=4: 1.0 sec/step
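Converting the step times above into throughput (images per second) makes the comparison explicit. This is only arithmetic on the four measurements listed; the helper name `throughput` is hypothetical.

```python
# Measured seconds per training step, keyed by batch size (numbers from the list above).
step_time = {1: 0.33, 2: 0.6, 3: 0.8, 4: 1.0}

def throughput(batch_size, seconds_per_step):
    """Images processed per second for one batch-size configuration."""
    return batch_size / seconds_per_step

for bs, t in step_time.items():
    print(f"batch_size={bs}: {throughput(bs, t):.2f} images/sec")
# batch_size=1: 3.03 images/sec
# batch_size=2: 3.33 images/sec
# batch_size=3: 3.75 images/sec
# batch_size=4: 4.00 images/sec
```

So throughput rises from roughly 3.0 to 4.0 images/sec (about 32% more) when going from batch_size=1 to batch_size=4, rather than the roughly 4x I expected.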
My expectation was that the time per step would stay (almost) constant thanks to parallelization on the GPU. Instead, it scales almost linearly with the batch size. Why? Did I misunderstand something?
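To put a number on "almost linearly", here is a sketch that fits an ordinary least-squares line, step_time ≈ intercept + slope × batch_size, to the four measurements. The linear model is an assumption used only to summarize the data, and `fit_line` is a hypothetical helper.

```python
# (batch_size, seconds_per_step) pairs from the measurements above.
measurements = [(1, 0.33), (2, 0.6), (3, 0.8), (4, 1.0)]

def fit_line(points):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

intercept, slope = fit_line(measurements)
print(f"fixed cost per step ~ {intercept:.2f} s, cost per image ~ {slope:.2f} s")
# fixed cost per step ~ 0.13 s, cost per image ~ 0.22 s
```

Under this model, only the ~0.13 s fixed part of each step is shared across the images in a batch; the ~0.22 s per image still grows with batch size.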
I am using the TensorFlow Object Detection API to retrain the pre-trained faster_rcnn_resnet101_coco model. The predefined batch_size is 1, and our GPU (Nvidia 1080 Ti) can handle up to 4 images, so I wanted to exploit this to speed up training.