I am training the ssd_mobilenet_v1_coco network on my own custom classes using the TensorFlow Object Detection API.
I have trained on both the CPU (i7-6700) and the GPU (NVIDIA Quadro K620) and measured the following timings:
Processor  Batch size  sec/step  sec/image
K620       1           0.45      0.450
K620       10          2.22      0.222
i7-6700    1           0.66      0.660
i7-6700    24          9.3       0.388
However, comparing sec/image at the larger batch sizes, the GPU is only about 70% faster than the CPU. I expected the GPU to be significantly faster. Is this performance reasonable for my hardware, or is something wrong with my setup?
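
For reference, here is a minimal sketch of the arithmetic behind the table. It assumes sec/image is simply sec/step divided by the batch size, and that the comparison uses the best sec/image for each processor. The names `timings`, `sec_per_image`, and `speedup` are only illustrative and not part of the Object Detection API.

```python
# Timings copied from the table above: (processor, batch size) -> sec/step
timings = {
    ("K620", 1): 0.45,
    ("K620", 10): 2.22,
    ("i7-6700", 1): 0.66,
    ("i7-6700", 24): 9.3,
}


def sec_per_image(sec_per_step, batch_size):
    """Assumption: every step processes exactly batch_size images."""
    return sec_per_step / batch_size


# Per-image time for every measured configuration
per_image = {key: sec_per_image(t, key[1]) for key, t in timings.items()}

# Best (lowest) per-image time for each processor
best_gpu = min(v for (proc, _), v in per_image.items() if proc == "K620")
best_cpu = min(v for (proc, _), v in per_image.items() if proc == "i7-6700")

# Ratio of CPU time to GPU time per image
speedup = best_cpu / best_gpu

for (proc, batch), v in per_image.items():
    print(f"{proc:8s} batch={batch:2d}  {v:.3f} sec/image")
print(f"GPU speedup over CPU: {speedup:.2f}x")
```

With these numbers the ratio comes out at roughly 1.7x, which is where the "about 70% faster" figure comes from.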