
I have trained three UNet models in Keras for image segmentation to assess the effect of multi-GPU training.

  1. The first model was trained with a batch size of 1 on 1 GPU (P100). Each training step took ~254 ms. (Note this is per step, not per epoch.)
  2. The second model was trained with a batch size of 2 on 1 GPU (P100). Each training step took ~399 ms.
  3. The third model was trained with a batch size of 2 on 2 GPUs (P100). Each training step took ~370 ms. Logically it should have taken about the same time as the first case, since each GPU processes 1 sample in parallel, but it took longer.

Can anyone tell me whether multi-GPU training actually reduces training time? For reference, all models were trained with Keras; a rough sketch of the setup is below.
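For context, here is a minimal sketch of the kind of comparison described above. This is an assumption: the question does not show the actual training code, and the sketch uses tf.distribute.MirroredStrategy with a placeholder build_unet model rather than the real UNet.

```python
# Minimal sketch of the single- vs multi-GPU comparison (assumed setup;
# the original code is not shown in the question).
import tensorflow as tf

def build_unet(input_shape=(128, 128, 1)):
    # Placeholder for the actual UNet; any Keras model is handled the same way here.
    inputs = tf.keras.Input(shape=input_shape)
    x = tf.keras.layers.Conv2D(16, 3, padding="same", activation="relu")(inputs)
    outputs = tf.keras.layers.Conv2D(1, 1, activation="sigmoid")(x)
    return tf.keras.Model(inputs, outputs)

# Cases 1 and 2: single GPU, batch size 1 or 2
model = build_unet()
model.compile(optimizer="adam", loss="binary_crossentropy")
# model.fit(train_ds.batch(1), ...)   # ~254 ms/step in the question
# model.fit(train_ds.batch(2), ...)   # ~399 ms/step

# Case 3: two GPUs, global batch size 2 (1 sample per GPU)
strategy = tf.distribute.MirroredStrategy()
with strategy.scope():
    model = build_unet()
    model.compile(optimizer="adam", loss="binary_crossentropy")
# model.fit(train_ds.batch(2), ...)   # ~370 ms/step
```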

You should look at the total convergence time, given the same model initialization. Otherwise there is a lot of ambiguity about what "a step" and "an epoch" mean for a multi-GPU model. – Daniel Möller
@DanielMöller: Could you please tell me what you mean by total convergence time? – samra irshad
Do you mean the time to reach the lowest validation error? – samra irshad
Yes, the time the model takes to reach what you expect from it. The answer Srihari posted here seems to say something similar. – Daniel Möller

1 Answer


I presume this is because you use a very small batch_size. In this case, the cost of distributing the computations over two GPUs and synchronizing the gradients back (as well as distributing the data from the CPU to two GPUs) outweighs the parallel speed-up you might gain over training sequentially on 1 GPU.

Expect to see a bigger difference with a batch size of 8 or 16 per GPU, for instance.
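As an illustration (a sketch assuming tf.distribute.MirroredStrategy, not the asker's actual code), you can scale the global batch size with the number of replicas so that each GPU gets a full 8- or 16-sample batch and the gradient synchronization cost is amortized:

```python
# Sketch: scale the global batch size with the number of GPUs so that per-GPU
# compute dominates the gradient all-reduce cost (assumed MirroredStrategy setup).
import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()
per_gpu_batch = 8                                             # try 8 or 16 per GPU
global_batch = per_gpu_batch * strategy.num_replicas_in_sync  # e.g. 16 on 2 GPUs

with strategy.scope():
    # Placeholder model; substitute the actual UNet here.
    inputs = tf.keras.Input(shape=(128, 128, 1))
    outputs = tf.keras.layers.Conv2D(1, 1, activation="sigmoid")(inputs)
    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="binary_crossentropy")

# dataset = dataset.batch(global_batch)
# model.fit(dataset, epochs=...)   # compare ms/step against the single-GPU runs
```

With the larger per-GPU batch, each step does enough computation on every device that the fixed communication overhead becomes a smaller fraction of the step time.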