
I am training a model for semantic segmentation. I am using a batch size of 10 images for training on a single GPU, and in parallel I am training with the same hyper-parameters on a multi-GPU (3 GPUs) setup, where I use a batch size of 30 images, i.e., 10 images per GPU.

Theoretically, should the per-step loss values in each epoch fall in the same range for both the single-GPU and the multi-GPU training procedures?

That is not what I am currently seeing during training: the loss from the multi-GPU run is about 5 times larger than the loss I get from the single-GPU run.

Any input/suggestion is welcome.


1 Answer


The loss depends on the batch size and on the elements in the batch. With 30 random images, the chance of the loss being higher (especially at the beginning of training) is much greater.

When you have only 10 of them, there are fewer examples for your neural network to "get wrong".
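In particular, if the loss is summed over the batch (or over the per-GPU losses) rather than averaged, its value scales with the total batch size, which by itself can explain a several-fold difference. A minimal sketch, assuming PyTorch (the question does not name a framework) and made-up tensor shapes:

```python
import torch
import torch.nn as nn

# Hypothetical logits/targets, only to show how the reduction mode changes the
# scale of the reported loss for a segmentation-style output.
logits = torch.randn(30, 5, 64, 64)           # batch of 30, 5 classes, 64x64 masks
targets = torch.randint(0, 5, (30, 64, 64))   # ground-truth class per pixel

sum_loss = nn.CrossEntropyLoss(reduction='sum')
mean_loss = nn.CrossEntropyLoss(reduction='mean')

# With reduction='sum', a batch of 30 reports roughly 3x the loss of a batch of 10;
# with reduction='mean', the value is independent of the batch size.
print(sum_loss(logits, targets).item())
print(sum_loss(logits[:10], targets[:10]).item())
print(mean_loss(logits, targets).item())
print(mean_loss(logits[:10], targets[:10]).item())
```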

On the other hand, if the multi-GPU and single-GPU runs had the same number of examples per batch, each batch were exactly the same, and the network parameters were equal, the loss should be equal as well (you can arrange this by fixing the seed, the batch size, and the initial weights in both cases).
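A minimal sketch of how that could be arranged, again assuming PyTorch; `build_model` is a hypothetical placeholder for however the segmentation model is constructed:

```python
import torch

# Fix the seeds in both training scripts so data shuffling and any random
# initialization behave the same way.
torch.manual_seed(0)
torch.cuda.manual_seed_all(0)

model = build_model()                         # hypothetical model constructor
torch.save(model.state_dict(), 'init.pth')    # save the initial weights once ...

# ... then load the same initial weights in both the single- and multi-GPU runs.
model.load_state_dict(torch.load('init.pth'))
# In the multi-GPU script the model can then be wrapped, e.g.:
# model = torch.nn.DataParallel(model, device_ids=[0, 1, 2])
```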