2
votes

The goal is to train YOLO with multi-GPU. According to Darknet AlexeyAB, we should train YOLO with single GPU for 1000 iterations first, and then continue it with multi-GPU from saved weight (1000_iter.weigts). So, we don't need to change any parameters in .cfg file? Here is my .cfg when I trained my model with single GPU:

[net]
# Testing
batch=1
subdivisions=1
# Training
# batch=64
# subdivisions=16
width=416
height=416
channels=3
momentum=0.9
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
hue=.1

learning_rate=0.001
burn_in=1000
max_batches = 500200
policy=steps
steps=400000,450000
scales=.1,.1

AlexyAB says: modify .cfg "if you get Nan". In my case, I'm not getting Nan, but my loss is fluctuating. Shouldn't we change anything when we continue training with multi-GPU? batch? subdivisions? learning_rate? burn_in? We just need to continue training with same configurations?

1

1 Answers

1
votes

You will need to change burn_in, max_batches and steps between the two cases, for example, if your final target is 500200, your first .cfg file should have this:

burn_in=100
max_batches = 50000
policy=steps
steps=40000,45000

and the second file like this:

burn_in=1000
max_batches = 500200
policy=steps
steps=400000,450000

You need only to change learning_rate if you get a Nan according to this, then you should divide learning_rate by the number of GPUs and multiply burn_in by the same number.