
I am training MobileNetV3 on TFRecords produced with the TensorFlow model code. The training loss w.r.t. steps is plotted below; one unit on the x-axis is 20k steps (roughly 2 epochs, given batch size 128 and 1,281,167 samples in total).

I exponentially decay the learning rate from 0.01 every 3 epochs in staircase mode, and the loss falls normally for the first 4 epochs. After the 4th epoch, however, the loss rises and falls once per epoch. I have tried the momentum optimizer (orange) and the RMSProp optimizer (blue) and get similar results. Please help me troubleshoot this problem.
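For reference, a staircase schedule like the one described can be written as a small pure-Python sketch. The decay rate of 0.94 is an assumption for illustration (the post does not state it), and `decay_steps` is derived from the numbers above: 3 epochs × (1,281,167 / 128) ≈ 30,027 steps.

```python
def staircase_exponential_decay(step, initial_lr=0.01,
                                decay_rate=0.94, decay_steps=30027):
    """Staircase exponential decay, as in
    tf.train.exponential_decay(..., staircase=True).

    The learning rate drops in discrete jumps: it stays constant for
    decay_steps steps, then is multiplied by decay_rate. decay_rate=0.94
    is a hypothetical value; decay_steps approximates 3 epochs here.
    """
    return initial_lr * decay_rate ** (step // decay_steps)
```

With staircase mode, every step within the same 3-epoch window uses exactly the same learning rate, so the schedule itself cannot explain a per-epoch oscillation.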

1 Answer


The periodicity is almost certainly aligned to 1 full epoch.

It's natural for your model to show random variation in loss across different batches. If the dataset is fed in the same order every epoch, that same batch-to-batch variation repeats as the weights stabilise, so you see (roughly) the same loss for each batch over and over with every epoch.

I'm not sure it needs troubleshooting, but if you really want to avoid it you could shuffle your dataset between epochs.
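In a `tf.data` pipeline that fix is typically `dataset.shuffle(buffer_size, reshuffle_each_iteration=True)`, which draws a new permutation each epoch. The effect can be sketched in plain Python (the function name and seed below are illustrative, not from the post):

```python
import random

def epochs_with_reshuffle(samples, num_epochs, seed=0):
    """Yield a freshly shuffled sample order for each epoch.

    Mirrors the idea of tf.data's reshuffle_each_iteration=True: a new
    permutation per epoch breaks the fixed batch composition that makes
    the loss curve repeat with a one-epoch period.
    """
    rng = random.Random(seed)
    for _ in range(num_epochs):
        order = list(samples)
        rng.shuffle(order)  # new permutation every epoch
        yield order
```

Because each epoch sees different batch compositions, the batch-level loss noise no longer lines up epoch to epoch.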