I am training MobileNetV3 with TFRecords produced by the tensorflow models repository. The training loss with respect to training steps is plotted below. One unit on the x-axis is 20k steps (approximately 2 epochs, given a batch size of 128 and 1,281,167 samples in total).
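For reference, the steps-to-epochs conversion behind the x-axis scale can be checked with a few lines of Python. This is only a sketch of the arithmetic: the helper name `steps_per_epoch` and the `drop_remainder` flag are illustrative, and whether the last partial batch is kept depends on the input pipeline, which is not shown here.

```python
import math

NUM_SAMPLES = 1_281_167  # total training samples, as stated above
BATCH_SIZE = 128         # batch size, as stated above


def steps_per_epoch(num_samples: int, batch_size: int, drop_remainder: bool = False) -> int:
    """Number of optimizer steps needed to see every sample once."""
    if drop_remainder:
        # The final partial batch is discarded.
        return num_samples // batch_size
    # The final partial batch counts as one extra step.
    return math.ceil(num_samples / batch_size)


STEPS_PER_EPOCH = steps_per_epoch(NUM_SAMPLES, BATCH_SIZE)
# How many epochs one x-axis unit (20k steps) covers.
EPOCHS_PER_TICK = 20_000 / STEPS_PER_EPOCH

print(STEPS_PER_EPOCH, round(EPOCHS_PER_TICK, 3))
```

Either way of counting gives roughly 10k steps per epoch, so one x-axis unit is very close to 2 epochs.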
I use exponential learning-rate decay (0.01 every 3 epochs, with staircase), and the loss falls normally during the first 4 epochs. After the 4th epoch, however, the loss rises and falls every epoch. I have tried both the momentum optimizer (orange curve) and the RMSProp optimizer (blue curve) and get similar results. Please help me troubleshoot this problem.
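To make the schedule concrete, here is a minimal pure-Python sketch of staircase exponential decay, using the same formula as `tf.keras.optimizers.schedules.ExponentialDecay` with `staircase=True`. Several values are assumptions rather than my exact configuration: `INITIAL_LR = 0.1` is a hypothetical starting rate, `0.01` is read here as the decay factor, and `STEPS_PER_EPOCH = 10010` assumes the final partial batch is kept.

```python
STEPS_PER_EPOCH = 10010            # assumption: ceil(1281167 / 128)
DECAY_STEPS = 3 * STEPS_PER_EPOCH  # decay every 3 epochs
DECAY_RATE = 0.01                  # assumption: 0.01 is the decay factor
INITIAL_LR = 0.1                   # hypothetical value, for illustration only


def staircase_exponential_decay(step: int,
                                initial_lr: float = INITIAL_LR,
                                decay_rate: float = DECAY_RATE,
                                decay_steps: int = DECAY_STEPS) -> float:
    """Learning rate at a given global step, dropping in discrete stairs."""
    # Integer division makes the rate constant within each 3-epoch window.
    return initial_lr * decay_rate ** (step // decay_steps)


# Print the learning rate at the start of each of the first 7 epochs.
for epoch in range(7):
    step = epoch * STEPS_PER_EPOCH
    print(f"epoch {epoch + 1}: step {step}, lr {staircase_exponential_decay(step):.6g}")
```

Under these assumptions the first drop in learning rate happens at step 30030, that is, at the start of the 4th epoch, and the second at the start of the 7th.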