0
votes

I have implemented this classification model using MobileNet as the base model. During training, the training and validation accuracies and losses start moving up and down after some epochs (from around the 34th epoch the training accuracy starts to fluctuate, and then the other metrics do the same). As far as I can tell the curves look fine, but the values go up and down after some epochs. Is this normal, or do I need to change something?

    Epoch 1/50
    6539/6539 [==============================] - 3379s 516ms/step - loss: 2.9090 - accuracy: 0.3196 - top3_acc: 0.4849 - top5_acc: 0.5721 - val_loss: 1.7767 - val_accuracy: 0.5191 - val_top3_acc: 0.7397 - val_top5_acc: 0.8286
    Epoch 2/50
    6539/6539 [==============================] - 3342s 511ms/step - loss: 1.7218 - accuracy: 0.5261 - top3_acc: 0.7464 - top5_acc: 0.8385 - val_loss: 1.5645 - val_accuracy: 0.5651 - val_top3_acc: 0.7857 - val_top5_acc: 0.8669
    Epoch 3/50
    6539/6539 [==============================] - 3337s 510ms/step - loss: 1.5500 - accuracy: 0.5611 - top3_acc: 0.7853 - top5_acc: 0.8693 - val_loss: 1.4635 - val_accuracy: 0.5869 - val_top3_acc: 0.8064 - val_top5_acc: 0.8816
    Epoch 4/50
    6539/6539 [==============================] - 3343s 511ms/step - loss: 1.4469 - accuracy: 0.5859 - top3_acc: 0.8040 - top5_acc: 0.8854 - val_loss: 1.3982 - val_accuracy: 0.6012 - val_top3_acc: 0.8186 - val_top5_acc: 0.8919
    Epoch 5/50
    6539/6539 [==============================] - 3348s 512ms/step - loss: 1.3882 - accuracy: 0.5966 - top3_acc: 0.8153 - top5_acc: 0.8939 - val_loss: 1.3538 - val_accuracy: 0.6126 - val_top3_acc: 0.8260 - val_top5_acc: 0.8981
    Epoch 6/50
    6539/6539 [==============================] - 3340s 511ms/step - loss: 1.3382 - accuracy: 0.6123 - top3_acc: 0.8251 - top5_acc: 0.9011 - val_loss: 1.3192 - val_accuracy: 0.6192 - val_top3_acc: 0.8326 - val_top5_acc: 0.9033
    Epoch 7/50
    6539/6539 [==============================] - 3319s 508ms/step - loss: 1.3060 - accuracy: 0.6195 - top3_acc: 0.8323 - top5_acc: 0.9052 - val_loss: 1.2918 - val_accuracy: 0.6264 - val_top3_acc: 0.8359 - val_top5_acc: 0.9070
    Epoch 8/50
    6539/6539 [==============================] - 3314s 507ms/step - loss: 1.2744 - accuracy: 0.6249 - top3_acc: 0.8383 - top5_acc: 0.9106 - val_loss: 1.2693 - val_accuracy: 0.6312 - val_top3_acc: 0.8399 - val_top5_acc: 0.9106
    Epoch 9/50
    6539/6539 [==============================] - 3316s 507ms/step - loss: 1.2547 - accuracy: 0.6323 - top3_acc: 0.8419 - top5_acc: 0.9133 - val_loss: 1.2502 - val_accuracy: 0.6359 - val_top3_acc: 0.8430 - val_top5_acc: 0.9135
    Epoch 10/50
    6539/6539 [==============================] - 3313s 507ms/step - loss: 1.2271 - accuracy: 0.6375 - top3_acc: 0.8477 - top5_acc: 0.9166 - val_loss: 1.2339 - val_accuracy: 0.6400 - val_top3_acc: 0.8461 - val_top5_acc: 0.9157
    Epoch 11/50
    6539/6539 [==============================] - 3309s 506ms/step - loss: 1.2081 - accuracy: 0.6422 - top3_acc: 0.8503 - top5_acc: 0.9196 - val_loss: 1.2203 - val_accuracy: 0.6429 - val_top3_acc: 0.8489 - val_top5_acc: 0.9169
    Epoch 12/50
    6539/6539 [==============================] - 3315s 507ms/step - loss: 1.1863 - accuracy: 0.6477 - top3_acc: 0.8550 - top5_acc: 0.9216 - val_loss: 1.2080 - val_accuracy: 0.6473 - val_top3_acc: 0.8505 - val_top5_acc: 0.9188
    Epoch 13/50
    6539/6539 [==============================] - 3329s 509ms/step - loss: 1.1789 - accuracy: 0.6497 - top3_acc: 0.8568 - top5_acc: 0.9239 - val_loss: 1.1973 - val_accuracy: 0.6500 - val_top3_acc: 0.8522 - val_top5_acc: 0.9201
    Epoch 14/50
    6539/6539 [==============================] - 3325s 508ms/step - loss: 1.1618 - accuracy: 0.6535 - top3_acc: 0.8590 - top5_acc: 0.9254 - val_loss: 1.1870 - val_accuracy: 0.6523 - val_top3_acc: 0.8546 - val_top5_acc: 0.9215
    Epoch 15/50
    6539/6539 [==============================] - 3324s 508ms/step - loss: 1.1558 - accuracy: 0.6563 - top3_acc: 0.8617 - top5_acc: 0.9262 - val_loss: 1.1783 - val_accuracy: 0.6551 - val_top3_acc: 0.8555 - val_top5_acc: 0.9229
    Epoch 16/50
    6539/6539 [==============================] - 3325s 508ms/step - loss: 1.1380 - accuracy: 0.6618 - top3_acc: 0.8647 - top5_acc: 0.9281 - val_loss: 1.1698 - val_accuracy: 0.6573 - val_top3_acc: 0.8576 - val_top5_acc: 0.9235
    Epoch 17/50
    6539/6539 [==============================] - 3331s 509ms/step - loss: 1.1260 - accuracy: 0.6622 - top3_acc: 0.8662 - top5_acc: 0.9304 - val_loss: 1.1625 - val_accuracy: 0.6590 - val_top3_acc: 0.8593 - val_top5_acc: 0.9248
    Epoch 18/50
    6539/6539 [==============================] - 3327s 509ms/step - loss: 1.1204 - accuracy: 0.6658 - top3_acc: 0.8672 - top5_acc: 0.9299 - val_loss: 1.1569 - val_accuracy: 0.6605 - val_top3_acc: 0.8600 - val_top5_acc: 0.9260
    Epoch 19/50
    6539/6539 [==============================] - 3308s 506ms/step - loss: 1.1093 - accuracy: 0.6667 - top3_acc: 0.8698 - top5_acc: 0.9334 - val_loss: 1.1495 - val_accuracy: 0.6625 - val_top3_acc: 0.8616 - val_top5_acc: 0.9263
    Epoch 20/50
    6539/6539 [==============================] - 3320s 508ms/step - loss: 1.0955 - accuracy: 0.6710 - top3_acc: 0.8726 - top5_acc: 0.9342 - val_loss: 1.1438 - val_accuracy: 0.6660 - val_top3_acc: 0.8621 - val_top5_acc: 0.9274
    Epoch 21/50
    6539/6539 [==============================] - 3362s 514ms/step - loss: 1.0892 - accuracy: 0.6724 - top3_acc: 0.8733 - top5_acc: 0.9355 - val_loss: 1.1385 - val_accuracy: 0.6667 - val_top3_acc: 0.8631 - val_top5_acc: 0.9280
    Epoch 22/50
    6539/6539 [==============================] - 3371s 515ms/step - loss: 1.0852 - accuracy: 0.6733 - top3_acc: 0.8735 - top5_acc: 0.9358 - val_loss: 1.1330 - val_accuracy: 0.6678 - val_top3_acc: 0.8643 - val_top5_acc: 0.9290
    Epoch 23/50
    6539/6539 [==============================] - 3367s 515ms/step - loss: 1.0733 - accuracy: 0.6768 - top3_acc: 0.8753 - top5_acc: 0.9367 - val_loss: 1.1284 - val_accuracy: 0.6686 - val_top3_acc: 0.8647 - val_top5_acc: 0.9293
    Epoch 24/50
    6539/6539 [==============================] - 3362s 514ms/step - loss: 1.0718 - accuracy: 0.6779 - top3_acc: 0.8768 - top5_acc: 0.9375 - val_loss: 1.1240 - val_accuracy: 0.6706 - val_top3_acc: 0.8663 - val_top5_acc: 0.9296
    Epoch 25/50
    6539/6539 [==============================] - 3374s 516ms/step - loss: 1.0589 - accuracy: 0.6805 - top3_acc: 0.8786 - top5_acc: 0.9392 - val_loss: 1.1198 - val_accuracy: 0.6712 - val_top3_acc: 0.8661 - val_top5_acc: 0.9300
    Epoch 26/50
    6539/6539 [==============================] - 3370s 515ms/step - loss: 1.0527 - accuracy: 0.6829 - top3_acc: 0.8786 - top5_acc: 0.9384 - val_loss: 1.1157 - val_accuracy: 0.6721 - val_top3_acc: 0.8669 - val_top5_acc: 0.9303
    Epoch 27/50
    6539/6539 [==============================] - 3349s 512ms/step - loss: 1.0490 - accuracy: 0.6837 - top3_acc: 0.8810 - top5_acc: 0.9391 - val_loss: 1.1118 - val_accuracy: 0.6727 - val_top3_acc: 0.8682 - val_top5_acc: 0.9307
    Epoch 28/50
    6539/6539 [==============================] - 3362s 514ms/step - loss: 1.0460 - accuracy: 0.6849 - top3_acc: 0.8800 - top5_acc: 0.9401 - val_loss: 1.1081 - val_accuracy: 0.6741 - val_top3_acc: 0.8689 - val_top5_acc: 0.9312
    Epoch 29/50
    6539/6539 [==============================] - 3357s 513ms/step - loss: 1.0361 - accuracy: 0.6883 - top3_acc: 0.8819 - top5_acc: 0.9405 - val_loss: 1.1048 - val_accuracy: 0.6751 - val_top3_acc: 0.8696 - val_top5_acc: 0.9318
    Epoch 30/50
    6539/6539 [==============================] - 3344s 511ms/step - loss: 1.0273 - accuracy: 0.6890 - top3_acc: 0.8842 - top5_acc: 0.9421 - val_loss: 1.1023 - val_accuracy: 0.6748 - val_top3_acc: 0.8703 - val_top5_acc: 0.9322
    Epoch 31/50
    6539/6539 [==============================] - 3352s 513ms/step - loss: 1.0210 - accuracy: 0.6911 - top3_acc: 0.8849 - top5_acc: 0.9438 - val_loss: 1.0996 - val_accuracy: 0.6758 - val_top3_acc: 0.8708 - val_top5_acc: 0.9324
    Epoch 32/50
    6539/6539 [==============================] - 3351s 512ms/step - loss: 1.0183 - accuracy: 0.6930 - top3_acc: 0.8861 - top5_acc: 0.9434 - val_loss: 1.0964 - val_accuracy: 0.6776 - val_top3_acc: 0.8711 - val_top5_acc: 0.9328
    Epoch 33/50
    6539/6539 [==============================] - 3334s 510ms/step - loss: 1.0110 - accuracy: 0.6955 - top3_acc: 0.8873 - top5_acc: 0.9432 - val_loss: 1.0939 - val_accuracy: 0.6780 - val_top3_acc: 0.8723 - val_top5_acc: 0.9334
    Epoch 34/50
    6539/6539 [==============================] - 3329s 509ms/step - loss: 1.0023 - accuracy: 0.6967 - top3_acc: 0.8886 - top5_acc: 0.9451 - val_loss: 1.0910 - val_accuracy: 0.6781 - val_top3_acc: 0.8727 - val_top5_acc: 0.9338
    Epoch 35/50
    6539/6539 [==============================] - 3322s 508ms/step - loss: 1.0021 - accuracy: 0.6966 - top3_acc: 0.8891 - top5_acc: 0.9447 - val_loss: 1.0885 - val_accuracy: 0.6785 - val_top3_acc: 0.8730 - val_top5_acc: 0.9342
    Epoch 36/50
    6539/6539 [==============================] - 3323s 508ms/step - loss: 0.9939 - accuracy: 0.6987 - top3_acc: 0.8903 - top5_acc: 0.9462 - val_loss: 1.0864 - val_accuracy: 0.6792 - val_top3_acc: 0.8738 - val_top5_acc: 0.9341
    Epoch 37/50
    6539/6539 [==============================] - 3363s 514ms/step - loss: 0.9941 - accuracy: 0.6988 - top3_acc: 0.8900 - top5_acc: 0.9458 - val_loss: 1.0842 - val_accuracy: 0.6794 - val_top3_acc: 0.8739 - val_top5_acc: 0.9344
    Epoch 38/50
    6539/6539 [==============================] - 3337s 510ms/step - loss: 0.9916 - accuracy: 0.6987 - top3_acc: 0.8904 - top5_acc: 0.9463 - val_loss: 1.0823 - val_accuracy: 0.6804 - val_top3_acc: 0.8743 - val_top5_acc: 0.9347
    Epoch 39/50
    6539/6539 [==============================] - 3323s 508ms/step - loss: 0.9797 - accuracy: 0.7035 - top3_acc: 0.8933 - top5_acc: 0.9469 - val_loss: 1.0800 - val_accuracy: 0.6809 - val_top3_acc: 0.8754 - val_top5_acc: 0.9355
    Epoch 40/50
    6539/6539 [==============================] - 3327s 509ms/step - loss: 0.9802 - accuracy: 0.7013 - top3_acc: 0.8924 - top5_acc: 0.9472 - val_loss: 1.0781 - val_accuracy: 0.6813 - val_top3_acc: 0.8748 - val_top5_acc: 0.9354
    Epoch 41/50
    6539/6539 [==============================] - 3352s 513ms/step - loss: 0.9724 - accuracy: 0.7032 - top3_acc: 0.8939 - top5_acc: 0.9484 - val_loss: 1.0758 - val_accuracy: 0.6819 - val_top3_acc: 0.8757 - val_top5_acc: 0.9354
    Epoch 42/50
    6539/6539 [==============================] - 3343s 511ms/step - loss: 0.9687 - accuracy: 0.7070 - top3_acc: 0.8945 - top5_acc: 0.9493 - val_loss: 1.0746 - val_accuracy: 0.6816 - val_top3_acc: 0.8755 - val_top5_acc: 0.9356
    Epoch 43/50
    6539/6539 [==============================] - 3354s 513ms/step - loss: 0.9641 - accuracy: 0.7090 - top3_acc: 0.8952 - top5_acc: 0.9489 - val_loss: 1.0723 - val_accuracy: 0.6826 - val_top3_acc: 0.8765 - val_top5_acc: 0.9359
    Epoch 44/50
    6539/6539 [==============================] - 3356s 513ms/step - loss: 0.9630 - accuracy: 0.7070 - top3_acc: 0.8963 - top5_acc: 0.9491 - val_loss: 1.0709 - val_accuracy: 0.6827 - val_top3_acc: 0.8765 - val_top5_acc: 0.9363
    Epoch 45/50
    6539/6539 [==============================] - 3346s 512ms/step - loss: 0.9561 - accuracy: 0.7091 - top3_acc: 0.8973 - top5_acc: 0.9499 - val_loss: 1.0694 - val_accuracy: 0.6831 - val_top3_acc: 0.8769 - val_top5_acc: 0.9363
    Epoch 46/50
    2189/6539 [=========>....................] - ETA: 33:10 - loss: 0.9623 - accuracy: 0.7072 - top3_acc: 0.8963 - top5_acc: 0.9485

[Figure: loss curves]

[Figure: training accuracy curves]

[Figure: validation accuracy curves]

2 Answers

1
votes

The reason is that the learning rate is too high for that stage of training. You need to reduce the learning rate further after that point.

Even if you are already using a low learning rate, you need to reduce it further. Have a look at the example on this page: https://www.tensorflow.org/api_docs/python/tf/keras/callbacks/LearningRateScheduler
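As a rough illustration of the scheduler idea, here is a minimal sketch of a schedule function with the `(epoch, lr)` signature that `LearningRateScheduler` accepts. The hold period of 30 epochs, the halving every 5 epochs, and the initial rate of 1e-3 are assumptions made for the example, not values taken from the question. The loop is only a dry run in plain Python so the resulting rates can be inspected without training anything.

```python
def step_decay(epoch, lr):
    """Keep the rate unchanged for 30 epochs, then halve it every 5 epochs.

    `epoch` is the zero-based epoch index and `lr` is the current rate,
    which is what Keras passes to a schedule function.
    """
    hold_epochs = 30  # assumed: roughly where the fluctuation was noticed
    drop_every = 5    # assumed interval between reductions
    if epoch >= hold_epochs and (epoch - hold_epochs) % drop_every == 0:
        return lr * 0.5
    return lr


# Dry run: feed each returned rate back in, as the callback would.
lr = 1e-3  # assumed initial learning rate
rates = []
for epoch in range(50):
    lr = step_decay(epoch, lr)
    rates.append(lr)

print(rates[29], rates[30], rates[49])
```

In an actual run the function would be wrapped as `tf.keras.callbacks.LearningRateScheduler(step_decay, verbose=1)` and passed to `model.fit` through the `callbacks` argument.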

1
votes

Rather than setting a fixed learning-rate schedule, it may work out better to use an adjustable learning rate based on monitoring the validation loss. The Keras callback ReduceLROnPlateau makes that easy to do. I also recommend using the Keras EarlyStopping callback. Both are described in the Keras callbacks documentation. Set up both to monitor validation loss. My recommended code is shown below.

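    import tensorflow as tf

    # Halve the learning rate whenever val_loss fails to improve for one epoch.
    rlronp = tf.keras.callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.5,
                                                  patience=1, verbose=1)
    # Stop after 4 epochs without improvement and restore the best weights.
    estop = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=4, verbose=1,
                                             restore_best_weights=True)
    callbacks = [rlronp, estop]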
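
To see how the two settings interact, the following is a simplified plain-Python sketch of the decision rules, not the Keras source. It ignores `min_delta`, `cooldown`, and `min_lr`, and the validation losses and the starting rate of 1e-3 are made-up numbers chosen only to exercise the logic.

```python
def simulate_callbacks(val_losses, lr=1e-3, factor=0.5,
                       lr_patience=1, stop_patience=4):
    """Simplified sketch of the plateau and early-stopping rules."""
    best = float("inf")
    best_epoch = None
    lr_wait = 0    # epochs without improvement since the last LR change
    stop_wait = 0  # epochs without improvement since the best epoch
    history = []
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, best_epoch = loss, epoch
            lr_wait = stop_wait = 0
        else:
            lr_wait += 1
            stop_wait += 1
            if lr_wait >= lr_patience:
                lr *= factor  # reduce the rate on a plateau
                lr_wait = 0
        history.append((epoch, loss, lr))
        if stop_wait >= stop_patience:
            break  # early stop; the weights from best_epoch would be kept
    return history, best_epoch


# Hypothetical validation losses that start to wobble after epoch 4.
losses = [1.10, 1.08, 1.09, 1.07, 1.075, 1.08, 1.072, 1.09]
history, best_epoch = simulate_callbacks(losses)
for epoch, loss, lr in history:
    print(f"epoch {epoch}: val_loss={loss:.3f} lr={lr:.2e}")
print("best epoch:", best_epoch)
```

With these made-up numbers the rate is halved five times, training stops at epoch 8, and the weights from epoch 4 are the ones that would be restored.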

It is normal for validation loss to oscillate around a certain level in the later epochs. Reducing the learning rate helps to get to a lower value. Think of the shape of the validation loss as a parabola in N-dimensional space, where N is the number of trainable parameters. Conceptually, in the later epochs the validation loss decreases until a point is reached where the learning rate is too large and the loss begins to oscillate around some level. Reducing the learning rate makes it possible to reach a lower level. However, at some point your model essentially starts training on noise, so the oscillations will begin again, or alternatively the validation loss may start to rise if the model begins to overfit. That is the advantage of using early stopping with restore_best_weights=True: when training is complete, your model has the weights from the epoch with the lowest validation loss.
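The parabola picture can be made concrete with a one-dimensional toy example. This is only a sketch under simplifying assumptions: a single parameter, a noise-free quadratic loss, and plain gradient descent, none of which matches the real MobileNet training. It still shows the mechanism: with a step size that is too large the parameter jumps back and forth across the minimum and the loss stays at the same level, while a smaller step size lets it settle.

```python
def descend(x, lr, steps, curvature=1.0):
    """Plain gradient descent on loss(x) = 0.5 * curvature * x**2."""
    losses = []
    for _ in range(steps):
        grad = curvature * x  # derivative of the quadratic loss
        x = x - lr * grad
        losses.append(0.5 * curvature * x * x)
    return x, losses


# Step size too large: x flips between +1 and -1, so the loss never drops.
x_big, losses_big = descend(x=1.0, lr=2.0, steps=6)

# Smaller step size: x is halved each step and the loss shrinks toward 0.
x_small, losses_small = descend(x=1.0, lr=0.5, steps=20)

print(losses_big)
print(losses_small[-1])
```

In real training the gradients are noisy, so the oscillation is irregular rather than a clean flip, but the remedy is the same: a smaller learning rate lowers the level around which the loss fluctuates.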