
I am trying to train my model using transfer learning with the VGG16 model: I stripped the top layers and froze the first 2 layers to keep the ImageNet initial weights. For fine-tuning I am using a learning rate of 0.0001, softmax activation, dropout 0.5, categorical cross-entropy loss, the SGD optimizer, and 46 classes.

I am just unable to understand the behavior during training. Train loss and accuracy are both fine (loss is decreasing, accuracy is increasing). Validation loss is decreasing and validation accuracy is increasing as well, BUT the validation loss is always higher than the train loss and the validation accuracy is always higher than the train accuracy.

Assuming it is overfitting, I made the model less complex, increased the dropout rate, and added more samples to the validation data, but nothing seemed to work. I am a newbie, so any kind of help is appreciated.
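For reference, here is a minimal sketch of the setup described above (the 256-unit dense head is an assumption for illustration, not my exact code):

import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

# Load VGG16 with ImageNet weights and strip the top classification layers
base = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))

# Freeze the first 2 layers so they keep their ImageNet weights
for layer in base.layers[:2]:
    layer.trainable = False

model = models.Sequential([
    base,
    layers.Flatten(),
    layers.Dense(256, activation='relu'),    # head size assumed for illustration
    layers.Dropout(0.5),
    layers.Dense(46, activation='softmax'),  # 46 classes
])

model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.0001),
              loss='categorical_crossentropy',
              metrics=['accuracy'])

Training output: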

Epoch 1/50
26137/26137 [==============================] - 7446s 285ms/step - loss: 1.1200 - accuracy: 0.3810 - val_loss: 3.1219 - val_accuracy: 0.4467
Epoch 2/50
26137/26137 [==============================] - 7435s 284ms/step - loss: 0.9944 - accuracy: 0.4353 - val_loss: 2.9348 - val_accuracy: 0.4694
Epoch 3/50
26137/26137 [==============================] - 7532s 288ms/step - loss: 0.9561 - accuracy: 0.4530 - val_loss: 1.6025 - val_accuracy: 0.4780
Epoch 4/50
26137/26137 [==============================] - 7436s 284ms/step - loss: 0.9343 - accuracy: 0.4631 - val_loss: 1.3032 - val_accuracy: 0.4860
Epoch 5/50
26137/26137 [==============================] - 7358s 282ms/step - loss: 0.9185 - accuracy: 0.4703 - val_loss: 1.4461 - val_accuracy: 0.4847
Epoch 6/50
26137/26137 [==============================] - 7396s 283ms/step - loss: 0.9083 - accuracy: 0.4748 - val_loss: 1.4093 - val_accuracy: 0.4908
Epoch 7/50
26137/26137 [==============================] - 7424s 284ms/step - loss: 0.8993 - accuracy: 0.4789 - val_loss: 1.4617 - val_accuracy: 0.4939
Epoch 8/50
26137/26137 [==============================] - 7433s 284ms/step - loss: 0.8925 - accuracy: 0.4822 - val_loss: 1.4257 - val_accuracy: 0.4978
Epoch 9/50
26137/26137 [==============================] - 7445s 285ms/step - loss: 0.8868 - accuracy: 0.4851 - val_loss: 1.5568 - val_accuracy: 0.4953
Epoch 10/50
26137/26137 [==============================] - 7387s 283ms/step - loss: 0.8816 - accuracy: 0.4874 - val_loss: 1.4534 - val_accuracy: 0.4970
Epoch 11/50
26137/26137 [==============================] - 7374s 282ms/step - loss: 0.8779 - accuracy: 0.4894 - val_loss: 1.4605 - val_accuracy: 0.4912
Epoch 12/50
26137/26137 [==============================] - 7411s 284ms/step - loss: 0.8733 - accuracy: 0.4915 - val_loss: 1.4694 - val_accuracy: 0.5030
Do you observe whether the validation loss rises after some epochs? If your validation loss is higher than the training loss, that is perfectly fine; your model is still learning. Naturally you can't have a validation loss lower than your training loss (it does become very close to the training loss if the model is deep enough). But if your validation loss rises while your training loss keeps decreasing, then you are overfitting. Use EarlyStopping to tackle overfitting. – Siddhant Tandon
I am more concerned about the validation accuracy being greater than the training accuracy than about the loss, and the validation loss is fluctuating: sometimes it rises, sometimes it decreases. – Madiha Samad
I still doubt that accuracy is a good measure for watching overfitting. In any case, you can use the EarlyStopping Keras callback: Keras will watch the provided metric and stop training when it stops improving, which helps avoid overfitting. – Siddhant Tandon
@Siddhant Tandon gave a perfect answer! A validation loss that keeps decreasing while staying larger than the training loss is absolutely OK, as the model is fighting against regularization and dropout and still learning well... until the validation loss increases; then yes, stop training or use checkpoint weight saving. – Thư Sinh

1 Answer


Yes, you are facing an over-fitting issue. To mitigate it, you can try the steps below.

1. Shuffle the data by passing shuffle=True to VGG16_model.fit. Code is shown below:

history = VGG16_model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs, verbose=1,
                          validation_data=(x_validation, y_validation), shuffle=True)

2. Use early stopping. Code is shown below:

import tensorflow as tf

# Stop training once val_loss has not improved for 15 epochs
callback = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=15)
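Note that the callback only takes effect when it is passed to fit via the callbacks argument, for example:

history = VGG16_model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs, verbose=1,
                          validation_data=(x_validation, y_validation), shuffle=True,
                          callbacks=[callback])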

3. Use regularization. Code for L2 regularization is shown below (you can try L1 or L1_L2 regularization as well):

from tensorflow.keras.layers import Conv2D, Dense
from tensorflow.keras.regularizers import l2

Regularizer = l2(0.001)

VGG16_model.add(Conv2D(96, (11, 11), input_shape=(227, 227, 3), strides=(4, 4), padding='valid',
                       activation='relu', data_format='channels_last',
                       activity_regularizer=Regularizer, kernel_regularizer=Regularizer))

VGG16_model.add(Dense(units=46, activation='softmax',  # 46 classes, as in the question
                      activity_regularizer=Regularizer, kernel_regularizer=Regularizer))

4. You can try using BatchNormalization, for example as shown below.
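One common placement is between a dense layer and its activation (the 256-unit layer is an illustrative assumption; whether this helps depends on your architecture):

from tensorflow.keras.layers import Dense, BatchNormalization, Activation

VGG16_model.add(Dense(units=256))        # linear output, no activation yet
VGG16_model.add(BatchNormalization())    # normalize the pre-activations
VGG16_model.add(Activation('relu'))      # apply the non-linearity afterwards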

5. Perform image data augmentation using ImageDataGenerator (see the Keras documentation for more details); a sketch follows below.
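A minimal augmentation setup might look like this (the specific transforms and their ranges are illustrative assumptions, not tuned values):

from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(rotation_range=15,       # random rotations up to 15 degrees
                             width_shift_range=0.1,   # random horizontal shifts
                             height_shift_range=0.1,  # random vertical shifts
                             horizontal_flip=True)    # random mirroring

# Train on batches produced by the generator instead of the raw arrays
history = VGG16_model.fit(datagen.flow(x_train, y_train, batch_size=batch_size),
                          epochs=epochs, validation_data=(x_validation, y_validation))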

6. If the pixels are not normalized, dividing the pixel values by 255 also helps, as shown below.
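For example, assuming the data are NumPy arrays of raw pixel values:

# Scale pixel values from [0, 255] to [0, 1]
x_train = x_train.astype('float32') / 255.0
x_validation = x_validation.astype('float32') / 255.0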