Transfer Learning - Val_loss strange behaviour

Question

I am trying to use transfer learning with MobileNetV2 from keras.applications in Python. My images belong to 4 classes, with 8000, 7000, 8000 and 8000 images in the first, second, third and fourth class respectively. The images are grayscale and resized from 1024x1024 to 128x128.

I removed the classification dense layers from MobileNetV2 and added my own dense layers:

Layer (type)                     Output Shape     Param #
=================================================================
global_average_pooling2d_1 (Glo  (None, 1280)     0
_________________________________________________________________
dense_1 (Dense)                  (None, 4)        5124
_________________________________________________________________
dropout_1 (Dropout)              (None, 4)        0
_________________________________________________________________
dense_2 (Dense)                  (None, 4)        20
_________________________________________________________________
dense_3 (Dense)                  (None, 4)        20
=================================================================

Total params: 2,263,148

Trainable params: 5,164

Non-trainable params: 2,257,984
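
The trainable-parameter total can be checked by hand: a dense layer has inputs x units weights plus one bias per unit. A minimal sketch of that arithmetic, using the shapes from the summary above (the helper name dense_params is hypothetical, not a Keras function):

```python
def dense_params(n_inputs, n_units):
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_units + n_units


# Layer sizes taken from the model summary in the question.
head = [
    dense_params(1280, 4),  # dense_1, fed by the 1280 pooled features
    dense_params(4, 4),     # dense_2 (dropout in between adds no parameters)
    dense_params(4, 4),     # dense_3, the softmax output
]
trainable = sum(head)
non_trainable = 2_263_148 - trainable  # total params minus the new head

print(head, trainable, non_trainable)
```

The three figures agree with the summary, so only the small head is being trained and the whole MobileNetV2 base is frozen.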

As you can see, I added 2 dense layers with dropout as a regularizer. Furthermore, I used the following optimizer, loss and batch size:

opt = optimizers.SGD(lr=0.001, decay=4e-5, momentum=0.9)

model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])

batch_size = 32
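
For reference, the decay argument of the legacy Keras 2.x SGD optimizer shrinks the learning rate per batch update rather than per epoch, as lr / (1 + decay * iterations). A minimal sketch of that schedule in plain Python; the figure of 775 updates per epoch is an assumption (24,800 training images at batch size 32), since the train/test split is not stated:

```python
def decayed_lr(initial_lr, decay, iterations):
    """Time-based decay as applied by the legacy Keras SGD optimizer."""
    return initial_lr / (1.0 + decay * iterations)


# Assumed, not from the question: 24,800 training images -> 775 updates per epoch.
UPDATES_PER_EPOCH = 775

for epoch in (0, 1, 11, 100):
    lr = decayed_lr(0.001, 4e-5, epoch * UPDATES_PER_EPOCH)
    print(f"after epoch {epoch}: lr = {lr:.6f}")
```

Under that assumption the learning rate falls noticeably within the first few dozen epochs, which is worth keeping in mind when the training loss plateaus.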

My training results are very weird:

Epoch

1 loss: 1.3378 - acc: 0.3028 - val_loss: 1.4629 - val_acc: 0.2702

2 loss: 1.2807 - acc: 0.3351 - val_loss: 1.3297 - val_acc: 0.3208

3 loss: 1.2641 - acc: 0.3486 - val_loss: 1.4428 - val_acc: 0.3707

4 loss: 1.2178 - acc: 0.3916 - val_loss: 1.4231 - val_acc: 0.3758

5 loss: 1.2100 - acc: 0.3909 - val_loss: 1.4009 - val_acc: 0.3625

6 loss: 1.1979 - acc: 0.3976 - val_loss: 1.5025 - val_acc: 0.3116

7 loss: 1.1943 - acc: 0.3988 - val_loss: 1.4510 - val_acc: 0.2872

8 loss: 1.1926 - acc: 0.3965 - val_loss: 1.5162 - val_acc: 0.3072

9 loss: 1.1888 - acc: 0.4004 - val_loss: 1.5659 - val_acc: 0.3304

10 loss: 1.1906 - acc: 0.3969 - val_loss: 1.5655 - val_acc: 0.3260

11 loss: 1.1864 - acc: 0.3999 - val_loss: 1.6286 - val_acc: 0.2967

(...)

To summarize, the training loss stops decreasing while it is still very high, and the model also overfits. You may ask why I added only 2 dense layers with 4 neurons each. In the beginning I tried different configurations (e.g. 128 and 64 neurons, and also different regularizers), but then overfitting was a huge problem: accuracy on the training set was almost 1 while the loss on the test set was still far from 0.
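
One way to put the logged numbers in context: a classifier that ignores its input and predicts each of the 4 classes with equal probability has a categorical cross-entropy of ln(4), and always predicting the largest class gives the majority-class accuracy. A small sketch using the class counts from the question (the function names are illustrative, and the counts describe the whole dataset, not necessarily the validation split):

```python
import math


def uniform_loss(n_classes):
    """Cross-entropy of predicting every class with probability 1/n_classes."""
    return math.log(n_classes)


def majority_accuracy(counts):
    """Accuracy of always predicting the most frequent class."""
    return max(counts) / sum(counts)


counts = [8000, 7000, 8000, 8000]  # images per class, from the question
print(round(uniform_loss(len(counts)), 4))   # 1.3863
print(round(majority_accuracy(counts), 4))   # 0.2581
```

In the log above, every val_loss except epoch 2 (1.3297) is higher than about 1.386, so on the validation data the model scores worse than a uniform guess for most epochs, while the training loss stays only modestly below that baseline.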

I am a bit confused about what is going on, since something is clearly very wrong here.

Fine-tuning attempts:
- Different numbers of neurons in the dense layers of the classification part, varying from 1024 to 4
- Different learning rates (0.01, 0.001, 0.0001)
- Different batch sizes (16, 32, 64)
- Different regularizers: L1 with 0.001 and 0.0001

Results: Always huge overfitting

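from keras import optimizers
from keras.applications import MobileNetV2
from keras.layers import Dense, Dropout, GlobalAveragePooling2D
from keras.models import Model

# pretrained feature extractor without the ImageNet classification head
base_model = MobileNetV2(input_shape=(128, 128, 3), weights='imagenet', include_top=False)

# define the classifier head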
x = base_model.output
x = GlobalAveragePooling2D()(x)
x = Dense(4, activation='relu')(x)
x = Dropout(0.8)(x)
x = Dense(4, activation='relu')(x)
preds = Dense(4, activation='softmax')(x) #final layer with softmax activation

model = Model(inputs=base_model.input, outputs=preds)

for layer in model.layers[:-4]:
    layer.trainable = False

opt = optimizers.SGD(lr=0.001, decay=4e-5, momentum=0.9)
model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])

batch_size = 32
EPOCHS = int(trainY.size/batch_size)

H = model.fit(trainX, trainY, validation_data=(testX, testY), epochs=EPOCHS, batch_size=batch_size)
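
Note that EPOCHS above is derived from trainY.size, which for a NumPy array is the total number of elements rather than the number of samples. A minimal sketch of what that expression would evaluate to, assuming one-hot labels of shape (N, 4) and a hypothetical N of 24,800 training images (the actual split is not stated):

```python
import math

n_samples = 24_800  # hypothetical training-set size, not from the question
n_classes = 4       # one-hot labels, as categorical_crossentropy expects
batch_size = 32

# For labels of shape (n_samples, n_classes), .size counts every element.
label_elements = n_samples * n_classes

epochs_as_written = int(label_elements / batch_size)
batches_per_epoch = math.ceil(n_samples / batch_size)

print(epochs_as_written, batches_per_epoch)
```

Under these assumptions the expression yields thousands of epochs, which looks closer to a batches-per-epoch calculation than to an intended epoch count. Choosing the number of epochs explicitly may be what was meant.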

The expected result is no overfitting and a val_loss close to 0. I know this from a paper working on similar image sets.

UPDATE: Here are some pictures of val_loss, train_loss and accuracy for 2 dense layers with 16 and 8 neurons, lr = 0.001 with decay 1e-6, batch size = 25. (Images not included.)

The results in the image you linked don't look too bad to me. It looks like a general model that has been trained until it reached capacity. But I assume the overfitting only starts once you increase the number of neurons past 16/8? The image size reduction you are doing could be very significant. Sure you're not using too much detail? - JimmyOnThePage
Since you're using GlobalAveragePooling, is that big a size reduction necessary? - JimmyOnThePage

Ashish Taldeokar · Accepted Answer · 2019-06-06T07:37:36

Here, you used x = Dropout(0.8)(x), which drops 80% of the units. I assume you meant to drop 20%, so replace it with x = Dropout(0.2)(x).

Also, please go through the Keras documentation for Dropout if needed.

An extract from that documentation:

keras.layers.Dropout(rate, noise_shape=None, seed=None)

rate: float between 0 and 1. Fraction of the input units to drop.
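
To make the effect of the rate concrete: with only 4 units feeding the dropout layer, the chance that every unit is zeroed in a given training step is rate ** 4, assuming units are dropped independently. A minimal sketch in plain Python (the helper names are hypothetical, and this simulates the masking rather than calling Keras):

```python
import random


def prob_all_dropped(n_units, rate):
    """Probability that independent dropout zeroes every one of n_units."""
    return rate ** n_units


def simulate_all_dropped(n_units, rate, trials, seed=0):
    """Monte Carlo estimate of the same probability."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # A unit is dropped when its random draw falls below the rate.
        if all(rng.random() < rate for _ in range(n_units)):
            hits += 1
    return hits / trials


print(round(prob_all_dropped(4, 0.8), 4))  # 0.4096
print(round(prob_all_dropped(4, 0.2), 4))  # 0.0016
print(simulate_all_dropped(4, 0.8, trials=20_000))
```

With rate 0.8 the 4-unit layer passes nothing forward in roughly 4 out of 10 training steps, whereas with rate 0.2 that almost never happens.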
