I am trying to use transfer learning with MobileNetV2 from keras.applications in Python. My images belong to 4 classes, with 8000, 7000, 8000 and 8000 images in the first, second, third and fourth class, respectively. The images are grayscale and were resized from 1024x1024 to 128x128.
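Because the network is built with input_shape=(128, 128, 3), the single gray channel has to be expanded to three channels before training. The following is only a minimal sketch of one common way to do that, assuming the resized images are available as a NumPy array of shape (N, 128, 128); the helper name gray_to_rgb and the dummy array are hypothetical and stand in for the real loading code:

```python
import numpy as np

def gray_to_rgb(images):
    """Repeat the single gray channel three times: (N, H, W) -> (N, H, W, 3)."""
    images = np.asarray(images, dtype="float32")
    return np.repeat(images[..., np.newaxis], 3, axis=-1)

# Dummy stand-in for the real data: 2 grayscale images of size 128x128
dummy = np.zeros((2, 128, 128), dtype="uint8")
batch = gray_to_rgb(dummy)
print(batch.shape)  # (2, 128, 128, 3)
```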
I removed the classification dense layers from MobileNetV2 and added my own dense layers:
Layer (type)                     Output Shape    Param #
------------------------------------------------------------
global_average_pooling2d_1 (Glo  (None, 1280)    0
dense_1 (Dense)                  (None, 4)       5124
dropout_1 (Dropout)              (None, 4)       0
dense_2 (Dense)                  (None, 4)       20
dense_3 (Dense)                  (None, 4)       20
------------------------------------------------------------
Total params: 2,263,148
Trainable params: 5,164
Non-trainable params: 2,257,984
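As a sanity check on the summary above, the parameter counts can be reproduced by hand: a dense layer has inputs * units weights plus one bias per unit. The helper name dense_params is hypothetical; the numbers are taken from the summary:

```python
def dense_params(n_inputs, n_units):
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_units + n_units

dense_1 = dense_params(1280, 4)  # pooled MobileNetV2 features -> 4 units
dense_2 = dense_params(4, 4)
dense_3 = dense_params(4, 4)

trainable = dense_1 + dense_2 + dense_3
non_trainable = 2_257_984  # frozen MobileNetV2 base, as reported in the summary
total = trainable + non_trainable

print(dense_1, dense_2, dense_3)  # 5124 20 20
print(trainable, total)           # 5164 2263148
```

So only the three dense layers are being trained, and the whole MobileNetV2 base is frozen.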
As you can see, I added 2 dense layers with dropout as a regularizer. Furthermore, I used the following optimizer and compile settings:
opt = optimizers.SGD(lr=0.001, decay=4e-5, momentum=0.9)
model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])
batch_size = 32
My training results are very strange:
Epoch
1 loss: 1.3378 - acc: 0.3028 - val_loss: 1.4629 - val_acc: 0.2702
2 loss: 1.2807 - acc: 0.3351 - val_loss: 1.3297 - val_acc: 0.3208
3 loss: 1.2641 - acc: 0.3486 - val_loss: 1.4428 - val_acc: 0.3707
4 loss: 1.2178 - acc: 0.3916 - val_loss: 1.4231 - val_acc: 0.3758
5 loss: 1.2100 - acc: 0.3909 - val_loss: 1.4009 - val_acc: 0.3625
6 loss: 1.1979 - acc: 0.3976 - val_loss: 1.5025 - val_acc: 0.3116
7 loss: 1.1943 - acc: 0.3988 - val_loss: 1.4510 - val_acc: 0.2872
8 loss: 1.1926 - acc: 0.3965 - val_loss: 1.5162 - val_acc: 0.3072
9 loss: 1.1888 - acc: 0.4004 - val_loss: 1.5659 - val_acc: 0.3304
10 loss: 1.1906 - acc: 0.3969 - val_loss: 1.5655 - val_acc: 0.3260
11 loss: 1.1864 - acc: 0.3999 - val_loss: 1.6286 - val_acc: 0.2967
(...)
To summarize, the training loss stops decreasing and remains very high, and the model also overfits. You may ask why I added only 2 dense layers with 4 neurons each. In the beginning I tried different configurations (e.g. 128 and 64 neurons, and also different regularizers), but then overfitting was a huge problem, i.e. the training accuracy was almost 1 while the test loss was still far from 0.
I am a bit confused about what is going on, since something seems to be seriously wrong here.
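To put the logged numbers in context, it helps to compare them with what a model that ignores its input would achieve. The sketch below computes these chance-level baselines from the class counts given at the top; it assumes the training and validation splits have roughly the same class proportions as the full data set, which is not confirmed here:

```python
import math

class_counts = [8000, 7000, 8000, 8000]
total = sum(class_counts)
num_classes = len(class_counts)

# Cross-entropy of always predicting a uniform distribution over 4 classes
uniform_loss = math.log(num_classes)

# Cross-entropy of always predicting the class frequencies
priors = [c / total for c in class_counts]
prior_loss = -sum(p * math.log(p) for p in priors)

# Accuracy of always predicting the largest class
majority_acc = max(class_counts) / total

print(round(uniform_loss, 4))  # 1.3863
print(round(majority_acc, 4))  # 0.2581
```

The training loss of about 1.19 and the validation accuracy of roughly 0.27 to 0.38 are therefore only slightly better than guessing.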
Fine-tuning attempts:
- Different numbers of neurons in the dense layers of the classification part, varying from 1024 to 4
- Different learning rates (0.01, 0.001, 0.0001)
- Different batch sizes (16, 32, 64)
- Different L1 regularizers (0.001, 0.0001)
Results: Always huge overfitting
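For illustration, the search space spanned by the listed values can be enumerated as below. This is only a sketch of the grid, not a record of which combinations were actually run, and the neuron counts are left out because only their range (1024 to 4) is given:

```python
from itertools import product

learning_rates = [0.01, 0.001, 0.0001]
batch_sizes = [16, 32, 64]
l1_strengths = [0.001, 0.0001]

# Every combination of the values listed above
grid = [
    {"lr": lr, "batch_size": bs, "l1": l1}
    for lr, bs, l1 in product(learning_rates, batch_sizes, l1_strengths)
]

print(len(grid))  # 18
print(grid[0])    # {'lr': 0.01, 'batch_size': 16, 'l1': 0.001}
```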
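from keras import optimizers
from keras.applications.mobilenet_v2 import MobileNetV2
from keras.layers import Dense, Dropout, GlobalAveragePooling2D
from keras.models import Model

# pretrained feature extractor without the ImageNet classification head
base_model = MobileNetV2(input_shape=(128, 128, 3), weights='imagenet', include_top=False)
# define classifier
x = base_model.output
x = GlobalAveragePooling2D()(x)
x = Dense(4, activation='relu')(x)
x = Dropout(0.8)(x)
x = Dense(4, activation='relu')(x)
preds = Dense(4, activation='softmax')(x)  # final layer with softmax activation
model = Model(inputs=base_model.input, outputs=preds)
# freeze everything except the last 4 layers (the 3 dense layers and the dropout)
for layer in model.layers[:-4]:
    layer.trainable = False
opt = optimizers.SGD(lr=0.001, decay=4e-5, momentum=0.9)
model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])
batch_size = 32
# number of epochs derived from the label array size and the batch size
EPOCHS = int(trainY.size/batch_size)
# trainX, trainY, testX, testY: image arrays and one-hot labels (loaded elsewhere)
H = model.fit(trainX, trainY, validation_data=(testX, testY), epochs=EPOCHS, batch_size=batch_size)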
The expected result is no overfitting and a val_loss close to 0. I know this is achievable from some papers working on similar image sets.
UPDATE: Here are some pictures of val_loss, train_loss and accuracy: 2 dense layers with 16 and 8 neurons, lr = 0.001 with decay 1e-6, batch size = 25