0 votes

I am building a classifier for the Food-101 dataset (an image dataset with 101 classes and 1,000 images per class). My approach is to use Keras and transfer learning with ResNet50 (ImageNet weights).

When training the model, the train accuracy improves moderately over a few epochs (30% -> 45%), but the validation accuracy stays essentially flat at 0.9-1.0%. I have tried simplifying the model, swapping optimizers, reducing and increasing the units in the hidden layer, stripping out all image augmentation, and setting a consistent random seed on flow_from_directory().

When I look at the predictions the model makes on the validation set, it always predicts the same class (see the check sketched at the end of this question).

My sense is that the model is not overfitting so badly as to explain such a complete lack of movement in the validation accuracy.

Any suggestions to get the validation accuracy to improve would be greatly appreciated.

For reference, below are the relevant code snippets:

from keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(rescale=1./255, validation_split=0.2)

train_datagen = datagen.flow_from_directory('data/train/', seed=42, class_mode='categorical', subset='training', target_size=(256,256))
# prints "60603 images belonging to 101 classes"
val_datagen = datagen.flow_from_directory('data/train/', seed=42, class_mode='categorical', subset='validation', target_size=(256,256)) 
# prints "15150 images belonging to 101 classes"

train_steps = len(train_datagen) #1894
val_steps = len(val_datagen) #474
classes = len(list(train_datagen.class_indices.keys())) #101

from keras.applications.resnet50 import ResNet50

conv_base = ResNet50(weights='imagenet', include_top=False, pooling='avg', input_shape=(256, 256, 3))

from keras.models import Sequential
from keras.layers import Dense, Dropout, BatchNormalization

model = Sequential()

model.add(conv_base)
model.add(BatchNormalization())
model.add(Dropout(0.2))
model.add(Dense(512, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(classes, activation='softmax'))

conv_base.trainable = False  # freeze the ImageNet backbone; only the new head trains

from keras.optimizers import Adam

model.compile(loss='categorical_crossentropy',
              optimizer=Adam(),
              metrics=['acc','top_k_categorical_accuracy'])

history = model.fit_generator(
    train_datagen,
    steps_per_epoch=train_steps,
    epochs=5,
    verbose=2,
    validation_data=val_datagen,
    validation_steps=val_steps
)

Here are the results of .fit_generator():

Epoch 1/5
724s - loss: 3.1305 - acc: 0.3059 - top_k_categorical_accuracy: 0.5629 - val_loss: 6.5914 - val_acc: 0.0099 - val_top_k_categorical_accuracy: 0.0494
Epoch 2/5
715s - loss: 2.4812 - acc: 0.4021 - top_k_categorical_accuracy: 0.6785 - val_loss: 7.4093 - val_acc: 0.0099 - val_top_k_categorical_accuracy: 0.0495
Epoch 3/5
714s - loss: 2.3559 - acc: 0.4248 - top_k_categorical_accuracy: 0.7026 - val_loss: 8.9146 - val_acc: 0.0094 - val_top_k_categorical_accuracy: 0.0495
Epoch 4/5
714s - loss: 2.2661 - acc: 0.4459 - top_k_categorical_accuracy: 0.7200 - val_loss: 8.0597 - val_acc: 0.0100 - val_top_k_categorical_accuracy: 0.0494
Epoch 5/5
715s - loss: 2.1870 - acc: 0.4583 - top_k_categorical_accuracy: 0.7348 - val_loss: 7.5171 - val_acc: 0.0100 - val_top_k_categorical_accuracy: 0.0483

Here is the model.summary():

Layer (type)                 Output Shape              Param #   
=================================================================
resnet50 (Model)             (None, 2048)              23587712  
_________________________________________________________________
batch_normalization_1 (Batch (None, 2048)              8192      
_________________________________________________________________
dropout_1 (Dropout)          (None, 2048)              0         
_________________________________________________________________
dense_1 (Dense)              (None, 512)               1049088   
_________________________________________________________________
dropout_2 (Dropout)          (None, 512)               0         
_________________________________________________________________
dense_2 (Dense)              (None, 101)               51813     
=================================================================
Total params: 24,696,805
Trainable params: 1,104,997
Non-trainable params: 23,591,808
_________________________________________________________________
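For completeness, the "always predicts the same class" observation comes from a check along these lines (a sketch; the exact inspection code is omitted above):

import numpy as np

# Run the trained model over the validation generator and count how often
# each class index is predicted.
probs = model.predict_generator(val_datagen, steps=val_steps)
preds = np.argmax(probs, axis=1)
print(np.unique(preds, return_counts=True))  # one class accounts for essentially every prediction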

3 Answers

4 votes

The low validation accuracy comes from the way the model is built. It is reasonable to expect transfer learning to work well in this context, but your top-1 and top-5 accuracies hover close to 1/101 and 5/101 respectively (1/101 ≈ 0.0099 and 5/101 ≈ 0.0495, which match your val_acc and val_top_k_categorical_accuracy almost exactly). This indicates that the model is classifying by chance and has not learnt the underlying signal (features) of your dataset, so transfer learning has not worked in this configuration. That does not, however, mean it won't ever work.

I repeated your experiment and obtained the same results: top-1 and top-5 accuracies that mirror classification by random choice. I then unfroze the layers of the ResNet50 model and repeated the experiment; this is just a slightly different way of doing transfer learning. I got the following results after 10 epochs of training:

Epoch 10/50
591/591 [==============================] - 1492s 3s/step - loss: 1.0594 - accuracy: 0.7459 - val_loss: 1.1397 - val_accuracy: 0.7143

This is not perfect; the model has not yet converged, and there are preprocessing steps that could be applied to further improve the result.
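One such step (my suggestion, not something the run above used) is to give the backbone the same input normalization its ImageNet weights were trained with, i.e. Keras's preprocess_input for ResNet50 instead of a plain 1/255 rescale:

from keras.applications.resnet50 import preprocess_input
from keras.preprocessing.image import ImageDataGenerator

# Use ResNet50's own ImageNet preprocessing in place of rescale=1./255.
datagen = ImageDataGenerator(preprocessing_function=preprocess_input,
                             validation_split=0.2)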

The reason for your observation is that the frozen ResNet50 was trained on a distribution of images fundamentally different from the Food-101 dataset. This mismatch in data distributions causes the poor performance, because the transformations performed by the frozen network are not tuned to Food-101 images. Unfreezing the network lets the weights actually adapt to the Food-101 images, which is what produces the better result.
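A minimal sketch of the unfreezing step, reusing the names from the question (the small learning rate is my assumption; fine-tuning a fully unfrozen backbone generally needs one):

conv_base.trainable = True  # unfreeze the whole ResNet50 backbone

# Recompile so the change takes effect.
model.compile(loss='categorical_crossentropy',
              optimizer=Adam(lr=1e-4),  # assumed small LR for fine-tuning
              metrics=['acc', 'top_k_categorical_accuracy'])

history = model.fit_generator(
    train_datagen,
    steps_per_epoch=train_steps,
    epochs=10,
    validation_data=val_datagen,
    validation_steps=val_steps
)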

Hope this helps you.

1 vote

Decrease the number of frozen layers (or increase the number of trainable layers) in the model. I was having the same issue; once I trained half of the layers on my data, accuracy improved drastically.
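A sketch of what that can look like with the conv_base from the question (the midpoint split and the learning rate are illustrative assumptions):

conv_base.trainable = True          # enable per-layer control
n = len(conv_base.layers) // 2      # split point: half of the backbone

for layer in conv_base.layers[:n]:  # keep the earlier, more generic layers frozen
    layer.trainable = False
for layer in conv_base.layers[n:]:  # train the later, more task-specific layers
    layer.trainable = True

# Recompile so the new trainable flags take effect.
model.compile(loss='categorical_crossentropy',
              optimizer=Adam(lr=1e-4),
              metrics=['acc'])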

0 votes

Try this (relative to the question's code: validation_split raised to 0.4, the BatchNormalization layer dropped, a smaller 128-unit hidden layer with adjusted dropout, and 50 epochs):

from keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(rescale=1./255, validation_split=0.4)

train_datagen = datagen.flow_from_directory('data/train/', seed=42, class_mode='categorical', subset='training', target_size=(256,256))
val_datagen = datagen.flow_from_directory('data/train/', seed=42, class_mode='categorical', subset='validation', target_size=(256,256))

# with validation_split=0.4 the image and step counts differ from the question's 80/20 run
train_steps = len(train_datagen)
val_steps = len(val_datagen)
classes = len(list(train_datagen.class_indices.keys())) #101

from keras.applications.resnet50 import ResNet50

conv_base = ResNet50(weights='imagenet', include_top=False, pooling='avg', input_shape=(256, 256, 3))

from keras.models import Sequential
from keras.layers import Dense, Dropout

model = Sequential()

model.add(conv_base)
model.add(Dropout(0.2))
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.3))
model.add(Dense(classes, activation='softmax'))

conv_base.trainable = False

from keras.optimizers import Adam

model.compile(loss='categorical_crossentropy',
              optimizer=Adam(),
              metrics=['acc','top_k_categorical_accuracy'])

history = model.fit_generator(
    train_datagen,
    steps_per_epoch=train_steps,
    epochs=50,
    verbose=2,
    validation_data=val_datagen,
    validation_steps=val_steps
)