
I am trying to do transfer learning with the VGG16 architecture, using ImageNet-pretrained weights, on the PASCAL VOC 2012 dataset. PASCAL VOC is a multi-label image dataset with 20 classes, so I have modified the built-in VGG16 model like this:

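from keras.applications import vgg16
from keras.layers import Dense, GlobalAveragePooling2D
from keras.models import Model

def VGG16_modified():
    # include_top=True also loads the ImageNet dense layers, which are discarded below
    base_model = vgg16.VGG16(include_top=True, weights='imagenet', input_shape=(224, 224, 3))
    x = base_model.get_layer('block5_pool').output
    x = GlobalAveragePooling2D()(x)
    # One sigmoid unit per class, since an image can carry several labels
    predictions = Dense(20, activation='sigmoid')(x)

    # Keras 2 and later expect the keyword arguments inputs/outputs
    final_model = Model(inputs=base_model.input, outputs=predictions)
    return final_model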

and my input image preprocessing is like this:

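import numpy as np
from tqdm import tqdm
from keras.preprocessing import image

img_val = []
for i in tqdm(range(dfval.shape[0])):
    img = image.load_img(train_images + y_val[0][i], target_size=(224, 224))
    img = image.img_to_array(img)
    img_val.append(img)  # collect each image array
x_val = np.array(img_val)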

I converted the categorical labels with pd.get_dummies for the 20 classes, e.g. [[0 0 0 0 1 0 0 0 0 1 0 .... ]]. The labels have shape (number of image samples, 20), and the input images have shape (number of image samples, 224, 224, 3).

After training the model for several epochs, I see very good validation accuracy (around 90%), but when I use the same validation dataset to predict, the model gives the same class output for every image.

I trained the model like this:

model = VGG16_modified()
model.compile(optimizer=Adam(),loss='binary_crossentropy',metrics = ['accuracy'])
model.fit(x_train, y_train, epochs=100, validation_data=(x_val, yval), batch_size=4)

Later I loaded the model and tried to predict the labels for the same validation data set.

model = load_model(model)
preds = model.predict(image)

But I am getting the same output for every image. The output looks like [[0 0 0 ......1 0 0 0...]]. I have tried more epochs, fewer epochs, setting a few layers non-trainable, setting all layers trainable, changing the learning rate, using a different optimizer (SGD), and training from scratch without the ImageNet weights, but none of these gives correct results. Can anyone tell me where I have gone wrong?

How do you define the accuracy? I'm curious about it because it's a multi-label classification problem. I think average precision or recall would make more sense. - zihaozhihao
I strongly suspect that your image is not preprocessed the same way as during training or validation. Make sure the input values are in the same range, such as (0, 1). - zihaozhihao
Have you applied any augmentation techniques to the training images? - Kaushik Roy
I don't think 90% accuracy is convincing, because most of the values are zero. If the average number of ground-truth labels is 3, then even if everything is predicted as zero, you still get 17/20 = 0.85 accuracy with 20 classes. - zihaozhihao
@Sree try to use recall or precision to monitor the training. - zihaozhihao
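
To make the point about accuracy concrete, here is a minimal sketch in plain Python. It assumes a made-up set of four images with 20 classes and exactly 3 positive labels each, and a model that predicts all zeros. The helper names `binary_accuracy` and `recall` are illustrative, not Keras functions; the accuracy here is computed per label slot, which is how Keras is generally understood to interpret 'accuracy' alongside binary_crossentropy.

```python
NUM_CLASSES = 20

def binary_accuracy(y_true, y_pred):
    """Fraction of individual label slots that are predicted correctly."""
    pairs = [(t, p) for tr, pr in zip(y_true, y_pred) for t, p in zip(tr, pr)]
    return sum(t == p for t, p in pairs) / len(pairs)

def recall(y_true, y_pred):
    """Fraction of the true positive labels that the model actually found."""
    pairs = [(t, p) for tr, pr in zip(y_true, y_pred) for t, p in zip(tr, pr)]
    positives = sum(t == 1 for t, _ in pairs)
    true_positives = sum(t == 1 and p == 1 for t, p in pairs)
    return true_positives / positives if positives else 0.0

# Hypothetical labels: image i has classes i, i+1 and i+2 set (3 positives each).
y_true = [
    [1 if j in (i, i + 1, i + 2) else 0 for j in range(NUM_CLASSES)]
    for i in range(4)
]
# A degenerate model that outputs zeros for every class of every image.
y_pred = [[0] * NUM_CLASSES for _ in y_true]

acc = binary_accuracy(y_true, y_pred)
rec = recall(y_true, y_pred)
print(f"accuracy={acc:.2f} recall={rec:.2f}")
```

The all-zero model gets 17 of 20 slots right per image while finding none of the true labels, which is why recall or precision is a more informative signal to monitor here.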

1 Answer


Posting the resolution here for the benefit of the community, since many of the comments ask for the solution.

The issue was that the model was frozen, i.e., the layers were not trained on the PASCAL VOC dataset.

The weights of the pretrained layers should be frozen, while the weights of the layers trained on our dataset should not be.

The issue is resolved by setting layer.trainable = True. The screenshot below illustrates this.

Note: The image is taken from Aurélien Géron's book on machine learning and deep learning.
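
As a rough sketch of one reading of the above (pretrained layers frozen, the newly added classification layer trainable), the flags can be set by layer name. The helper `set_trainable`, the builder `build_vgg16_multilabel` and the layer name `voc_head` are hypothetical names chosen for illustration; they do not come from the question or from Keras itself.

```python
def set_trainable(model, head_layer_names):
    """Freeze every layer except those whose name is in head_layer_names.

    Returns (name, trainable) pairs so the result can be inspected.
    """
    for layer in model.layers:
        layer.trainable = layer.name in head_layer_names
    return [(layer.name, layer.trainable) for layer in model.layers]


def build_vgg16_multilabel(num_classes=20, weights="imagenet"):
    """Assumed setup: VGG16 base plus a sigmoid head for multi-label output."""
    # Imported lazily so set_trainable can be tried without TensorFlow installed.
    from tensorflow.keras.applications import VGG16
    from tensorflow.keras.layers import Dense, GlobalAveragePooling2D
    from tensorflow.keras.models import Model

    base = VGG16(include_top=False, weights=weights, input_shape=(224, 224, 3))
    x = GlobalAveragePooling2D(name="gap")(base.output)
    outputs = Dense(num_classes, activation="sigmoid", name="voc_head")(x)
    model = Model(inputs=base.input, outputs=outputs)

    # Pretrained convolutional layers are frozen; only the new head is trained.
    set_trainable(model, head_layer_names={"voc_head"})
    return model


# Usage (compile only after the trainable flags have been set):
# model = build_vgg16_multilabel()
# model.compile(optimizer="adam", loss="binary_crossentropy")
# for layer in model.layers:
#     print(layer.name, layer.trainable)
```

Printing the flags before training is a quick way to confirm that the head is actually trainable and that the model is not frozen end to end.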