2 votes

Please help me understand why my model overfits when my input data is normalized to [-0.5, 0.5], whereas it does not overfit otherwise.

I am solving a regression ML problem, trying to detect the locations of 4 key points on images. To do that, I import a pretrained ResNet50 and replace its top layer with the following architecture (a minimal sketch follows the list):

  • Flattening layer right after ResNet
  • Fully Connected (dense) layer with 256 nodes, followed by LeakyReLU activation and Batch Normalization
  • Another Fully Connected layer with 128 nodes, also followed by LeakyReLU and Batch Normalization
  • Last Fully Connected layer (with 8 nodes), which gives me 8 coordinates (4 Xs and 4 Ys) of the 4 key points.
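
A minimal Keras sketch of this head, assuming a 224x224 RGB input (the input size and the helper name build_model are my own illustration, not part of the original setup):

    import tensorflow as tf
    from tensorflow.keras import layers, Model
    from tensorflow.keras.applications import ResNet50

    IMG_SHAPE = (224, 224, 3)  # assumed input size

    def build_model():
        # Pretrained ResNet50 backbone without its classification head
        base = ResNet50(include_top=False, weights="imagenet", input_shape=IMG_SHAPE)

        inputs = tf.keras.Input(shape=IMG_SHAPE)
        x = base(inputs)
        x = layers.Flatten()(x)
        x = layers.Dense(256)(x)
        x = layers.LeakyReLU()(x)
        x = layers.BatchNormalization()(x)
        x = layers.Dense(128)(x)
        x = layers.LeakyReLU()(x)
        x = layers.BatchNormalization()(x)
        # 8 outputs: (x, y) for each of the 4 key points, normalized to [-0.5, 0.5]
        outputs = layers.Dense(8)(x)
        return Model(inputs, outputs)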

Since I stick with the Keras framework, I use ImageDataGenerator to produce the flow of data (images). Since the output of my model (8 numbers: 2 coordinates for each of the 4 key points) is normalized to the [-0.5, 0.5] range, I decided that the input to my model (the images) should also be in this range, and therefore normalized it to the same range using the preprocessing_function of Keras' ImageDataGenerator.
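
Roughly, that input normalization could be set up as below; the scaling function and the toy arrays are my own assumptions, not the original code:

    import numpy as np
    from tensorflow.keras.preprocessing.image import ImageDataGenerator

    def scale_to_half_range(img):
        # Map pixel values from [0, 255] to [-0.5, 0.5]
        return img / 255.0 - 0.5

    datagen = ImageDataGenerator(preprocessing_function=scale_to_half_range)

    # Hypothetical arrays: images plus the 8 normalized key-point coordinates
    images = np.random.randint(0, 256, size=(16, 224, 224, 3)).astype("float32")
    keypoints = np.random.uniform(-0.5, 0.5, size=(16, 8)).astype("float32")
    train_flow = datagen.flow(images, keypoints, batch_size=8)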

The problem showed up right after I started training. I froze the entire ResNet (training = False), with the plan to first bring the weights of the top layers to a reasonable state and only then unfreeze half of ResNet and fine-tune the model. When training with ResNet frozen, I noticed that my model starts overfitting after just a couple of epochs. Surprisingly, this happens even though my dataset is quite decent in size (25k images) and Batch Normalization is employed.
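
The two-stage plan (freeze everything, then unfreeze part of the backbone) might look roughly like this; the fraction of layers unfrozen and the variable names are illustrative only:

    from tensorflow.keras.applications import ResNet50

    base = ResNet50(include_top=False, weights="imagenet", input_shape=(224, 224, 3))

    # Stage 1: freeze the whole backbone so only the new head is trained
    base.trainable = False
    # ... build the head on top of `base`, compile, and fit ...

    # Stage 2: unfreeze roughly the second half of ResNet and fine-tune
    base.trainable = True
    for layer in base.layers[: len(base.layers) // 2]:
        layer.trainable = False
    # Re-compile (typically with a lower learning rate) before continuing training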

What's even more surprising, the problem completely disappears if I move away from normalizing the input to [-0.5, 0.5] and instead preprocess the images with tf.keras.applications.resnet50.preprocess_input. This preprocessing method does NOT scale the image data to that range, yet, surprisingly to me, it leads to proper model training without any overfitting.
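
For reference, switching to the stock preprocessing is a one-line change in the generator; note that in its default 'caffe' mode it zero-centers each channel against ImageNet means (and converts RGB to BGR) rather than scaling pixels to a fixed range:

    from tensorflow.keras.applications.resnet50 import preprocess_input
    from tensorflow.keras.preprocessing.image import ImageDataGenerator

    datagen = ImageDataGenerator(preprocessing_function=preprocess_input)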

I tried Dropout with different probabilities and L2 regularization. I also tried to reduce the complexity of my model by reducing the number of top layers and the number of nodes in each top layer. I played with the learning rate and the batch size. Nothing really helped while my input data was normalized, and I have no idea why this happens.

IMPORTANT NOTE: when VGG is employed instead of ResNet, everything seems to work well!

I really want to figure out why this happens.


UPD: the problem was caused by two things:

  • Batch Normalization layers within ResNet didn't work properly when frozen.
  • Image preprocessing for ResNet should be done using Z-score normalization.

After the two fixes mentioned above, everything seems to work well!

Welcome to Stack Overflow! It looks like, by default, it will not scale samples because of mode="caffe", according to this: docs.w3cub.com/tensorflow~python/tf/keras/applications/resnet50/… Why don't you try mode="tf" as well as scaling to the range between -1 and 1, and compare the outcomes? Other than that, I assume that seeing your code would help us understand your situation better and answer your question. - Stepan Novikov
@StepanNovikov, you were right, thanks for letting me know. It looks like proper image preprocessing was part of the solution. I also noticed that if I unfreeze the batch normalization layers within ResNet before training my model, the training behaviour also improves. Thanks for your help! - Anthony Morgan

1 Answer

1 vote

Mentioning the solution below for the benefit of the community.

The problem is resolved by making the two changes mentioned below (a sketch of both fixes follows the list):

  1. Batch Normalization layers within ResNet don't work properly when frozen, so they should be unfrozen before training the model.
  2. Image preprocessing (normalization) for ResNet should be done using Z-score, instead of the [-0.5, 0.5] scaling applied via the preprocessing_function in Keras' ImageDataGenerator.
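
A hedged sketch of both fixes; the exact way the BatchNormalization layers are unfrozen and the Z-score statistics are computed depends on the original pipeline, so treat the names and the toy array below as placeholders:

    import numpy as np
    from tensorflow.keras.applications import ResNet50
    from tensorflow.keras.layers import BatchNormalization
    from tensorflow.keras.preprocessing.image import ImageDataGenerator

    # Fix 1: keep the backbone frozen except for its BatchNormalization layers
    base = ResNet50(include_top=False, weights="imagenet", input_shape=(224, 224, 3))
    base.trainable = False
    for layer in base.layers:
        if isinstance(layer, BatchNormalization):
            layer.trainable = True

    # Fix 2: Z-score standardization of the input instead of scaling to [-0.5, 0.5]
    datagen = ImageDataGenerator(featurewise_center=True,
                                 featurewise_std_normalization=True)
    train_images = np.random.randint(0, 256, size=(16, 224, 224, 3)).astype("float32")
    datagen.fit(train_images)  # computes the per-channel mean and std used for Z-scoring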