2 votes

I have split the dataset (around 28K images) into a 75% training set and a 25% test set. Then I randomly took 15% of the training set and 15% of the test set to create a validation set. The goal is to classify the images into two categories. The exact image samples can't be shared, but they are similar to the one attached. I'm using VGG19 with ImageNet weights, with the last two layers trainable and 4 dense layers appended. I'm also using ImageDataGenerator to augment the images. I trained the model for 30 epochs and found that training accuracy was 95% and validation accuracy was 96%, but when the model was evaluated on the test set, accuracy dropped dramatically to only 75%.

I have tried regularization and dropout to tackle overfitting, in case that is what the model is suffering from. I have also tried using the test set as the validation set and then testing the model on that same test set. The results were: training accuracy = 96%, validation accuracy = 96.3%, and test accuracy = 68%. I don't understand what I should do.
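For reference, a minimal sketch of the setup described above, assuming Keras/TensorFlow; the layer sizes, augmentation settings, and activation choices are placeholders, since the actual code was not shared:

```python
# Sketch only: approximates the described setup (VGG19 + ImageNet weights,
# last two layers trainable, 4 dense layers on top, ImageDataGenerator).
# Layer sizes and augmentation settings are assumptions, not the asker's code.
from tensorflow.keras.applications import VGG19
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras import layers, models

base = VGG19(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
for layer in base.layers[:-2]:          # freeze everything except the last two layers
    layer.trainable = False

model = models.Sequential([
    base,
    layers.Flatten(),
    layers.Dense(512, activation="relu"),
    layers.Dense(256, activation="relu"),
    layers.Dense(64, activation="relu"),
    layers.Dense(1, activation="sigmoid"),   # two categories -> single sigmoid unit
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

train_gen = ImageDataGenerator(rescale=1.0 / 255, rotation_range=15,
                               horizontal_flip=True, zoom_range=0.1)
val_gen = ImageDataGenerator(rescale=1.0 / 255)   # validation: rescale only
```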

[sample image attached]

Check for class imbalance, i.e. check whether there is a class that appears in the test set but in neither the training set nor the validation set. - Mitiku
There is no class imbalance, I have checked. - Harsh Vardhan
Why has somebody downvoted my question? What's wrong with the question? - Harsh Vardhan
@HarshVardhan Did you find a solution? I am having similar issues. - ManInMoon
@ManInMoon If your case is the same as mine, then there is no problem with the network. I forgot to normalize the images at test time. Did you do that too? Check that. - Harsh Vardhan
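As the last comment notes, the drop in this case came from not normalizing the images at test time. A minimal sketch of applying the same preprocessing at evaluation, assuming a Keras ImageDataGenerator pipeline (paths and sizes are placeholders):

```python
# Sketch: whatever rescaling/preprocessing the training generator applies must also
# be applied at test time, otherwise test accuracy collapses even for a good model.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_gen = ImageDataGenerator(rescale=1.0 / 255, horizontal_flip=True)  # augmentation + rescale
test_gen = ImageDataGenerator(rescale=1.0 / 255)   # same rescale, but no augmentation

test_flow = test_gen.flow_from_directory(
    "data/test",            # placeholder path
    target_size=(224, 224),
    class_mode="binary",
    shuffle=False,          # keep order stable for evaluation
)
# model.evaluate(test_flow) now sees inputs in the same [0, 1] range as during training.
```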

2 Answers

2 votes

First off, you need to make sure that when you split the data, the relative size of every class is the same in each of the resulting datasets. The data can be imbalanced if that is the distribution of your original data, but the same imbalance must be present in all datasets after the split.
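In code, a stratified split of this kind could look like the sketch below (scikit-learn assumed; the variable names and sizes are purely illustrative):

```python
# Sketch: a stratified split keeps each class's relative frequency identical
# in the train and test partitions.
from sklearn.model_selection import train_test_split

# Illustrative data: image paths and binary labels (replace with the real dataset).
file_paths = [f"img_{i}.jpg" for i in range(1000)]
labels = [i % 2 for i in range(1000)]

train_paths, test_paths, y_train, y_test = train_test_split(
    file_paths, labels,
    test_size=0.25,
    stratify=labels,      # preserve the class ratio in both splits
    random_state=42,
)
```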

Now, regarding the split: if you need train, validation, and test sets, they must all be independent of each other (no shared samples). This is important if you don't want to cheat yourself with the results you are getting.
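A quick way to verify that independence is to check that no sample appears in more than one split; a minimal sketch, assuming samples are identified by file name:

```python
# Sketch: confirm the three splits share no samples (identified here by file name).
def assert_disjoint(train_files, val_files, test_files):
    """Raise if any file name appears in more than one split."""
    train_files, val_files, test_files = set(train_files), set(val_files), set(test_files)
    assert train_files.isdisjoint(val_files), "train/validation sets share samples"
    assert train_files.isdisjoint(test_files), "train/test sets share samples"
    assert val_files.isdisjoint(test_files), "validation/test sets share samples"

# Example with toy file lists:
assert_disjoint(["a.jpg", "b.jpg"], ["c.jpg"], ["d.jpg"])
```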

In general, in machine learning we start from a training set and a test set. To choose the best model architecture/hyper-parameters, we further divide the training set to get the validation set (the test set should not be touched). After determining the best architecture/hyper-parameters, we combine the training and validation sets and train the chosen model from scratch on the combined, full training set. Only now do we measure the results on the test set.
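A sketch of that workflow, with small toy arrays standing in for the real images and a hypothetical build_model() helper (none of the names or numbers come from the question itself):

```python
# Sketch of the usual protocol: carve a validation set out of the training data,
# pick the best configuration, then retrain on train + validation and evaluate
# exactly once on the untouched test set.
import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.rand(200, 32, 32, 3).astype("float32")   # toy stand-in for the images
y = np.random.randint(0, 2, size=200)                   # toy binary labels

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)   # test set: set aside, untouched
X_tr, X_val, y_tr, y_val = train_test_split(
    X_train, y_train, test_size=0.15, stratify=y_train, random_state=0)

# ... compare architectures/hyper-parameters using (X_tr, y_tr) and (X_val, y_val) ...

# Once the best configuration is chosen, retrain it from scratch on train + validation:
X_full = np.concatenate([X_tr, X_val])
y_full = np.concatenate([y_tr, y_val])
# best_model = build_model(best_hyperparams)   # hypothetical builder
# best_model.fit(X_full, y_full)
# best_model.evaluate(X_test, y_test)          # the only time the test set is used
```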

-1 votes

I faced a similar issue in one of my practice projects: my InceptionV3 model gave a high training accuracy (99%) and a high validation accuracy (95%+), but a very low testing accuracy (55%).

The dataset was a subset of the popular Dogs vs. Cats dataset (https://www.kaggle.com/c/dogs-vs-cats/data) that I put together myself: 15k images split into three folders (train, valid, and test) in a 60:20:20 ratio (9000, 3000, and 3000 images respectively, each split evenly between a cats folder and a dogs folder).

The error in my case was actually in my code; it had nothing to do with the model or the data. The model had been defined inside a function, and calling that function again at evaluation time created a fresh, untrained instance. Hence, an untrained model was being tested on the test dataset. After correcting the errors in my notebook, I got 96%+ testing accuracy.
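To illustrate that bug pattern, here is a minimal sketch with a toy model and toy data (the names, sizes, and data are invented, not from the original notebook):

```python
# Sketch of the bug: the builder returns a fresh, untrained model every call,
# so evaluating build_model() instead of the trained instance gives chance-level accuracy.
import numpy as np
from tensorflow.keras import layers, models

def build_model():
    m = models.Sequential([
        layers.Input(shape=(10,)),
        layers.Dense(16, activation="relu"),
        layers.Dense(1, activation="sigmoid"),
    ])
    m.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return m   # a brand-new, untrained model every time it is called

# Toy, easily learnable data (stand-in for the real images/labels).
X = np.random.rand(500, 10).astype("float32")
y = (X[:, 0] > 0.5).astype("int32")

model = build_model()
model.fit(X, y, epochs=20, verbose=0)                 # this instance is the one that gets trained

correct = model.evaluate(X, y, verbose=0)[1]          # correct: reuse the trained instance
buggy = build_model().evaluate(X, y, verbose=0)[1]    # bug: evaluate a fresh untrained model
print(f"trained instance: {correct:.2f}, fresh instance: {buggy:.2f}")
```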

Links:

https://colab.research.google.com/drive/1-PO1KJYvXdNC8LbvrdL70oG6QbHg_N-e?usp=sharing&fbclid=IwAR2k9ZCXvX_y_UNWpl4ljs1y0P3budKmlOggVrw6xI7ht0cgm03_VeoKVTI

https://drive.google.com/drive/u/3/folders/1h6jVHasLpbGLtu6Vsnpe1tyGCtR7bw_G?fbclid=IwAR3Xtsbm_EZA3TOebm5EfSvJjUmndHrWXm4Iet2fT3BjE6pPJmnqIwW8KWY

Other probable causes:

  • One possibility is that the test set has a different distribution than the validation set (this could be ruled out by pooling all the data, shuffling, and splitting again into train, validation, and test sets).
  • Swap the validation and test sets with each other and see if that has an effect (sometimes one set simply has relatively harder examples).
  • The training may have indirectly overfitted to the validation set (is it possible that, at one or more steps during training, the model giving the best score on the validation set was chosen?).
  • Images overlapping between the sets, or a lack of shuffling.
  • In the deep learning world, if something seems way too odd to be true, or even way too good to be true, a good guess is that it's probably a bug unless proven otherwise (a quick sanity check is sketched below).
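One cheap way to test the "it's probably a bug" hypothesis is to run the exact test-time evaluation pipeline on images the model was trained on: if accuracy is low there too, the evaluation/preprocessing code is the culprit, not the model. A sketch, assuming a Keras generator pipeline with placeholder paths:

```python
# Sketch: sanity-check the evaluation pipeline itself by pointing it at
# training images the model has definitely learned. Paths are placeholders.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

eval_gen = ImageDataGenerator(rescale=1.0 / 255)   # must match training preprocessing

train_subset = eval_gen.flow_from_directory(
    "data/train",              # a folder of images the model was trained on
    target_size=(224, 224),
    class_mode="binary",
    shuffle=False,
)

# loss, acc = model.evaluate(train_subset)
# If this accuracy is also far below the ~95% seen during training, the problem lies in
# the evaluation/preprocessing code (normalization, label order, wrong model instance),
# not in the model's ability to generalize.
```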