2 votes

I have split the dataset (around 28K images) into a 75% training set and a 25% test set. Then I randomly took 15% of the training set and 15% of the test set to create a validation set. The goal is to classify the images into two categories. The exact image samples can't be shared, but they are similar to the one attached. I'm using VGG19 with ImageNet weights, with the last two layers trainable and 4 dense layers appended. I'm also using ImageDataGenerator to augment the images. I trained the model for 30 epochs and found that training accuracy was 95% and validation accuracy was 96%, but when the model was evaluated on the test set, accuracy dropped dramatically to only 75%.

I have tried regularization and dropout to tackle overfitting, in case that is what the model is suffering from. I have also tried using the test set as the validation set and then testing the model on that same test set. The results were: training accuracy = 96%, validation accuracy = 96.3%, and test accuracy = 68%. I don't understand what I should do.
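For reference, a minimal sketch of the setup described above, assuming Keras/TensorFlow; the layer sizes, augmentation settings, and activation choices are placeholders, since the actual code was not shared:

```python
# Sketch only: approximates the described setup (VGG19 + ImageNet weights,
# last two layers trainable, 4 dense layers on top, ImageDataGenerator).
# Layer sizes and augmentation settings are assumptions, not the asker's code.
from tensorflow.keras.applications import VGG19
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras import layers, models

base = VGG19(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
for layer in base.layers[:-2]:          # freeze everything except the last two layers
    layer.trainable = False

model = models.Sequential([
    base,
    layers.Flatten(),
    layers.Dense(512, activation="relu"),
    layers.Dense(256, activation="relu"),
    layers.Dense(64, activation="relu"),
    layers.Dense(1, activation="sigmoid"),   # two categories -> single sigmoid unit
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

train_gen = ImageDataGenerator(rescale=1.0 / 255, rotation_range=15,
                               horizontal_flip=True, zoom_range=0.1)
val_gen = ImageDataGenerator(rescale=1.0 / 255)   # validation: rescale only
```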

[sample image attached]

Check for class imbalance, i.e. check whether there is a class that appears in the test set but in neither the training set nor the validation set. - Mitiku
There is no class imbalance, I have checked. - Harsh Vardhan
Why has somebody downvoted my question? What's wrong with the question? - Harsh Vardhan
@HarshVardhan Did you find a solution? I am having similar issues. - ManInMoon
@ManInMoon If your case is the same as mine, then there is no problem with the network. I forgot to normalize the images at test time. Did you do that too? Check that. - Harsh Vardhan
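As the last comment notes, the drop in this case came from not normalizing the images at test time. A minimal sketch of applying the same preprocessing at evaluation, assuming a Keras ImageDataGenerator pipeline (paths and sizes are placeholders):

```python
# Sketch: whatever rescaling/preprocessing the training generator applies must also
# be applied at test time, otherwise test accuracy collapses even for a good model.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_gen = ImageDataGenerator(rescale=1.0 / 255, horizontal_flip=True)  # augmentation + rescale
test_gen = ImageDataGenerator(rescale=1.0 / 255)   # same rescale, but no augmentation

test_flow = test_gen.flow_from_directory(
    "data/test",            # placeholder path
    target_size=(224, 224),
    class_mode="binary",
    shuffle=False,          # keep order stable for evaluation
)
# model.evaluate(test_flow) now sees inputs in the same [0, 1] range as during training.
```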

2 Answers

2 votes

First off, you need to make sure that when you split the data, the relative size of every class is the same in each of the resulting datasets. The data can be imbalanced if that is the distribution of your original data, but the same imbalance must be present in all datasets after the split.
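In code, a stratified split of this kind could look like the sketch below (scikit-learn assumed; the variable names and sizes are purely illustrative):

```python
# Sketch: a stratified split keeps each class's relative frequency identical
# in the train and test partitions.
from sklearn.model_selection import train_test_split

# Illustrative data: image paths and binary labels (replace with the real dataset).
file_paths = [f"img_{i}.jpg" for i in range(1000)]
labels = [i % 2 for i in range(1000)]

train_paths, test_paths, y_train, y_test = train_test_split(
    file_paths, labels,
    test_size=0.25,
    stratify=labels,      # preserve the class ratio in both splits
    random_state=42,
)
```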

Now, regarding the split: if you need train, validation, and test sets, they must all be independent of each other (no shared samples). This is important if you don't want to cheat yourself with the results you are getting.
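A quick way to verify that independence is to check that no sample appears in more than one split; a minimal sketch, assuming samples are identified by file name:

```python
# Sketch: confirm the three splits share no samples (identified here by file name).
def assert_disjoint(train_files, val_files, test_files):
    """Raise if any file name appears in more than one split."""
    train_files, val_files, test_files = set(train_files), set(val_files), set(test_files)
    assert train_files.isdisjoint(val_files), "train/validation sets share samples"
    assert train_files.isdisjoint(test_files), "train/test sets share samples"
    assert val_files.isdisjoint(test_files), "validation/test sets share samples"

# Example with toy file lists:
assert_disjoint(["a.jpg", "b.jpg"], ["c.jpg"], ["d.jpg"])
```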

In general, in machine learning we start from a training set and a test set. To choose the best model architecture/hyper-parameters, we further divide the training set to get the validation set (the test set should not be touched). After determining the best architecture/hyper-parameters, we combine the training and validation sets and train the chosen model from scratch on the combined, full training set. Only now do we measure the results on the test set.
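A sketch of that workflow, with small toy arrays standing in for the real images and a hypothetical build_model() helper (none of the names or numbers come from the question itself):

```python
# Sketch of the usual protocol: carve a validation set out of the training data,
# pick the best configuration, then retrain on train + validation and evaluate
# exactly once on the untouched test set.
import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.rand(200, 32, 32, 3).astype("float32")   # toy stand-in for the images
y = np.random.randint(0, 2, size=200)                   # toy binary labels

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)   # test set: set aside, untouched
X_tr, X_val, y_tr, y_val = train_test_split(
    X_train, y_train, test_size=0.15, stratify=y_train, random_state=0)

# ... compare architectures/hyper-parameters using (X_tr, y_tr) and (X_val, y_val) ...

# Once the best configuration is chosen, retrain it from scratch on train + validation:
X_full = np.concatenate([X_tr, X_val])
y_full = np.concatenate([y_tr, y_val])
# best_model = build_model(best_hyperparams)   # hypothetical builder
# best_model.fit(X_full, y_full)
# best_model.evaluate(X_test, y_test)          # the only time the test set is used
```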

-1 votes

I faced a similar issue in one of my practice projects: my InceptionV3 model gave a high training accuracy (99%) and a high validation accuracy (95%+), but a very low testing accuracy (55%).

The dataset was a subset of the popular Dogs vs. Cats dataset (https://www.kaggle.com/c/dogs-vs-cats/data) that I put together myself: 15k images split into three folders (train, valid, and test) in a 60:20:20 ratio (9000, 3000, and 3000 images respectively, each split evenly between a cats folder and a dogs folder).

The error in my case was actually in my code; it had nothing to do with the model or the data. The model had been defined inside a function, and calling that function again at evaluation time created a fresh, untrained instance. Hence, an untrained model was being tested on the test dataset. After correcting the errors in my notebook, I got 96%+ testing accuracy.
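To illustrate that bug pattern, here is a minimal sketch with a toy model and toy data (the names, sizes, and data are invented, not from the original notebook):

```python
# Sketch of the bug: the builder returns a fresh, untrained model every call,
# so evaluating build_model() instead of the trained instance gives chance-level accuracy.
import numpy as np
from tensorflow.keras import layers, models

def build_model():
    m = models.Sequential([
        layers.Input(shape=(10,)),
        layers.Dense(16, activation="relu"),
        layers.Dense(1, activation="sigmoid"),
    ])
    m.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return m   # a brand-new, untrained model every time it is called

# Toy, easily learnable data (stand-in for the real images/labels).
X = np.random.rand(500, 10).astype("float32")
y = (X[:, 0] > 0.5).astype("int32")

model = build_model()
model.fit(X, y, epochs=20, verbose=0)                 # this instance is the one that gets trained

correct = model.evaluate(X, y, verbose=0)[1]          # correct: reuse the trained instance
buggy = build_model().evaluate(X, y, verbose=0)[1]    # bug: evaluate a fresh untrained model
print(f"trained instance: {correct:.2f}, fresh instance: {buggy:.2f}")
```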

Links:

https://colab.research.google.com/drive/1-PO1KJYvXdNC8LbvrdL70oG6QbHg_N-e?usp=sharing&fbclid=IwAR2k9ZCXvX_y_UNWpl4ljs1y0P3budKmlOggVrw6xI7ht0cgm03_VeoKVTI

https://drive.google.com/drive/u/3/folders/1h6jVHasLpbGLtu6Vsnpe1tyGCtR7bw_G?fbclid=IwAR3Xtsbm_EZA3TOebm5EfSvJjUmndHrWXm4Iet2fT3BjE6pPJmnqIwW8KWY

Other probable causes:

  • One possibility is that the test set has a different distribution than the validation set (this could be ruled out by pooling all the data, shuffling, and splitting again into train, validation, and test sets).
  • Swap the validation and test sets with each other and see if that has an effect (sometimes one set simply has relatively harder examples).
  • The training may have indirectly overfitted to the validation set (is it possible that, at one or more steps during training, the model giving the best score on the validation set was chosen?).
  • Images overlapping between the sets, or a lack of shuffling.
  • In the deep learning world, if something seems way too odd to be true, or even way too good to be true, a good guess is that it's probably a bug unless proven otherwise (a quick sanity check is sketched below).
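One cheap way to test the "it's probably a bug" hypothesis is to run the exact test-time evaluation pipeline on images the model was trained on: if accuracy is low there too, the evaluation/preprocessing code is the culprit, not the model. A sketch, assuming a Keras generator pipeline with placeholder paths:

```python
# Sketch: sanity-check the evaluation pipeline itself by pointing it at
# training images the model has definitely learned. Paths are placeholders.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

eval_gen = ImageDataGenerator(rescale=1.0 / 255)   # must match training preprocessing

train_subset = eval_gen.flow_from_directory(
    "data/train",              # a folder of images the model was trained on
    target_size=(224, 224),
    class_mode="binary",
    shuffle=False,
)

# loss, acc = model.evaluate(train_subset)
# If this accuracy is also far below the ~95% seen during training, the problem lies in
# the evaluation/preprocessing code (normalization, label order, wrong model instance),
# not in the model's ability to generalize.
```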