
I built a 3D image classification model with a CNN for my research. I only have 5,000 images and used 4,500 for training and 500 for the test set. I tried different architectures and training parameters, and the F1 score and accuracy on the training set were as high as 0.9. Fortunately, I didn't have to spend much time finding the settings that gave this high accuracy.

I then applied this model to the test set and got a quite satisfying prediction, with an F1 score of 0.8 to 0.85.

My question is: is it necessary to do validation? When I took a machine learning course, I was taught to use a validation set for tuning hyperparameters. One reason I did not do k-fold cross-validation is that I do not have much data and wanted to use as many training images as possible, and my model already predicts quite well on the test set. Can my model still convince people as long as the accuracy/F1 score/ROC are good enough? Or can I convince people by doing only k-fold cross-validation, without building and evaluating on a separate test set?

Thank you!

Given that your test set is very small (500 images), I'd do cross-validation. The fact that you don't have much data highlights the need for cross-validation. If instead you had a large enough test set, you could skip the CV procedure. – Stergios

2 Answers


Unfortunately, I think a single result won't be enough, because that result could be pure luck. With 10-fold CV you use 90% of your data (4,500 images) for training and the remaining 10% for testing in each fold, so you are not training on fewer images, and you get more reliable results.

The validation scheme proposed by Martin is already a good one, but if you are looking for something more robust you should use nested cross-validation (sketched in code after the list):

  • Split the dataset into K folds.
  • The i-th training set is composed of folds {1, 2, ..., K} \ {i}.
  • Split that training set into N inner folds.
  • Define a grid of hyper-parameter values.
  • For each combination of hyper-parameter values:
    • train on folds {1, 2, ..., N} \ {j} and test on the j-th fold;
    • iterate over all N folds and compute the average F-score.
  • Choose the set of hyper-parameters that maximizes your metric.
  • Train the model on the i-th training set with the optimal hyper-parameters and test on the i-th fold.
  • Repeat for all K folds and compute the average metrics.
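For concreteness, here is a minimal sketch of this nested cross-validation scheme using scikit-learn. It is not your 3D CNN pipeline: the SVM classifier, the random placeholder data, the parameter grid, and the fold counts (K=10 outer, N=5 inner) are all illustrative assumptions.

```python
# Minimal nested cross-validation sketch (scikit-learn).
# Assumptions: an SVM stands in for the 3D CNN, the data and parameter grid
# are placeholders, K=10 outer folds and N=5 inner folds.
import numpy as np
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.svm import SVC

X = np.random.rand(200, 32)           # placeholder features
y = np.random.randint(0, 2, 200)      # placeholder binary labels

outer_cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)  # K outer folds
inner_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)   # N inner folds

param_grid = {"C": [0.1, 1, 10], "gamma": ["scale", 0.01]}             # hyper-parameter grid

# Inner loop: grid search selects hyper-parameters on the N inner folds.
inner_search = GridSearchCV(SVC(), param_grid, scoring="f1", cv=inner_cv)

# Outer loop: each outer fold is used only to test the tuned model.
outer_scores = cross_val_score(inner_search, X, y, scoring="f1", cv=outer_cv)
print("mean F1:", outer_scores.mean(), "std:", outer_scores.std())
```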

The average metrics alone may not be sufficient to prove the stability of the method, so it is advisable to also report a confidence interval or the variance of the results.
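As a hypothetical example of that reporting step, the K outer-fold scores can be summarised with their mean, sample standard deviation, and a normal-approximation 95% confidence interval (the scores below are made-up placeholders, not real results):

```python
# Summarise K outer-fold scores: mean, standard deviation, 95% CI.
# The scores below are made-up placeholders.
import numpy as np

outer_scores = np.array([0.82, 0.79, 0.85, 0.81, 0.83,
                         0.80, 0.84, 0.78, 0.82, 0.81])

mean = outer_scores.mean()
std = outer_scores.std(ddof=1)              # sample standard deviation
sem = std / np.sqrt(len(outer_scores))      # standard error of the mean
ci_low, ci_high = mean - 1.96 * sem, mean + 1.96 * sem

print(f"F1 = {mean:.3f} +/- {std:.3f} (95% CI: [{ci_low:.3f}, {ci_high:.3f}])")
```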

Finally, for a really stable validation of your method, you could consider replacing the initial K-fold cross-validation with a resampling procedure: instead of splitting the data into K folds, you repeatedly resample the dataset at random, using 90% of the samples for training and 10% for testing. Repeat this M times with M > K; if the computation is fast enough, you can do this 20, 50, or 100 times.
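This repeated random 90/10 resampling can be done with scikit-learn's StratifiedShuffleSplit (sometimes called Monte Carlo cross-validation). A minimal sketch, again with placeholder data, an SVM instead of your 3D CNN, and M=50 chosen only for illustration:

```python
# Repeated random 90/10 resampling (Monte Carlo cross-validation) sketch.
# Assumptions: placeholder data, an SVM instead of the 3D CNN, M=50 resamples.
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit, cross_val_score
from sklearn.svm import SVC

X = np.random.rand(200, 32)
y = np.random.randint(0, 2, 200)

M = 50  # number of random resamples, chosen larger than K
resampler = StratifiedShuffleSplit(n_splits=M, test_size=0.1, random_state=0)

scores = cross_val_score(SVC(), X, y, scoring="f1", cv=resampler)
print("mean F1 over", M, "resamples:", scores.mean(), "std:", scores.std())
```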


A cross-validation (validation) set is used to tune hyperparameters. You should never touch the test set except when you are finished with everything!

As suggested in the comments, I recommend k-fold cross-validation (e.g. k=10), sketched in code after the list:

  1. Split your dataset into k=10 sets.
  2. For i = 1..10: use sets {1, 2, ..., 10} \ {i} as the training set (and to find the hyperparameters) and set i for evaluation.
  3. Your final score is the average of those k=10 evaluation scores.
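A minimal sketch of this procedure with scikit-learn, assuming placeholder data and a generic classifier instead of your 3D CNN:

```python
# Plain k-fold cross-validation (k=10) sketch with scikit-learn.
# Assumptions: placeholder data and a logistic regression instead of the 3D CNN.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

X = np.random.rand(200, 32)
y = np.random.randint(0, 2, 200)

kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y,
                         scoring="f1", cv=kfold)

print("mean F1 over 10 folds:", scores.mean())
```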