0
votes

I'm a beginner in ML.

I'm training a linear SVM in MATLAB on 600 images (300 positive and 300 negative) and then applying the trained model to 400 test images. With the cost matrix set to [0,1;1,0], the success rate on the test set is around 65% and the cross-validation classification error is around 0.28. I then tried various cost matrices of the form [0,1;x,0] and found that the higher the x, the lower the classification error. What perplexes me is that while the classification error keeps decreasing, the test success rate also drops drastically. Here is my code:

% each row represents an image and each column a pixel value;
% each image row has been normalized.
SVMModel = fitcsvm(imgVector, Class, 'Cost', [0,1;1,0], ...
    'Standardize', true, 'KernelScale', 'auto');
% cross-validate the model (10-fold by default)
CVSVMModel = crossval(SVMModel);
classLoss = kfoldLoss(CVSVMModel);
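For reference, the test-set success rate is computed along these lines (testVector and testClass are placeholders for the 400 test images and their labels, which are not shown in the question):

```matlab
% evaluate the trained model on the held-out test images
% (testVector/testClass are placeholder names for the test data and labels)
predicted = predict(SVMModel, testVector);
successRate = mean(predicted == testClass);
```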

I have also tried PCA to reduce the feature dimensionality, but the classification error and success rate behave in the same way.
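The PCA step was roughly as follows (a sketch; the 95% variance threshold is an arbitrary choice, not from the question):

```matlab
% reduce feature dimensionality with PCA before training,
% keeping enough components to explain ~95% of the variance
[coeff, score, ~, ~, explained] = pca(imgVector);
k = find(cumsum(explained) >= 95, 1);
reducedFeatures = score(:, 1:k);
```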

Could anyone who knows what is happening here enlighten me a bit, please? Thanks very much.

1
You're probably overfitting, i.e. your model is getting good at classifying the training data, but generalizes poorly to new, unseen test data. – Amro

1 Answer

0
votes

The higher the cost penalty, the better the in-sample prediction becomes. However, a high cost penalty also causes overfitting, which means the model will not predict as well on newly observed data.
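A quick way to see this is to sweep the cost value and compare the cross-validation loss against the accuracy on the held-out test set (a sketch, reusing the variable names from the question; testVector/testClass are assumed to hold the 400 test images and their labels):

```matlab
% sweep the cost value x and watch the two numbers diverge:
% the reported loss keeps falling while test accuracy drops,
% which is the signature of overfitting
% (imgVector/Class and testVector/testClass are assumed from the question)
for x = [1 2 5 10 20]
    mdl = fitcsvm(imgVector, Class, 'Cost', [0,1;x,0], ...
        'Standardize', true, 'KernelScale', 'auto');
    cvLoss = kfoldLoss(crossval(mdl));
    testAcc = mean(predict(mdl, testVector) == testClass);
    fprintf('x = %2d: CV loss = %.3f, test accuracy = %.3f\n', ...
        x, cvLoss, testAcc);
end
```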