I'm using the Python interface to libsvm. After selecting the best C and gamma parameters (RBF kernel) via grid search, I train the model and cross-validate it (5-fold, if that's relevant), and the accuracy I get is exactly the proportion of the majority label in my training data set.
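For reference, my setup looks roughly like this (a sketch rather than my exact script; the file name and grid ranges are placeholders, and I'm assuming the svmutil bindings that ship with libsvm):

    from svmutil import svm_read_problem, svm_train

    # placeholder file name; the data has already been scaled
    y, x = svm_read_problem('train.scaled')

    # coarse grid search over C and gamma, scoring each pair by 5-fold CV accuracy;
    # with '-v 5', svm_train returns the accuracy instead of a model
    best_c, best_g, best_acc = None, None, 0.0
    for log2c in range(-5, 16, 2):
        for log2g in range(-15, 4, 2):
            acc = svm_train(y, x, '-t 2 -c %g -g %g -v 5 -q' % (2**log2c, 2**log2g))
            if acc > best_acc:
                best_c, best_g, best_acc = 2**log2c, 2**log2g, acc

    print('best C = %g, gamma = %g, CV accuracy = %.4f%%' % (best_c, best_g, best_acc))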
I have 3947 samples; 2898 of them have label -1 and the rest have label 1, so the majority class accounts for 73.4229% of the samples.
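In other words, the CV accuracy I'm seeing is exactly what a classifier would score by predicting -1 for every sample:

    # majority-class baseline: always predict -1
    print(2898.0 / 3947 * 100)   # 73.4229...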
When I train the model and run 5-fold cross-validation, this is what I get:
optimization finished, #iter = 1529
nu = 0.531517
obj = -209.738688, rho = 0.997250
nSV = 1847, nBSV = 1534
Total nSV = 1847
Cross Validation Accuracy = 73.4229%
Does this mean that the SVM is not taking the features into account at all? Or is the data at fault here? Are the two related? I just can't get past the 73.4229% number. Also, the number of support vectors is supposed to be much smaller than the size of the dataset, but that doesn't seem to be the case here.
In general, what does it mean when the cross-validation accuracy is the same as the proportion of the majority label in the dataset?