
I'm training and cross-validating (10-fold) a classifier on my data using libSVM with a linear kernel.

Each datapoint consists of 1800 fMRI voxel intensities, and there are around 88 datapoints in the training-set file passed to svm-train.

The training-set file looks as follows:

+1 1:0.9 2:-0.2 ... 1800:0.1

-1 1:0.6 2:0.9 ... 1800:-0.98

...

I should also mention that I'm using the svm-train program that comes with the libSVM package.

The problem is that when I run svm-train, it reports 100% accuracy!

This doesn't seem to reflect the true classification performance! The data isn't unbalanced, since

#datapoints labeled +1 == #datapoints labeled -1

I've also checked the scaling (the data is scaled correctly), and I've tried randomly changing the labels to see how it impacts the accuracy; it only drops from 100% to 97.9%.
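To randomize the labels I do something roughly like the following (just a sketch; the file names are placeholders):

awk 'BEGIN { srand() } { $1 = (rand() < 0.5 ? "+1" : "-1"); print }' train.txt > train_shuffled.txt

This keeps every feature vector unchanged and only replaces the leading label with a random +1/-1 before re-running svm-train on the shuffled file.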

Could you please help me understand what is going on, and what I can do to fix it?

Thanks,

Gal Star

I don't think there is a problem. Your SVM can easily give a 100% fit on the training set; that is perfectly normal. This is called overfitting: en.wikipedia.org/wiki/Overfitting. I think you need to read up on in-sample vs. out-of-sample training. – sashkello
This question appears to be off-topic because it is about machine learning. – sashkello
How can I read up on in-sample and out-of-sample training? – gal.star
I mean read some literature on this topic :) This is too large a problem to outline in an answer; there is a lot of research on proper training and cross-validation. If you don't know what it means, that is what you need to learn before doing any coding... – sashkello
Hi, so basically you think I would get better results if I reduced the number of voxel intensities from 1800 to a smaller number, maybe by choosing representative voxels? – gal.star

1 Answer


Make sure you include '-v 10' in the svm-train options. I'm not sure whether your 100% accuracy comes from the training sample or from cross-validation. It is very possible to get 100% training accuracy, since you have far fewer samples than features. But if your model suffers from overfitting, the cross-validation accuracy may be low.
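For example, for a linear kernel with 10-fold cross-validation, the command would look something like this (the training-file name is a placeholder):

svm-train -t 0 -v 10 train.scaled

With -v 10, svm-train prints the cross-validation accuracy instead of producing a model file, and that cross-validation number is the one to look at rather than accuracy measured on the training data itself.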