
I'm using LIBSVM to solve a binary classification problem. My dataset has ~50K attributes and 18 samples. I'm using leave-one-out validation (training on 17 samples and testing on the remaining one). I normalize the data with:

svm-scale -s scaling_parameters Train$i > TrainScaled$i
svm-scale -r scaling_parameters Test$i > TestScaled$i

Training and prediction are done as:

svm-train -s 0 -c 5 -t 2 -g 0.5 -e 0.1 TrainScaled$i model
svm-predict TestScaled$i model predicted.out

The model always predicts the same class (the majority one), so I obtain 75% accuracy, but the model is useless because it predicts the same class for every sample. I tried different kernel types and parameters, but I still get the same result. What could the cause be? Are the data really that hard to separate with a hyperplane?
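The pipeline above (per-fold scaling, then an RBF-kernel SVM with C=5 and gamma=0.5) can be sketched in Python with scikit-learn to make the leave-one-out loop explicit. The data here is a synthetic stand-in (18 samples, a toy number of features, a 75%/25% class split); class_weight="balanced" and the default gamma="scale" are my own suggested tweaks, not part of the original commands — a fixed gamma of 0.5 is often far too large for high-dimensional data and drives every prediction to the majority class:

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in: 18 samples, many features (toy scale for speed).
rng = np.random.default_rng(0)
X = rng.normal(size=(18, 500))
y = np.array([0] * 13 + [1] * 5)  # imbalanced, like the 75% majority case

preds = []
for train_idx, test_idx in LeaveOneOut().split(X):
    # Fit the scaler on the training fold only, mirroring
    # `svm-scale -s` on Train$i and `svm-scale -r` on Test$i.
    scaler = StandardScaler().fit(X[train_idx])
    clf = SVC(kernel="rbf", C=5, gamma="scale", class_weight="balanced")
    clf.fit(scaler.transform(X[train_idx]), y[train_idx])
    preds.append(clf.predict(scaler.transform(X[test_idx]))[0])

print("classes actually predicted:", sorted(set(preds)))
```

If the printed set contains only one class, the model has collapsed to the majority predictor, which is exactly the symptom described above. LIBSVM's equivalent of class_weight is the -wi option to svm-train.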

"My dataset has ~50K attributes and 18 samples" - I think you need more samples to train a model. You could try analyzing whether any of the 50K features helps discriminate between the two classes in your samples. – michaeltang
I tried applying feature selection, but nothing changed, even with a drastic reduction to only 100 attributes. – Titus Pullo

2 Answers


It's likely that, given the ratio of features to samples, your algorithm is overfitting the input even with a significant amount of regularization. Have you tried a dimensionality reduction technique such as PCA and working with a smaller number of features? You could also try some sort of feature selection algorithm to get a small subset of features.
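As a rough sketch of this suggestion: scaling and PCA can be put inside a pipeline so that, in each leave-one-out fold, they are fitted on the 17 training samples only. The data below is synthetic (a toy number of features standing in for the ~50K), and the choice of 10 components and a linear kernel is illustrative, not prescriptive — with 18 samples, PCA can retain at most 17 components anyway:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = rng.normal(size=(18, 1000))  # toy stand-in for ~50K attributes
y = np.array([0] * 13 + [1] * 5)

# Scaler and PCA are refit inside every fold, so the held-out
# sample never leaks into the dimensionality reduction.
pipe = make_pipeline(StandardScaler(),
                     PCA(n_components=10),
                     SVC(kernel="linear", C=1))
preds = cross_val_predict(pipe, X, y, cv=LeaveOneOut())
print("LOO accuracy:", (preds == y).mean())
```

A linear kernel is often a sensible first choice when features vastly outnumber samples, since the data is almost certainly linearly separable in the original space and the RBF kernel adds capacity you cannot afford.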


I encountered the same problem in my application: I used libsvm in my C++ project to perform binary classification, but it always predicted only one class for all samples.

Here is some information about my application:

(1) I have 260 training samples in total; 80% are used for training and cross-validation, and 20% for testing.

(2) The number of features is 7,218, which is much larger than the number of samples.

(3) For binary classification, my class labels are 0 and 1.

After repeated trials, I found that the failure of SVM classification in my application was primarily caused by ineffective features, which led the SVM model to overfit — a risk that grows as the number of features increases.
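One way to check whether the features carry any discriminative signal, at the scale described here (260 samples, thousands of features), is univariate feature selection inside a cross-validated pipeline. This is a hedged sketch on synthetic data — the feature count is scaled down for speed, and keeping the top 50 F-scored features is an arbitrary illustrative choice:

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(2)
X = rng.normal(size=(260, 721))    # toy stand-in for the 7,218 features
y = rng.integers(0, 2, size=260)   # binary labels 0/1, as in the post

# Score each feature against the labels and keep the top k; doing the
# selection inside the pipeline refits it per fold, so the test fold
# never informs which features are kept.
pipe = make_pipeline(StandardScaler(),
                     SelectKBest(f_classif, k=50),
                     SVC(kernel="linear"))
scores = cross_val_score(pipe, X, y, cv=StratifiedKFold(n_splits=5))
print("fold accuracies:", np.round(scores, 2))
```

On labels with no real signal (as in this synthetic data), fold accuracies should hover around chance; if your real features behave the same way, the problem is the features rather than the SVM settings.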