
I'm trying to get familiar with 1-class SVM using the libsvm implementation. As I've read, there are no class labels in libsvm's 1-class task. But when I read a data file without the label column, there is always a read error. I tried labeling my toy data and then testing the resulting model with svm-train, but accuracy was always terrible, around 50%.

My question is: if I have a labeled dataset (say, a few hundred Gaussian-distributed 2D points with several outliers among them), how do I train libsvm on this data, and how do I estimate the accuracy of the resulting model?


1 Answer


1) Each line of a LIBSVM (C/C++) training file looks as follows (this is called the sparse data format):

label 1:value1 2:value2 .....

(end each line with a '\n' character)

You have to provide a label column even in the one-class case; it can be any number, because LIBSVM ignores it during training. This should remove your read error.
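To make the format above concrete, here is a minimal sketch that turns a list of 2D points into LIBSVM-style text with a dummy label. The helper names `to_libsvm_line` and `to_libsvm_text` are made up for this illustration and are not part of LIBSVM; the dummy label value of 1 is an arbitrary choice.

```python
def to_libsvm_line(features, label=1):
    """Format one sample as 'label 1:v1 2:v2 ...' (feature indices start at 1)."""
    pairs = " ".join(f"{i}:{float(v)!r}" for i, v in enumerate(features, start=1))
    return f"{label} {pairs}"


def to_libsvm_text(samples, label=1):
    """Join all samples into file content, ending every line with '\\n'."""
    return "".join(to_libsvm_line(s, label) + "\n" for s in samples)


# Hypothetical toy points; the label column is a placeholder only.
toy_points = [(0.5, -1.25), (0.1, 0.3)]
toy_text = to_libsvm_text(toy_points)
```

The resulting string can be written to a training file with an ordinary `open(path, "w")` call.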

2) Regarding your accuracy on toy data: did you cross-validate the "nu" and "g" (kernel gamma) parameters? These are the hyper-parameters of the one-class SVM model. You may also want to experiment with the kernel type. Was that 50% measured on the training set, the test set, or a validation set?
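A rough sketch of such a parameter search follows. It only builds the `svm-train` command lines for a grid of nu and gamma values (`-s 2` selects one-class SVM, `-t 2` the RBF kernel, `-n` sets nu, `-g` sets gamma) and picks the best combination from scores you compute yourself on a validation set. The file names, the grid values, and the example scores are assumptions for illustration. Note also that svm-train's default nu is 0.5, which may by itself explain a result near 50%, since nu bounds the fraction of training points treated as outliers.

```python
import itertools


def one_class_grid(nus, gammas, train_file="train.txt"):
    """Yield (nu, gamma, argv) for every nu/gamma combination.

    argv is an svm-train command line; train_file and the model file
    names are hypothetical placeholders.
    """
    for nu, gamma in itertools.product(nus, gammas):
        model_file = f"oc_nu{nu}_g{gamma}.model"
        argv = ["svm-train", "-s", "2", "-t", "2",
                "-n", str(nu), "-g", str(gamma),
                train_file, model_file]
        yield nu, gamma, argv


def pick_best(results):
    """results: iterable of (nu, gamma, score); return the highest-scoring triple."""
    return max(results, key=lambda r: r[2])


commands = list(one_class_grid([0.01, 0.05, 0.1], [0.1, 1.0]))
# Each argv could be run with subprocess.run(argv, check=True) if svm-train is on PATH.
```

How each model is scored is left open here; the score should come from held-out data, not from the training set.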

3) Since one-class SVM essentially does density estimation in high dimensions, all the points of the training set should fall on or inside the hypersphere, and they should all belong to a single class. You may want to refer here. You have to find a way to construct a training set devoid of outliers, and then feed the outliers as test points (the test set may also contain some points that lie inside the circle) to estimate the accuracy of the model. If this is not possible, you have to resort to other means of outlier detection, such as clustering. The good news is that there are powerful clustering algorithms in the literature.
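The evaluation step described in this point can be sketched as below, assuming the test set has known labels (+1 for inliers, -1 for outliers) and the model's predictions use the same +1/-1 convention, as one-class predictions from svm-predict do. The function name `evaluate_outlier_detection` and the sample labels are illustrative only. Precision and recall on the outlier class are reported next to accuracy because, with only a few outliers, accuracy alone can look good even when no outlier is found.

```python
def evaluate_outlier_detection(true_labels, predicted):
    """Compare +1 (inlier) / -1 (outlier) labels with model predictions."""
    if len(true_labels) != len(predicted):
        raise ValueError("label and prediction lists differ in length")
    pairs = list(zip(true_labels, predicted))
    tp = sum(1 for t, p in pairs if t == -1 and p == -1)  # outliers caught
    fp = sum(1 for t, p in pairs if t == 1 and p == -1)   # inliers flagged
    fn = sum(1 for t, p in pairs if t == -1 and p == 1)   # outliers missed
    tn = sum(1 for t, p in pairs if t == 1 and p == 1)    # inliers accepted
    return {
        "accuracy": (tp + tn) / len(pairs) if pairs else 0.0,
        "outlier_precision": tp / (tp + fp) if tp + fp else 0.0,
        "outlier_recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical test labels and predictions, for illustration only.
metrics = evaluate_outlier_detection([1, 1, 1, -1, -1], [1, 1, -1, -1, 1])
```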

4) Your dataset is 2-D, so it should not be hard to plot it first and get an idea of the data and its outliers.