
A quick recap of what I want to do: I want to determine whether a text was written by the same author or not, so I use one-class classification.
In my training set (18 samples), each line looks like this (for simplicity, I use x as the data value):

1 1:x 2:x ... 200:x
1 1:x 2:x ... 200:x

In my testing set (3 samples), each line looks like this (for simplicity, I use y as the data value):

1 1:y 2:y ... 200:y

For data preparation (training and testing sets), I set the upper and lower scaling limits to +1/-1:

-l -1 -u 1

For training, I use one-class SVM as the svm_type and sigmoid as the kernel type. Yet the accuracy is 0%:

optimization finished, #iter = 13
obj = 22.901769047004553, rho = 5.476401914859387
nSV = 11, nBSV = 6
Accuracy = 0.0% (0/21) (classification)

Can someone show me what I did wrong here?


1 Answer


You need to tune the parameters.

nu is an upper bound on the fraction of training errors and a lower bound on the fraction of support vectors. With this setting, roughly a fraction nu of the training data (e.g., nu = 0.01 means 1%) can be rejected and flagged as outliers.

Also try tuning the gamma and coef0 values of the sigmoid kernel.
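As a sketch of how these parameters interact, here is a minimal example using scikit-learn's OneClassSVM (which wraps libsvm); the nu, gamma, and coef0 values below are illustrative placeholders, not tuned for your data, and the random matrix stands in for your 18 scaled training samples:

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X_train = rng.normal(size=(18, 200))  # stand-in for 18 samples x 200 features

# nu bounds the fraction of training points treated as outliers;
# gamma and coef0 shape the sigmoid kernel tanh(gamma * <x, x'> + coef0)
clf = OneClassSVM(kernel="sigmoid", nu=0.1, gamma=0.01, coef0=0.0)
clf.fit(X_train)

# predict() returns +1 for inliers and -1 for outliers
pred = clf.predict(X_train)
print(pred)
```

If every test sample comes back as -1, accuracy against all-+1 labels is 0%, which is why these parameters matter so much here.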

Although it may not be the direct cause of your zero accuracy, I suggest you scale the data yourself instead of using libsvm's maximum-minimum scaling; check standard scaling:

 % Per-feature mean and standard deviation of the training data
 x_mean = mean(x);
 x_std = std(x);
 % Standardize to zero mean and unit variance
 x = (x - x_mean)./x_std;

Then use the same x_mean and x_std values to scale your test data.
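The same idea in Python/NumPy terms (a sketch with random stand-in matrices; the key point is that the test set is scaled with the *training* statistics, never its own):

```python
import numpy as np

rng = np.random.default_rng(0)
X_train = rng.normal(loc=5.0, scale=2.0, size=(18, 200))  # stand-in training features
X_test = rng.normal(loc=5.0, scale=2.0, size=(3, 200))    # stand-in test features

# Compute mean and std per feature on the TRAINING set only
x_mean = X_train.mean(axis=0)
x_std = X_train.std(axis=0)

X_train_scaled = (X_train - x_mean) / x_std
# Reuse the same training statistics for the test set
X_test_scaled = (X_test - x_mean) / x_std
```

After this, each training feature has zero mean and unit variance, while the test features are mapped into the same coordinate system.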