I know there are several questions about this, but not a single one addresses my particular problem.
I'll simplify my problem to make it clearer. Let's say I have multiple sentences from an English document and I want to classify them using a one-class SVM (in libsvm) in order to be able to spot anomalies (e.g. a German sentence) afterwards.
For training: I have samples of one class only (let's assume the other classes do not exist beforehand). I extract all 3-grams (so the feature space contains at most 16777216 different features) and save them in libsvm format (label = 1, just in case that matters).
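For reference, the extraction step looks roughly like this (a minimal sketch; sentence_to_libsvm_line and english_sentences are illustrative names, not part of libsvm, and the 3-grams are taken over raw bytes, which gives the 256^3 = 16777216 possible features mentioned above):

# python2
# sketch of the feature extraction (illustrative, not from libsvm):
# each byte 3-gram is mapped to one of 256^3 = 16777216 feature indices
def sentence_to_libsvm_line(sentence, label=1):
    counts = {}
    for i in range(len(sentence) - 2):
        g = sentence[i:i+3]
        idx = (ord(g[0]) << 16) | (ord(g[1]) << 8) | ord(g[2])
        counts[idx] = counts.get(idx, 0) + 1
    # libsvm format: "<label> <index>:<value> ..." with ascending indices >= 1
    feats = " ".join("%d:%d" % (i + 1, c) for i, c in sorted(counts.items()))
    return "%d %s" % (label, feats)

with open("/tmp/english-3-grams.libsvm", "w") as out:
    for s in english_sentences:  # assumed: an iterable of training sentences
        out.write(sentence_to_libsvm_line(s) + "\n")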
Now I want to estimate my parameters. I tried to use grid.py with additional parameters; however, the runtime is too long for RBF kernels. So I am trying linear kernels instead (for those, grid.py can be changed to use only one value of gamma, since gamma does not matter for linear kernels). In any case, the smallest c that grid.py tests is reported as the best solution (does -c matter at all for linear kernels?).
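The grid search was invoked roughly like this (a sketch assuming the stock tools/grid.py shipped with libsvm; the exact flag syntax can vary between versions, and gamma is pinned to a single value here because the linear kernel ignores it):

# sketch of the grid.py call (assumes libsvm's tools/grid.py is available;
# everything after the grid options is passed through to svm-train)
import subprocess
subprocess.call(["python2", "grid.py",
                 "-log2c", "-5,15,2",  # c candidates: 2^-5, 2^-3, ..., 2^15
                 "-log2g", "1,1,1",    # single gamma value; unused with -t 0
                 "-t", "0",            # svm-train option: linear kernel
                 "-s", "2",            # svm-train option: one-class SVM
                 "/tmp/english-3-grams.libsvm"])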
Furthermore, no matter how much I change the -n (nu) value, the same relation between the scores is obtained every time (even though the number of support vectors changes). The scores are gathered using the Python implementation. ("Relation between scores" means that, e.g., at first they are -1 and -2; I change nu, and afterwards they are e.g. -0.5 and -1, so when I sort them the same order always appears, as in this example):
# python2
from sklearn.metrics import roc_curve, auc
import matplotlib.pyplot as plt
from svmutil import *

y, x = svm_read_problem("/tmp/english-3-grams.libsvm")   # 5000 sentence samples
ym, xm = svm_read_problem("/tmp/german-3-grams.libsvm")  # 50 sentence samples

# -s 2: one-class SVM, -t 2: RBF kernel, -n 0.5: nu
m = svm_train(y, x, "-s 2 -t 2 -n 0.5")

# do the prediction in one or two steps, here is one step:
test_x = x[:100] + xm[:100]                # note: xm holds only 50 samples
test_y = [1] * 100 + [-1] * len(xm[:100])  # labels must match the sample count
p_l, p_a, p_v = svm_predict(test_y, test_x, m)

# p_v holds the decision values (our scores); flatten the per-sample lists
scores = [v[0] for v in p_v]

# let's plot a roc curve and report the AUC
fpr, tpr, thresholds = roc_curve(test_y, scores)
print("AUC: %f" % auc(fpr, tpr))
plt.plot(fpr, tpr)
plt.show()
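The two-step variant mentioned in the comment above would look like this (same model, same scores, just split into two svm_predict calls):

# the same prediction split into two steps: score the English and German
# samples separately, then concatenate the decision values
_, _, pv_en = svm_predict(y[:100], x[:100], m)
_, _, pv_de = svm_predict(ym[:50], xm[:50], m)
scores = [v[0] for v in pv_en + pv_de]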
Here, the exact same ROC curve is obtained every time (even though -n is varied). Even if there is only 1 support vector, the same curve appears.
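For completeness, this is roughly how the sweep was done (a sketch reusing y, x, test_y, and test_x from the snippet above, and assuming a libsvm version whose Python interface provides get_nr_sv()):

# sketch of the nu sweep: retrain with several nu values and compare the
# induced ordering of the decision values
for nu in [0.1, 0.3, 0.5, 0.7]:
    m = svm_train(y, x, "-s 2 -t 2 -n %g" % nu)
    _, _, p_v = svm_predict(test_y, test_x, m)
    scores = [v[0] for v in p_v]
    # rank the test samples by score; this order never changes with nu
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    print("nu=%g #SV=%d first ranks: %s" % (nu, m.get_nr_sv(), order[:5]))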
Hence, my questions (let's assume a maximum of 50000 samples per training):
- Why is -n not changing anything in the one-class training process?
- Which parameters do I need to change for a one-class SVM?
- Is a linear kernel the best approach (also with regard to runtime), given that an RBF-kernel parameter grid search takes ages on such big datasets?
- liblinear is not being used because I want to do anomaly detection (= one-class SVM).
Best regards, mutilis