I am using the one-class SVM classifier OneClassSVM
from scikit-learn to detect outliers in a dataset. My dataset has 30000 samples with 1024 variables. I use 10 percent of those as training data.
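from sklearn import svm

# trset: the training subset (10 percent of the samples, 1024 variables each)
clf = svm.OneClassSVM(nu=0.001, kernel="rbf", gamma=1e-5)
clf.fit(trset)
dist2hptr = clf.decision_function(trset)  # signed value for each training sample
tr_y = clf.predict(trset)                 # +1 or -1 for each training sample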
As above, I calculate the distance of each sample to the decision boundary using the decision_function(x)
function. When I compare the prediction results with the distance results, the distance is always positive for samples marked as +1 in the predict output and negative for samples marked as -1.
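To make the observation reproducible, here is a minimal sketch on synthetic data. The random data, its size, and the nu and gamma values are assumptions chosen so the example runs quickly; they are not my real dataset or settings.

```python
import numpy as np
from sklearn import svm

# Synthetic stand-in for the real training set
# (assumption: 300 samples, 16 variables, standard normal).
rng = np.random.RandomState(0)
trset = rng.normal(size=(300, 16))

# nu and gamma are illustrative values, not the ones from my real run.
clf = svm.OneClassSVM(nu=0.05, kernel="rbf", gamma=1e-2)
clf.fit(trset)

dist2hptr = clf.decision_function(trset)  # signed value per sample
tr_y = clf.predict(trset)                 # +1 or -1 per sample

# Check that a positive value goes with +1 and a non-positive value with -1.
agree = np.all((dist2hptr > 0) == (tr_y == 1))
print("sign matches prediction:", agree)
```

The check compares the sign of each decision_function value with the corresponding predict label, which is the pattern I see on my real data.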
I thought a distance has no sign, since it does not deal with direction. I want to understand how the distances are calculated in the OneClassSVM
scikit-learn classifier. Does the sign simply indicate whether the sample lies outside the decision hyperplane calculated by the SVM?
Please help.