
I am using scikit-learn for SVM classification.

I need a classifier that returns a default value when a given test item doesn't match any of the training-set items, i.e. when its distance from all of them is very high. Is that possible?

For example, let's say my training set is

    X = [[0.5, 0.5, 2], [4, 4, 16], [16, 16, 64]]

and the labels are

    y = [0, 1, 2]

Then I train:

    from sklearn import svm

    clf = svm.SVC()
    clf.fit(X, y)

and predict (note that `predict` expects a 2D array, one row per test item):

    clf.predict([[-100, -100, -200]])

The test item [-100, -100, -200] is clearly far away from every training item, yet the prediction is [2], the label of [16, 16, 64]. Is there any way to make it return something else (a value that is not from the training set)?
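One way to get the behaviour described in the question, sketched here under assumptions: keep the `SVC`, but first measure the Euclidean distance from each test item to its nearest training item with scikit-learn's `NearestNeighbors`, and return a default label when that distance exceeds a threshold. The helper `predict_with_default`, the threshold `max_dist=50.0` and the default label `-1` are hypothetical choices for illustration, not part of `SVC` itself.

```python
import numpy as np
from sklearn import svm
from sklearn.neighbors import NearestNeighbors

X = [[0.5, 0.5, 2], [4, 4, 16], [16, 16, 64]]
y = [0, 1, 2]

clf = svm.SVC()
clf.fit(X, y)

# Index over the training set, used only to measure how far a test item
# is from its closest training item.
nn = NearestNeighbors(n_neighbors=1).fit(X)


def predict_with_default(X_test, max_dist=50.0, default=-1):
    """Hypothetical helper: return the SVC prediction, or `default` when the
    test item is farther than `max_dist` (Euclidean) from every training item.
    max_dist=50.0 is an arbitrary assumption and must be tuned to the data."""
    dist, _ = nn.kneighbors(X_test)  # dist has shape (n_test, 1)
    preds = clf.predict(X_test)
    return np.where(dist[:, 0] > max_dist, default, preds)


print(predict_with_default([[16, 16, 64], [-100, -100, -200]]))
```

The threshold depends on the scale of the features, so on real data it would need tuning (and the features possibly scaling) before it is meaningful.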

Too broad and informal. But you are probably asking for outlier detection; a one-class SVM is also available in sklearn. - sascha
Yes, actually I just need to tell whether the item can be matched to one of the training-set items or not. I don't care about the values; for example, I want to get 1 for matched and -1 for not matched. - Bakri Bitar
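For the 1 / -1 behaviour described in this comment, scikit-learn's `OneClassSVM` (the one-class SVM mentioned above) returns +1 for items that resemble the training data and -1 for outliers. Below is a minimal sketch on the question's training set; `nu=0.1` is an assumed value, and the labels `y` are not used at all, since a one-class SVM only learns the region occupied by the training items.

```python
import numpy as np
from sklearn.svm import OneClassSVM

X = np.array([[0.5, 0.5, 2], [4, 4, 16], [16, 16, 64]])

# nu=0.1 is an assumption: it is an upper bound on the fraction of training
# items allowed to fall outside the learned region.
detector = OneClassSVM(kernel="rbf", gamma="scale", nu=0.1)
detector.fit(X)

# +1 = looks like the training data ("matched"), -1 = outlier ("not matched")
print(detector.predict([[-100, -100, -200]]))
```

With only three training items the learned region is very rough; `nu` and `gamma` would need tuning on real data.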

1 Answer


I think you can create a label for those far-away values and add it to your training set:

    X = [[0.5, 0.5, 2], [4, 4, 16], [16, 16, 64], [-100, -100, -200]]
    y = [0, 1, 2, 100]

Then give it a try.
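A runnable version of this suggestion, assuming the far-away example should be `[-100, -100, -200]` (the test item from the question) and using `100` as an arbitrary "no match" label:

```python
from sklearn import svm

# Training set from the question plus one hand-made far-away example,
# labelled 100 (an arbitrary label meaning "no match").
X = [[0.5, 0.5, 2], [4, 4, 16], [16, 16, 64], [-100, -100, -200]]
y = [0, 1, 2, 100]

clf = svm.SVC()
clf.fit(X, y)

print(clf.predict([[-100, -100, -200]]))
```

This only teaches the classifier about the region around the added example; a far-away item in a different direction (for example `[100, 100, 200]`) may still be assigned one of the original labels.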

SVM is supervised learning, which means the outputs (labels) have to be specified. If you are not certain what the outputs are, try unsupervised clustering first (k-means, for example) to get a rough idea of how many distinct outputs to expect.
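A minimal sketch of using k-means to get a rough idea of how many groups the data contains: fit scikit-learn's `KMeans` for several values of `n_clusters` and look at how the inertia (sum of squared distances to the nearest centroid) drops. The range of `k` and `random_state=0` are assumptions; on real data you would look for the "elbow" where adding more clusters stops reducing the inertia much.

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[0.5, 0.5, 2], [4, 4, 16], [16, 16, 64]])

inertias = []
for k in range(1, len(X) + 1):
    # n_init=10 restarts k-means from 10 different initial centroid sets
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias.append(km.inertia_)
    print(k, km.inertia_)
```

With only three points, every point becomes its own cluster at k = 3 and the inertia drops to zero, so this toy example says nothing about real data; it only shows the mechanics.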