python - Scikit Learn SVC decision

75

votes

I'm trying to understand the relationship between decision_function and predict, which are instance methods of SVC (http://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html). So far I've gathered that decision function returns pairwise scores between classes. I was under the impression that predict chooses the class that maximizes its pairwise score, but I tested this out and got different results. Here's the code I was using to try and understand the relationship between the two. First I generated the pairwise score matrix, and then I printed out the class that has maximal pairwise score which was different than the class predicted by clf.predict.

        result = clf.decision_function(vector)[0]
        counter = 0
        num_classes = len(clf.classes_)
        pairwise_scores = np.zeros((num_classes, num_classes))
        for r in xrange(num_classes):
            for j in xrange(r + 1, num_classes):
                pairwise_scores[r][j] = result[counter]
                pairwise_scores[j][r] = -result[counter]
                counter += 1

        index = np.argmax(pairwise_scores)
        class = index_star / num_classes
        print class
        print clf.predict(vector)[0]

Does anyone know the relationship between these predict and decision_function?

pythonnumpysvmscikit-learn

"decision function returns pairwise scores between classes" is incorrect. It should be the "score for each class" as it's written in the documentation page for decision_function part: "Distance of the samples X to the separating hyperplane." – justhalf

@justhalf: no, the OP is correct. sklearn.svm.SVC uses a pairwise (one-vs-one) decomposition by default and returns distances to all of the n(n-1)/2 hyperplanes for each sample. – Fred Foo

Oops, yes, I remembered reading that somewhere. but was misled by the documentation. Sorry! – justhalf

After trying to answer, I think that bcorso's answer should be the one. The relationship is actually based on the code he translated from the C++ implementation:

decision = decision_function(params, sv, nv, a, b, X); votes = [(i if decision[p] > 0 else j) for p,(i,j) in enumerate((i,j) for i in range(len(cs)) for j in range(i+1,len(cs)))]

. The highest vote out of votes is basically what predict does. – Eric Platon

49

votes

I don't fully understand your code, but let's go trough the example of the documentation page you referenced:

import numpy as np
X = np.array([[-1, -1], [-2, -1], [1, 1], [2, 1]])
y = np.array([1, 1, 2, 2])
from sklearn.svm import SVC
clf = SVC()
clf.fit(X, y)

Now let's apply both the decision function and predict to the samples:

clf.decision_function(X)
clf.predict(X)

The output we get is:

array([[-1.00052254],
       [-1.00006594],
       [ 1.00029424],
       [ 1.00029424]])
array([1, 1, 2, 2])

And that is easy to interpret: The desion function tells us on which side of the hyperplane generated by the classifier we are (and how far we are away from it). Based on that information, the estimator then label the examples with the corresponding label.

25

votes

For those interested, I'll post a quick example of the predict function translated from C++ (here) to python:

# I've only implemented the linear and rbf kernels
def kernel(params, sv, X):
    if params.kernel == 'linear':
        return [np.dot(vi, X) for vi in sv]
    elif params.kernel == 'rbf':
        return [math.exp(-params.gamma * np.dot(vi - X, vi - X)) for vi in sv]

# This replicates clf.decision_function(X)
def decision_function(params, sv, nv, a, b, X):
    # calculate the kernels
    k = kernel(params, sv, X)

    # define the start and end index for support vectors for each class
    start = [sum(nv[:i]) for i in range(len(nv))]
    end = [start[i] + nv[i] for i in range(len(nv))]

    # calculate: sum(a_p * k(x_p, x)) between every 2 classes
    c = [ sum(a[ i ][p] * k[p] for p in range(start[j], end[j])) +
          sum(a[j-1][p] * k[p] for p in range(start[i], end[i]))
                for i in range(len(nv)) for j in range(i+1,len(nv))]

    # add the intercept
    return [sum(x) for x in zip(c, b)]

# This replicates clf.predict(X)
def predict(params, sv, nv, a, b, cs, X):
    ''' params = model parameters
        sv = support vectors
        nv = # of support vectors per class
        a  = dual coefficients
        b  = intercepts 
        cs = list of class names
        X  = feature to predict       
    '''
    decision = decision_function(params, sv, nv, a, b, X)
    votes = [(i if decision[p] > 0 else j) for p,(i,j) in enumerate((i,j) 
                                           for i in range(len(cs))
                                           for j in range(i+1,len(cs)))]

    return cs[max(set(votes), key=votes.count)]

There are a lot of input arguments for predict and decision_function, but note that these are all used internally in by the model when calling predict(X). In fact, all of the arguments are accessible to you inside the model after fitting:

# Create model
clf = svm.SVC(gamma=0.001, C=100.)

# Fit model using features, X, and labels, Y.
clf.fit(X, y)

# Get parameters from model
params = clf.get_params()
sv = clf.support_vectors
nv = clf.n_support_
a  = clf.dual_coef_
b  = clf._intercept_
cs = clf.classes_

# Use the functions to predict
print(predict(params, sv, nv, a, b, cs, X))

# Compare with the builtin predict
print(clf.predict(X))

20

votes

There's a really nice Q&A for the multi-class one-vs-one scenario at datascience.sx:

Question

I have a multiclass SVM classifier with labels 'A', 'B', 'C', 'D'.

This is the code I'm running:
>>>print clf.predict([predict_this])
['A']
>>>print clf.decision_function([predict_this])
[[ 185.23220833   43.62763596  180.83305074  -93.58628288   62.51448055  173.43335293]]
How can I use the output of decision function to predict the class (A/B/C/D) with the highest probability and if possible, it's value? I have visited https://stackoverflow.com/a/20114601/7760998 but it is for binary classifiers and could not find a good resource which explains the output of decision_function for multiclass classifiers with shape ovo (one-vs-one).

Edit:

The above example is for class 'A'. For another input the classifier predicted 'C' and gave the following result in decision_function
[[ 96.42193513 -11.13296606 111.47424538 -88.5356536 44.29272494 141.0069203 ]]
For another different input which the classifier predicted as 'C' gave the following result from decision_function,
[[ 290.54180354 -133.93467605  116.37068951 -392.32251314 -130.84421412   284.87653043]]
Had it been ovr (one-vs-rest), it would become easier by selecting the one with higher value, but in ovo (one-vs-one) there are (n * (n - 1)) / 2 values in the resulting list.

How to deduce which class would be selected based on the decision function?

Answer

Your link has sufficient resources, so let's go through:

When you call decision_function(), you get the output from each of the pairwise classifiers (n*(n-1)/2 numbers total). See pages 127 and 128 of "Support Vector Machines for Pattern Classification".

Click on the "page 127 and 128" link (not shown here, but in the Stackoverflow answer). You should see:

Python's SVM implementation uses one-vs-one. That's exactly what the book is talking about.

For each pairwise comparison, we measure the decision function

The decision function is the just the regular binary SVM decision boundary

What does that to do with your question?

clf.decision_function() will give you the $D$ for each pairwise comparison

The class with the most votes win

For instance,

[[ 96.42193513 -11.13296606 111.47424538 -88.5356536 44.29272494 141.0069203 ]]

is comparing:

[AB, AC, AD, BC, BD, CD]

We label each of them by the sign. We get:

[A, C, A, C, B, C]

For instance, 96.42193513 is positive and thus A is the label for AB.

Now we have three C, C would be your prediction. If you repeat my procedure for the other two examples, you will get Python's prediction. Try it!

18

votes

When you call decision_function(), you get the output from each of the pairwise classifiers (n*(n-1)/2 numbers total). See pages 127 and 128 of "Support Vector Machines for Pattern Classification".

Each classifier puts in a vote as to what the correct answer is (based on the sign of the output of that classifier); predict() returns the class with the most votes.

3

votes

They probably have a bit complicated mathematical relation. But if you use the decision_function in LinearSVC classifier, the relation between those two will be more clear! Because then decision_function will give you scores for each class label (not same as SVC) and predict will give the class with the best score.

1

votes

Predict() follows a pairwise voting scheme which returns the class with most votes over all pairwise comparisons. When two classes score the same, the class with the lowest index is returned.

Below a Python example that applies this voting scheme to the (n*(n-1)/2 pairwise scores as returned by a one-versus-one decision_function().

from sklearn import svm
from sklearn import datasets
from numpy import argmax, zeros
from itertools import combinations

# do pairwise comparisons, return class with most +1 votes
def ovo_vote(classes, decision_function):
    combos = list(combinations(classes, 2))
    votes = zeros(len(classes))
    for i in range(len(decision_function[0])):
        if decision_function[0][i] > 0:
            votes[combos[i][0]] = votes[combos[i][0]] + 1
        else:
            votes[combos[i][1]] = votes[combos[i][1]] + 1
    winner = argmax(votes)
    return classes[winner]

# load the digits data set
digits = datasets.load_digits()

X, y = digits.data, digits.target

# set the SVC's decision function shape to "ovo"
estimator = svm.SVC(gamma=0.001, C=100., decision_function_shape='ovo')

# train SVC on all but the last digit
estimator.fit(X.data[:-1], y[:-1])

# print the value of the last digit
print("To be classified digit: ", y[-1:][0])

# print the predicted class
pred = estimator.predict(X[-1:])
print("Perform classification using predict: ", pred[0])

# get decision function
df = estimator.decision_function(X[-1:])

# print the decision function itself
print("Decision function consists of",len(df[0]),"elements:")
print(df)

# get classes, here, numbers 0 to 9
digits = estimator.classes_

# print which class has most votes
vote = ovo_vote(digits, df)
print("Perform classification using decision function: ", vote)

python - Scikit Learn SVC decision_function and predict

6 Answers

Question

Answer