3 votes

In the following example: http://scikit-learn.org/stable/auto_examples/svm/plot_separating_hyperplane.html

I would like to obtain the coefficients of the decision boundary (a line) shown in the picture. A call to

clf.coef_ 

returns

[[-0.2539717  -0.83806387]]

which, if I am not mistaken, represents the line with equation

y = -0.83806387 * x - 0.2539717

However, the above line is not the decision boundary obtained in the example. So what exactly is coef_, and how can I obtain the equation of the linear decision boundary?


3 Answers

4 votes

To get the equation for the decision-boundary line of a linear model you need both coef_ and intercept_. Also note that with an SVC, if there are more than two classes, there will be multiple decision boundaries involved.

The line equation can be constructed as:

y = w0 + w1 * x1 + w2 * x2 + ...

Where w0 is obtained from intercept_, w1 onwards are found in coef_, and x1 onwards are your features.

For example, this code shows you how to print out the equations for each of your decision boundaries.

from sklearn import svm
import numpy as np

clf = svf = svm.SVC(kernel="linear")

# Three classes, so SVC fits one binary classifier per pair of classes
X = np.array([[1, 2], [3, 4], [5, 1], [6, 2]])
y = np.array(["A", "B", "A", "C"])

clf.fit(X, y)

# One row of coef_ and one entry of intercept_ per decision boundary
for intercept, coef in zip(clf.intercept_, clf.coef_):
    s = "y = {0:.3f}".format(intercept)
    for i, c in enumerate(coef):
        s += " + {0:.3f} * x{1}".format(c, i)

    print(s)

In this example, the lines are determined to be:

y = 2.800 + -0.200 * x0 + -0.800 * x1
y = 7.000 + -1.000 * x0 + -1.000 * x1
y = 1.154 + -0.462 * x0 + 0.308 * x1
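
Note that each printed expression is the value of the decision function; the boundary itself is where that expression equals zero. For instance, setting the first line to zero and solving for x1 (using the coefficients above) gives:

x1 = (2.800 - 0.200 * x0) / 0.800 = 3.500 - 0.250 * x0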

Source: http://scikit-learn.org/stable/modules/linear_model.html

3 votes

If you want to draw the boundary as a linear plot, i.e. y = ax + b, then you can use the chunk of code below:

import numpy as np
import matplotlib.pyplot as plt

# Rewrite the boundary w0*x0 + w1*x1 + intercept = 0 as x1 = a*x0 + b
tmp = clf.coef_[0]
a = -tmp[0] / tmp[1]
b = -clf.intercept_[0] / tmp[1]

xlim = plt.gca().get_xlim()  # x-range to draw the line over
xx = np.linspace(xlim[0], xlim[1])
yy = a * xx + b
plt.plot(xx, yy)
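
For context, here is a minimal self-contained sketch that fits a binary linear SVC on hypothetical toy data (the data and variable names are my own, not from the original example) and draws the boundary this way:

import numpy as np
import matplotlib.pyplot as plt
from sklearn import svm

# Hypothetical toy data: two linearly separable classes
X = np.array([[1.0, 2.0], [2.0, 3.0], [4.0, 1.0], [5.0, 2.0]])
y = np.array([0, 0, 1, 1])

clf = svm.SVC(kernel="linear").fit(X, y)

tmp = clf.coef_[0]
a = -tmp[0] / tmp[1]               # slope of the boundary line
b = -clf.intercept_[0] / tmp[1]    # intercept of the boundary line

xx = np.linspace(X[:, 0].min(), X[:, 0].max())
plt.scatter(X[:, 0], X[:, 1], c=y)
plt.plot(xx, a * xx + b)
plt.show()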

Hope this helps!

1 vote

Thanks @Prem and @Christopher Wells - this was really helpful. I combined both answers, because Prem's code does not include a y_target (see Christopher's answer); it works fine for y_target = 0.

As an example I used the logistic regression tutorial from scikit-learn. I inserted the following code:

mf = logreg.intercept_.shape[0]       # number of fitted boundaries (one per class)
yf = logreg.classes_.copy()
xm = np.r_[np.min(X), np.max(X)]      # x-range of the data

for jf in np.arange(mf):
    tmp = logreg.coef_[jf]
    a = -tmp[0] / tmp[1]
    b = -(logreg.intercept_[jf] - yf[jf]) / tmp[1]
    yy = a * xm + b
    plt.plot(xm, yy, label='Coeff =' + str(jf))
plt.legend()
plt.show()

This works perfectly for y_target = 0 (see the graphic in the scikit-learn example). But what about the other 2 straight lines? Does anything more have to be considered? Example from scikit-learn: LogisticRegression