Why does sklearn.svm.SVC's attribute coef_ have shape [n_class * (n_class-1) / 2, n_features]?

Question

I'm using sklearn.svm.SVC with a linear kernel and I want to get the feature importance, so I use the attribute coef_, which (as explained here: https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html) is an array with the shape [n_class * (n_class-1) / 2, n_features]. For example, in my case I have 10 classes and 54 features, so the shape is [45, 54].

Why do I get 45 arrays of weights? What is the meaning of each of these arrays? Intuitively, I would have expected 10 arrays of weights, one for each class.

seralouk · Accepted Answer · 2020-05-11T10:12:27

The shape is indeed [n_class * (n_class-1) / 2, n_features], but why?

This is because, if you have more than 2 classes (i.e. the problem is not binary), SVC handles multiclass classification with a one-vs-one scheme: one binary classifier is fitted for every pair of classes.

Example: if you have 3 classes, say 1, 2 and 3, then a classifier is fitted for each of the pairs 1vs2, 1vs3 and 2vs3. So here we have n_class * (n_class-1) / 2 = 3 * (3-1) / 2 = 3 classifiers, and therefore 3 rows in coef_.
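The pair count can also be checked by enumerating the pairs directly. This is a minimal sketch using only the standard library; ovo_pairs is a hypothetical helper name, not part of scikit-learn, and the 10-class case is the one from the question.

```python
from itertools import combinations

def ovo_pairs(classes):
    """Return every unordered pair of class labels, in sorted order."""
    return list(combinations(sorted(classes), 2))

# 3 classes, as in the example above
pairs_3 = ovo_pairs([1, 2, 3])
print(pairs_3)        # [(1, 2), (1, 3), (2, 3)]
print(len(pairs_3))   # 3

# 10 classes, as in the question
pairs_10 = ovo_pairs(range(10))
print(len(pairs_10))  # 45
```

With 10 classes there are 45 pairs, which matches the 45 rows seen in the question.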

Let's verify the above:

import numpy as np
from sklearn.svm import SVC

X = np.array([[-1, -1], [-2, -1], [1, 1], [2, 1]])
y = np.array([1, 1, 2, 3]) # 3 classes

clf = SVC(kernel='linear')
clf.fit(X, y)

print(clf.coef_)
# Output:
# [[-0.5        -0.5       ]
#  [-0.46153846 -0.30769231]
#  [-1.          0.        ]]

Here, each row of clf.coef_ corresponds to one of the cases above, in the order 1vs2, 1vs3 and 2vs3. So the first row, [-0.5, -0.5], gives the coefficients of the first and second feature for the 1vs2 classifier.

P.S.: In the case of binary classification, clf.coef_ has only one row, for the single 1vs2 classifier.
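To see the binary case concretely, here is a small sketch that reuses the same four points but relabels them into two classes. The labels in y_binary are made up for illustration.

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[-1, -1], [-2, -1], [1, 1], [2, 1]])
y_binary = np.array([1, 1, 2, 2])  # only 2 classes (illustrative labels)

clf_binary = SVC(kernel='linear')
clf_binary.fit(X, y_binary)

# One pair of classes means one binary classifier, hence one row
print(clf_binary.coef_.shape)       # (1, 2)
print(clf_binary.intercept_.shape)  # (1,)
```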

Concerning the order of the rows:

In the case of “one-vs-one” SVC, the layout of the attributes is a little more involved. In the case of having a linear kernel, the attributes coef_ and intercept_ have the shape [n_class * (n_class - 1) / 2, n_features] and [n_class * (n_class - 1) / 2] respectively. This is similar to the layout for LinearSVC described above, with each row now corresponding to a binary classifier. The order for classes 0 to n is “0 vs 1”, “0 vs 2” , … “0 vs n”, “1 vs 2”, “1 vs 3”, “1 vs n”, . . . “n-1 vs n”.

This paragraph is quoted from the scikit-learn user guide: https://scikit-learn.org/stable/modules/svm.html
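Given that row order, one way to label each row of coef_ is to pair it with the class combinations in the same order. This is only a sketch: it assumes the rows follow the order quoted above, applied to the sorted labels in clf.classes_, and it reuses the toy data from the example.

```python
from itertools import combinations

import numpy as np
from sklearn.svm import SVC

X = np.array([[-1, -1], [-2, -1], [1, 1], [2, 1]])
y = np.array([1, 1, 2, 3])  # 3 classes

clf = SVC(kernel='linear')
clf.fit(X, y)

# clf.classes_ holds the sorted unique labels; combinations() yields
# (first, second), (first, third), ..., the same order as in the quote.
pairs = list(combinations(clf.classes_, 2))

for (a, b), weights, bias in zip(pairs, clf.coef_, clf.intercept_):
    print(f"{a} vs {b}: coef={weights}, intercept={bias:.3f}")
```

Each printed line then shows which pair of classes a given weight vector separates.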
