
I have done multi-class classification with scikit-learn, but I want an independent prediction for each class instead of probabilities that sum to 1.

I know this is similar to multi-label classification, but I need an independent 0-1 value for each class in the predicted output.

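from sklearn.linear_model import SGDClassifier
from sklearn.multiclass import OneVsRestClassifier
clf = OneVsRestClassifier(SGDClassifier(alpha=0.001, loss="log_loss",  # loss="log" before scikit-learn 1.1
                                        random_state=42, max_iter=100, shuffle=True, verbose=1))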


Output:
[0.04188954 0.01330129 0.01330501 0.02050405 0.03726504 0.01412006
 0.01753864 0.01250115 0.02342872 0.0124999  0.05234852 0.0161394
 0.01250032 0.01330749 0.01403075 0.0149792  0.0125048  0.01250406
 0.01412335 0.01413113 0.01412246 0.06543099 0.01249486 0.01250054
 0.01308784 0.01330463 0.01250242 0.02252353 0.02037271 0.0133038
 0.01250215 0.0125009  0.01537566 0.02023355 0.01600915 0.01762224
 0.01496796 0.01496522 0.01412407 0.01250198 0.01239722 0.01249967
 0.01763284 0.01573462 0.01250276 0.01451515 0.01330437 0.01329294
 0.01249999 0.01485671 0.01249419 0.01858113 0.01250192 0.01585085
 0.01330439 0.01250573 0.01250585 0.01715666 0.01249392]

These values sum to 1, but I want each one to be an independent value between 0 and 1. How is this possible?

According to the scikit-learn documentation, "In the single label multiclass case, the rows of the returned matrix sum to 1."

Ref: https://scikit-learn.org/stable/modules/generated/sklearn.multiclass.OneVsRestClassifier.html

How can I override this?

I created a 2-d label matrix of shape (342, 2):

[[  4   0]
 [  4   0]
 [  4   0]
 [ 21   0]
 [ 21   0]]

I got this error:

ValueError: Multioutput target data is not supported with label binarization

Using LabelBinarizer, I got a matrix of shape (349, 59): there are 59 labels and 349 samples.

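Using MultiOutputClassifier:

from sklearn.linear_model import SGDClassifier
from sklearn.multioutput import MultiOutputClassifier
clf = SGDClassifier(loss="log_loss", random_state=42, verbose=0)  # loss="log" before scikit-learn 1.1
clf = MultiOutputClassifier(clf)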

Result:

clf.predict_proba(x_test)

[array([[0.99310559, 0.00689441]]), array([[0.9942846, 0.0057154]]), array([[0.0051056, 0.9948944]])]

See the predict_proba documentation: https://scikit-learn.org/stable/modules/generated/sklearn.multioutput.MultiOutputClassifier.html#sklearn.multioutput.MultiOutputClassifier.predict_proba

The result has 3 arrays, one per class.

How do I interpret each of them as a single value? For example, array([[0.99310559, 0.00689441]]) => 0.5 or 0.6?

Are you fitting a vector of labels or a matrix? From the documentation: "This strategy can also be used for multilabel learning, where a classifier is used to predict multiple labels for instance, by fitting on a 2-d matrix in which cell [i, j] is 1 if sample i has label j and 0 otherwise." - Born Tbe Wasted
@BornTbeWasted I used LabelBinarizer and converted the labels. [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0] How do I convert this to [i, j]? - Giri Annamalai M
Create a 2-d matrix: for each sample, sum its converted label vectors (e.g. [0 0 ... 1 ... 1 ... 0]) and add that row to the label matrix you give to the classifier. - Born Tbe Wasted
@BornTbeWasted I have updated the question. I created a 2-d array, but could not train on it. - Giri Annamalai M
Then create a vector of size 59 for each sample, marking the labels that sample has. - Born Tbe Wasted
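The per-sample 0/1 vector suggested in the comments is what scikit-learn's MultiLabelBinarizer builds from a list of label lists. A minimal sketch with hypothetical labels (three distinct ids instead of the asker's 59):

```python
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical labels: each sample lists the class ids it belongs to
labels = [[4], [21], [4, 21], [7], [21, 7], [4, 7]]

mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(labels)  # shape (n_samples, n_distinct_labels)

print(mlb.classes_)  # column order of Y: [ 4  7 21]
print(Y)
# Each row is the per-sample vector from the comment:
# cell [i, j] is 1 if sample i has label mlb.classes_[j], else 0.

# The mapping is reversible, which helps when reading predictions back.
print(mlb.inverse_transform(Y))
```

The resulting Y can be passed to OneVsRestClassifier.fit (or MultiOutputClassifier.fit) in place of the 1-d label vector.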

1 Answer


If you want to solve this as a multi-label problem, use the MultiOutputClassifier wrapper instead of OneVsRestClassifier().

Here is an example:

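from sklearn.multioutput import MultiOutputClassifier
from sklearn.linear_model import SGDClassifier
from sklearn.datasets import load_iris
from sklearn import preprocessing

X, y = load_iris(return_X_y=True)

# One binary classifier per output column; loss='log_loss' (named 'log'
# before scikit-learn 1.1) gives probability estimates via predict_proba
clf = MultiOutputClassifier(SGDClassifier(loss='log_loss', max_iter=10))

# One-hot encode the 1-d labels: shape (n_samples, n_classes)
lb = preprocessing.LabelBinarizer()
y_onehot = lb.fit_transform(y)
clf.fit(X, y_onehot)

# List with one (1, 2) array per class: [P(not class), P(class)]
clf.predict_proba([X[0]])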

Output:

[array([[0., 1.]]),
 array([[1.00000000e+00, 5.63826474e-52]]),
 array([[1., 0.]])]

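In each array, the second element is the probability that the given record belongs to that class. Each class is scored by its own binary model, so the probabilities across classes are not constrained to sum to 1 (for this confidently classified record they happen to be 1, about 0, and 0).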