I am using logistic regression in scikit-learn to classify data into one of 5 classes. To train the model I have a matrix of observations (class labels) Y and a matrix of features X.

Sometimes my Y contains no observations of, say, category 3. In that case, when I call predict_proba(X), I would like to get 5 probabilities per sample, with the 3rd entry equal to 0 (since there are no category 3 observations). Instead, that probability is simply omitted and only 4 probabilities are returned.

How can I change the logistic regression object to do this?

2 Answers

LogisticRegression doesn't allow this, but its close cousin SGDClassifier does:

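import numpy as np
from sklearn.linear_model import SGDClassifier
logreg = SGDClassifier(loss="log_loss")  # was loss="log" before scikit-learn 1.1
logreg.partial_fit(X, y, classes=np.arange(5))  # labels 0..4; use np.arange(1, 6) for labels 1..5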

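SGDClassifier differs from LogisticRegression in its training algorithm (stochastic gradient descent) and parametrization (for example, regularization strength is set by alpha rather than C). Also note that the column for the unseen class will generally hold a small probability rather than exactly 0. If that's not acceptable, you'll have to write your own wrapper code.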
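A minimal sketch of such a wrapper around a plain LogisticRegression, assuming the five categories are labelled 1 to 5 (ALL_CLASSES and predict_proba_all are hypothetical names, and the data is synthetic). It relies on the fitted classes_ attribute, which lists the classes in the same order as the columns of predict_proba, and copies those columns into a zero-filled array of the full width, so an unseen class gets exactly 0.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Assumption: the five categories are labelled 1..5
ALL_CLASSES = np.array([1, 2, 3, 4, 5])


def predict_proba_all(clf, X, all_classes=ALL_CLASSES):
    """Hypothetical helper: return an (n_samples, len(all_classes)) array
    in which classes absent from training get probability 0."""
    partial = clf.predict_proba(X)  # one column per entry of clf.classes_
    full = np.zeros((partial.shape[0], len(all_classes)))
    # Map each fitted class to its position in the full class list
    # (raises KeyError if the model saw a label outside all_classes)
    index = {c: i for i, c in enumerate(all_classes)}
    cols = [index[c] for c in clf.classes_]
    full[:, cols] = partial
    return full


# Synthetic example: category 3 never appears in the training labels
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
y = np.array([1, 2, 4, 5] * 10)
clf = LogisticRegression(max_iter=1000).fit(X, y)

proba = predict_proba_all(clf, X)
print(proba.shape)        # (40, 5)
print(proba[:, 2].max())  # 0.0 -> column for category 3
```

Because the wrapper only rearranges the output, the model keeps LogisticRegression's solver and its C parameter; the trade-off is that the zero column is hard-coded rather than learned.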

Multi-class labels can be converted into a fixed-width indicator matrix with the label-binarization tools in the sklearn.preprocessing module.

Reference: http://scikit-learn.org/stable/modules/preprocessing.html#label-binarization
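A short sketch of that idea, assuming labels 1 to 5 and made-up data: label_binarize takes an explicit classes list, so a category that never occurs still gets its own (all-zero) column. This fixes the width of the label matrix only; it does not by itself change the number of columns predict_proba returns, but the same class list can be used to align the classifier's output.

```python
import numpy as np
from sklearn.preprocessing import label_binarize

# Hypothetical labels; category 3 is absent
y = np.array([1, 2, 4, 5, 2])

# One column per listed class, even for classes that never occur in y
Y = label_binarize(y, classes=[1, 2, 3, 4, 5])
print(Y.shape)  # (5, 5)
print(Y[:, 2])  # [0 0 0 0 0] -> column for category 3
```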