According to the documentation, it is possible to specify different loss functions for `SGDClassifier`. And as far as I understand, the `log` loss is the cross-entropy loss, which in theory can handle soft labels, i.e. labels given as probabilities in [0, 1].

The question is: is it possible to use `SGDClassifier` with the `log` loss out of the box for classification problems with soft labels? And if not, how can this task (linear classification on soft labels) be solved using scikit-learn?
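For concreteness, this is what I would like to work out of the box (the data is synthetic and the names are mine):

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))   # toy feature matrix
p = rng.uniform(size=100)       # soft labels: probabilities in [0, 1]

clf = SGDClassifier(loss="log_loss")  # loss="log" in older scikit-learn versions
clf.fit(X, p)  # does not work: fit expects hard class labels, not probabilities
```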
UPDATE:
Because of the way the target is labeled and the nature of the problem, hard labels don't give good results. But it is still a classification problem (not regression), and I want to keep the probabilistic interpretation of the prediction, so regression doesn't work out of the box either. The cross-entropy loss can handle soft labels in the target naturally, yet it seems that all the loss functions for linear classifiers in scikit-learn only accept hard labels.
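Concretely, for a binary problem the cross-entropy against a soft target $p \in [0, 1]$ is

$$\ell(p, q) = -\,p \log q - (1 - p) \log(1 - q),$$

where $q = \sigma(w \cdot x + b)$ is the predicted probability. Nothing in this expression forces $p$ to be exactly 0 or 1. It is also just a $p$-weighted sum of the two hard-label log losses, so one workaround I can think of is to duplicate every sample with hard labels 1 and 0 and pass $p$ and $1 - p$ through `sample_weight`, which `SGDClassifier.fit` does accept.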
So the question is probably: how can I specify my own loss function for `SGDClassifier`, for example? It seems scikit-learn doesn't stick to a modular approach here, and the changes would need to be made somewhere inside its sources.
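For what it's worth, one alternative to patching the sources is to write the SGD loop directly. A minimal sketch, assuming plain NumPy and an L2 penalty (all names here are mine, not scikit-learn API):

```python
import numpy as np

def fit_soft_logistic(X, p, lr=0.1, epochs=100, alpha=1e-4, seed=0):
    """Minimize mean cross-entropy between soft labels p and sigmoid(X @ w + b)."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    w, b = np.zeros(n_features), 0.0
    for _ in range(epochs):
        for i in rng.permutation(n_samples):            # one SGD pass per epoch
            q = 1.0 / (1.0 + np.exp(-(X[i] @ w + b)))   # predicted probability
            g = q - p[i]   # gradient of the cross-entropy w.r.t. the logit
            w -= lr * (g * X[i] + alpha * w)            # L2-regularized step
            b -= lr * g
    return w, b
```

The gradient with respect to the logit is simply $q - p$ for any $p \in [0, 1]$, so the update rule is the same one hard-label logistic SGD performs, modulo the learning-rate schedule.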