8
votes

According to the documentation it is possible to specify different loss functions to SGDClassifier. And as far as I understand log loss is a cross-entropy loss function which theoretically can handle soft labels, i.e. labels given as some probabilities [0,1].

The question is: is it possible to use SGDClassifier with log loss function out the box for classification problems with soft labels? And if not - how this task (linear classification on soft labels) can be solved using scikit-learn?

UPDATE:

The way target is labeled and by the nature of the problem hard labels don't give good results. But it is still a classification problem (not regression) and I wan't to keep probabilistic interpretation of the prediction so regression doesn't work out of the box too. Cross-entropy loss function can handle soft labels in target naturally. It seems that all loss functions for linear classifiers in scikit-learn can only handle hard labels.

So the question is probably:

How to specify my own loss function for SGDClassifier, for example. It seems scikit-learn doesn't stick to the modular approach here and changes need to be done somewhere inside it's sources

2

2 Answers

9
votes

I recently had this problem and came up with a nice fix that seems to work.

Basically, transform your targets to log-odds-ratio space using the inverse sigmoid function. Then fit a linear regression. Then, to do inference, take the sigmoid of the predictions from the linear regression model.

So say we have soft targets/labels y ∈ (0, 1) (make sure to clamp the targets to say [1e-8, 1 - 1e-8] to avoid instability issues when we take logs).

We take the inverse sigmoid, then we fit a linear regression (assuming predictor variables are in matrix X):

y = np.clip(y, 1e-8, 1 - 1e-8)   # numerical stability
inv_sig_y = np.log(y / (1 - y))  # transform to log-odds-ratio space

from sklearn.linear_model import LinearRegression
lr = LinearRegression()
lr.fit(X, inv_sig_y)

Then to make predictions:

def sigmoid(x):
    ex = np.exp(x)
    return ex / (1 + ex)

preds = sigmoid(lr.predict(X_new))

This seems to work, at least for my use case. My guess is that it's not far off what happens behind the scenes for LogisticRegression anyway.

Bonus: this also seems to work with other regression models in sklearn, e.g. RandomForestRegressor.

3
votes

According to the docs,

The ‘log’ loss gives logistic regression, a probabilistic classifier.

In general a loss function is of the form Loss( prediction, target ), where prediction is the model's output, and target is the ground-truth value. In the case of logistic regression, prediction is a value on (0,1) (i.e., a "soft label"), while target is 0 or 1 (i.e., a "hard label").

So in answer to your question, it depends on if you are referring to the prediction or target. Generally speaking, the form of the labels ("hard" or "soft") is given by the algorithm chosen for prediction and by the data on hand for target.

If your data has "hard" labels, and you desire a "soft" label output by your model (which can be thresholded to give a "hard" label), then yes, logistic regression is in this category.

If your data has "soft" labels, then you would have to choose a threshold to convert them to "hard" labels before using typical classification methods (i.e., logistic regression). Otherwise, you could use a regression method where the model is fit to predict the "soft" target. In this latter approach, your model could give values outside of (0,1), and this would have to be handled.