According to the documentation, it is possible to specify different loss functions for `SGDClassifier`. And as far as I understand, the `log` loss is the cross-entropy loss, which in theory can handle soft labels, i.e. labels given as probabilities in [0, 1].

The question is: is it possible to use `SGDClassifier` with the `log` loss out of the box for classification problems with soft labels? And if not, how can this task (linear classification on soft labels) be solved using scikit-learn?
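For concreteness, this is what I would like to work out of the box (the data is synthetic and the names are mine):

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))   # toy feature matrix
p = rng.uniform(size=100)       # soft labels: probabilities in [0, 1]

clf = SGDClassifier(loss="log_loss")  # loss="log" in older scikit-learn versions
clf.fit(X, p)  # does not work: fit expects hard class labels, not probabilities
```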
UPDATE:
Because of the way the target is labeled and the nature of the problem, hard labels don't give good results. But it is still a classification problem (not regression), and I want to keep the probabilistic interpretation of the prediction, so regression doesn't work out of the box either. The cross-entropy loss can handle soft labels in the target naturally, yet it seems that all the loss functions for linear classifiers in scikit-learn only accept hard labels.
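Concretely, for a binary problem the cross-entropy against a soft target $p \in [0, 1]$ is

$$\ell(p, q) = -\,p \log q - (1 - p) \log(1 - q),$$

where $q = \sigma(w \cdot x + b)$ is the predicted probability. Nothing in this expression forces $p$ to be exactly 0 or 1. It is also just a $p$-weighted sum of the two hard-label log losses, so one workaround I can think of is to duplicate every sample with hard labels 1 and 0 and pass $p$ and $1 - p$ through `sample_weight`, which `SGDClassifier.fit` does accept.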
So the question is probably: how can I specify my own loss function for `SGDClassifier`, for example? It seems scikit-learn doesn't stick to a modular approach here, and the changes would need to be made somewhere inside its sources.
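For what it's worth, one alternative to patching the sources is to write the SGD loop directly. A minimal sketch, assuming plain NumPy and an L2 penalty (all names here are mine, not scikit-learn API):

```python
import numpy as np

def fit_soft_logistic(X, p, lr=0.1, epochs=100, alpha=1e-4, seed=0):
    """Minimize mean cross-entropy between soft labels p and sigmoid(X @ w + b)."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    w, b = np.zeros(n_features), 0.0
    for _ in range(epochs):
        for i in rng.permutation(n_samples):            # one SGD pass per epoch
            q = 1.0 / (1.0 + np.exp(-(X[i] @ w + b)))   # predicted probability
            g = q - p[i]   # gradient of the cross-entropy w.r.t. the logit
            w -= lr * (g * X[i] + alpha * w)            # L2-regularized step
            b -= lr * g
    return w, b
```

The gradient with respect to the logit is simply $q - p$ for any $p \in [0, 1]$, so the update rule is the same one hard-label logistic SGD performs, modulo the learning-rate schedule.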