I'm working on a multiclass classification problem with several different classifiers, using Python and scikit-learn. I want to use the predicted probabilities, mainly to compare what the different classifiers predict for a specific case.
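To make the setup concrete, here is a minimal sketch of the kind of comparison described. The iris dataset and the two classifiers are illustrative assumptions only, stand-ins for the actual data and models:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Assumption: iris stands in for the real multiclass data.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Assumption: two arbitrary classifiers that both expose predict_proba.
classifiers = {
    "logreg": LogisticRegression(max_iter=1000),
    "forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# The "specific case": one test sample, kept 2D because predict_proba expects a matrix.
case = X_test[:1]

probas = {}
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    # One row of class probabilities per sample; take the only row.
    probas[name] = clf.predict_proba(case)[0]
    print(name, probas[name].round(3))
```

Each entry in `probas` is a vector with one probability per class, and these vectors are what I would like to compare across classifiers.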
I started reading about "calibration", for example in the scikit-learn documentation and in a publication, and I became confused.
From what I understood, a well-calibrated probability means that the predicted probability also reflects the fraction of samples that actually belong to a certain class.
Does this imply that if I have 10 equally distributed classes, the calibrated probabilities would ideally be around 0.1 for every class?
Can I interpret the probabilities of predict_proba (without calibration) as "how certain is the classifier about this being the correct class"?
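One way to examine both questions empirically might look like the sketch below. It assumes a synthetic 10-class dataset from `make_classification`, `GaussianNB` as an arbitrary base classifier, and sigmoid calibration via `CalibratedClassifierCV`; none of these come from my actual setup. For one class, treated one-vs-rest, `calibration_curve` bins the predicted probabilities and reports the observed fraction of that class in each bin next to the mean predicted probability:

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV, calibration_curve
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Assumption: synthetic data with 10 roughly balanced classes.
X, y = make_classification(
    n_samples=5000, n_features=20, n_informative=10,
    n_classes=10, n_clusters_per_class=1, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Uncalibrated model versus the same model wrapped in a calibrator.
models = {
    "raw": GaussianNB().fit(X_train, y_train),
    "calibrated": CalibratedClassifierCV(
        GaussianNB(), method="sigmoid", cv=5
    ).fit(X_train, y_train),
}

k = 0  # class inspected one-vs-rest (arbitrary choice)
is_k = (y_test == k).astype(int)

curves = {}
for name, model in models.items():
    p_k = model.predict_proba(X_test)[:, k]  # predicted P(class k) per sample
    # Per bin: observed fraction of class k, and mean predicted probability.
    frac_pos, mean_pred = calibration_curve(is_k, p_k, n_bins=5)
    curves[name] = (frac_pos, mean_pred)
    print(name, "range of P(class k):", p_k.min().round(3), "to", p_k.max().round(3))
    print("  mean predicted per bin:", mean_pred.round(2))
    print("  observed fraction     :", frac_pos.round(2))
```

If the two printed rows for a model are close to each other in every bin, its probabilities for that class would count as well calibrated in the sense described above; the printed range shows whether the individual probabilities all sit near 0.1 or spread out over the interval.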
Hopefully, someone can clarify this for me! :)