3
votes

I'm working on a multiclass classification problem with different classifiers, working with Python and scikit-learn. I want to use the predicted probabilities, basically to compare the predicted probabilities of the different classifiers for a specific case.

I started reading about "calibration", for example at scikit-learn and a publication, and I became confused.

From what I understood: a well-calibrated probability means that the predicted probability also reflects the fraction of samples that actually belong to that class.

  1. Does this imply that if I have 10 equally distributed classes, the calibrated probabilities would ideally be around 0.1 for every class?

  2. Can I interpret the probabilities of predict_proba (without calibration) as "how certain is the classifier about this being the correct class"?

Hopefully, someone can clarify this for me! :)

1 Answer

0
votes

I understand that you have a multiclass classification problem in the sense of the scikit-learn documentation: "All classifiers in scikit-learn do multiclass classification out-of-the-box."

In this case, as mentioned,

CalibratedClassifierCV can calibrate probabilities in a multiclass setting if the base estimator supports multiclass predictions. [Which is always the case.] The classifier is calibrated first for each class separately in a one-vs-rest fashion. When predicting probabilities, the calibrated probabilities for each class are predicted separately. As those probabilities do not necessarily sum to one, a postprocessing is performed to normalize them.
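As a minimal sketch of what the quoted passage describes: the snippet below wraps a classifier in CalibratedClassifierCV on a synthetic 4-class problem and checks that the calibrated probabilities come out as one row per sample that sums to one. The data set, the choice of GaussianNB, and the settings method="sigmoid" and cv=5 are my own assumptions for illustration, not something taken from your setup.

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic 4-class data (an assumption, standing in for the real problem).
X, y = make_classification(
    n_samples=2000, n_features=20, n_informative=10,
    n_classes=4, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)

# Wrap the base classifier; with method="sigmoid" each class is
# calibrated one-vs-rest and the result is normalized afterwards.
calibrated = CalibratedClassifierCV(GaussianNB(), method="sigmoid", cv=5)
calibrated.fit(X_train, y_train)

proba = calibrated.predict_proba(X_test)
print(proba.shape)                           # (n_test_samples, n_classes)
print(np.allclose(proba.sum(axis=1), 1.0))   # rows are normalized
```

Any classifier exposing predict_proba or decision_function should be usable in place of GaussianNB here.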

I hope this answers your first question.

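To answer your second question: yes, that is the idea, both before and after calibration. predict_proba always expresses how confident the classifier is in each class. The difference is that after calibration those numbers can be read as frequencies (of the samples scored around 0.8 for a class, roughly 80% should belong to it), while before calibration they may be systematically over- or under-confident, depending on the classifier.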
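One way to see the difference between the raw and the calibrated output is to score both with a metric that is sensitive to the probability values, such as log loss. The following is only a sketch on synthetic data; GaussianNB, the data set parameters, and the sigmoid method are assumptions I picked for illustration, and whether calibration helps on your data is something you would have to measure.

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic 3-class data with redundant features (an assumption).
X, y = make_classification(
    n_samples=3000, n_features=20, n_informative=5, n_redundant=10,
    n_classes=3, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)

# Same base model, once as-is and once wrapped in a calibrator.
raw = GaussianNB().fit(X_train, y_train)
calibrated = CalibratedClassifierCV(GaussianNB(), method="sigmoid", cv=5)
calibrated.fit(X_train, y_train)

# Lower log loss means the probabilities match the outcomes better.
loss_raw = log_loss(y_test, raw.predict_proba(X_test))
loss_cal = log_loss(y_test, calibrated.predict_proba(X_test))
print(f"log loss, uncalibrated: {loss_raw:.3f}")
print(f"log loss, calibrated:   {loss_cal:.3f}")
```

Running the same comparison for each of your classifiers on a held-out set should tell you whose probabilities are comparable at face value.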


Afterthought:

To be precise, I did not answer your first question at face value. You asked about the probability for each class. However, since we are talking about calibration, keep in mind that predict_proba gives its output per sample, not per class. I assume you mean per sample; otherwise you should specify: do you mean the average probability over all samples?
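To make the per-sample versus per-class distinction concrete, here is a small sketch with 10 roughly balanced classes. It prints the probabilities for a single sample and, separately, the average over all test samples next to the observed class frequencies. The synthetic data and the use of LogisticRegression are assumptions for illustration only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic data with 10 roughly balanced classes (an assumption).
X, y = make_classification(
    n_samples=5000, n_features=20, n_informative=8,
    n_classes=10, n_clusters_per_class=1, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
proba = clf.predict_proba(X_test)   # one row per sample, one column per class

# Per sample: a single row, usually far from uniform.
print("one sample:       ", np.round(proba[0], 2))

# Per class: the column means, averaged over all test samples.
avg_per_class = proba.mean(axis=0)
class_freq = np.bincount(y_test) / len(y_test)
print("average per class:", np.round(avg_per_class, 2))
print("class frequencies:", np.round(class_freq, 2))
```

For a reasonably calibrated model the averages should land near the class frequencies (about 0.1 each here), while an individual row will normally put most of its mass on one or a few classes.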