sklearn Logistic Regression probability

Question

I have a dataset that determines whether a student will be admitted given two scores. I train my model with this data and can determine if a student will be admitted or not using the following code:

model.predict([[score1, score2]])

This results in the answer:

[1]

How can I get the probability of that? If I try predict_proba, I get:

model.predict_proba([[score1, score2]])
>>[[ 0.38537034  0.61462966]]

I'd really like to see something like:

>> [0.75]

to indicate that P(admittance | score1, score2) = 0.75

Raff.Edward · Accepted Answer · 2015-03-04T05:24:31

You may notice that 0.38537034 + 0.61462966 = 1. This is because predict_proba returns the probabilities for both classes (admitted and not admitted). If you had 7 classes, you would instead get something like [[p1, p2, p3, p4, p5, p6, p7]], where p1+p2+p3+p4+p5+p6+p7 = 1 and each pi >= 0. So if you want the probability of class i, index into the result and read off pi.
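As a rough illustration of why the two numbers add up to 1, here is a sketch of what a two-class logistic model computes for a single input. The function name, weights, intercept and scores are all made up for the example; they are not taken from the question's model or from sklearn's internals.

```python
import math

def two_class_proba(weights, intercept, x):
    """Return [P(class 0), P(class 1)] for one sample under a logistic model."""
    # Linear score, then the logistic (sigmoid) function.
    z = intercept + sum(w * xi for w, xi in zip(weights, x))
    p1 = 1.0 / (1.0 + math.exp(-z))
    # The two classes are complementary, so the pair always sums to 1.
    return [1.0 - p1, p1]

# Hypothetical parameters and scores, chosen only for illustration.
row = two_class_proba(weights=[0.04, 0.03], intercept=-4.0, x=[60.0, 70.0])
p_class0, p_class1 = row
```

Whichever entry you read, the other one is just its complement.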

So if you had something where the probability was 0.75 of being not admitted, you would get a result that looks like [[0.25, 0.75]].

(I may have reversed the ordering you used in your code for admitted/not admitted, but it doesn't matter - that just changes the index you look at. The columns follow the order of model.classes_, so you can check there which index is which.)
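To make the indexing concrete, here is a minimal sketch that pulls out a single probability for the "admitted" class. The training data below is invented toy data, and it assumes label 1 means admitted; substitute your own fitted model and labels.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy data, invented for illustration: two exam scores per student,
# label 1 = admitted, 0 = not admitted (an assumption about the labels).
X = np.array([[30, 40], [35, 50], [45, 45], [50, 60],
              [60, 65], [70, 75], [80, 70], [90, 85]], dtype=float)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

model = LogisticRegression(max_iter=1000).fit(X, y)

# predict_proba expects a 2D array: one row per sample.
proba = model.predict_proba([[65.0, 70.0]])

# Columns follow model.classes_, so look up the column for label 1
# instead of assuming its position.
admitted_col = list(model.classes_).index(1)
p_admitted = proba[0, admitted_col]
```

Here p_admitted is a single float, which is the kind of value the question asks for. Looking the column up through model.classes_ avoids guessing the ordering.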
