1
votes

I have a dataset that determines whether a student will be admitted given two scores. I train my model with this data and can determine if a student will be admitted or not using the following code:

model.predict([[score1, score2]])

This results in the answer:

[1]

How can I get the probability of that? If I try predict_proba, I get:

model.predict_proba([[score1, score2]])
>>[[ 0.38537034  0.61462966]]

I'd really like to see something like:

>> [0.75]

to indicate that P(admittance | score1, score2) = 0.75

2 Answers

2
votes

You may notice that 0.38537034 + 0.61462966 = 1. This is because predict_proba returns the probabilities for both classes (admitted and not admitted). If you had 7 classes, you would instead get something like [[p1, p2, p3, p4, p5, p6, p7]], where p1+p2+p3+p4+p5+p6+p7 = 1 and pi >= 0. So if you want the probability of class i, index into the result and read off pi.

So if the model assigned a probability of 0.75 to the second class, you would get a result that looks like [[0.25, 0.75]].

(Which column is admitted and which is not admitted depends on how your labels are encoded. The columns follow the order of model.classes_, so check that attribute to find the index you need.)
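As a rough sketch of the indexing described above: the toy data, the scores, and the assumption that label 1 means "admitted" are all made up for illustration, and LogisticRegression is only a stand-in for whatever model you trained. The column for a given class is looked up through model.classes_ rather than hard-coded.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data: two scores per student, label 1 = admitted (assumption).
X = np.array([[30, 40], [35, 50], [45, 45], [50, 60],
              [60, 55], [70, 80], [80, 75], [90, 85]], dtype=float)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

model = LogisticRegression(max_iter=1000).fit(X, y)

score1, score2 = 65.0, 70.0
# predict_proba expects a 2D input: one row per sample.
proba = model.predict_proba([[score1, score2]])  # shape (1, 2)

# Columns follow model.classes_, so find the column for label 1.
admitted_col = list(model.classes_).index(1)
p_admitted = proba[0, admitted_col]  # P(admitted | score1, score2)
print(p_admitted)
```

Here p_admitted is the single number the question asks for; the other column is simply 1 - p_admitted.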

1
votes

If you are using sklearn's LR (logistic regression) model and want the predicted probabilities for both classes, use this:

model.predict_proba(xtest)

You will get an array of probabilities for the two classes, with shape (N, 2), where N is the number of rows in xtest.
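A minimal sketch of the batch case, assuming a LogisticRegression model and randomly generated data (the training set, the labelling rule, and the contents of xtest are invented for the example). It shows the (N, 2) shape and how to slice out one column.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical training data: label 1 when the two scores sum to more than 100.
X = rng.uniform(0, 100, size=(200, 2))
y = (X.sum(axis=1) > 100).astype(int)

# Hypothetical test set with N = 5 rows.
xtest = rng.uniform(0, 100, size=(5, 2))

model = LogisticRegression(max_iter=1000).fit(X, y)

probs = model.predict_proba(xtest)  # one row per sample, one column per class
p_class1 = probs[:, 1]              # probability of the second entry in model.classes_
print(probs.shape, p_class1)
```

Each row of probs sums to 1, so for a binary problem the second column alone carries all the information.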