3
votes

I am using the Python SKLearn module to perform logistic regression. I have a dependent variable vector Y (taking values from 1 of M classes) and independent variable matrix X (with N features). My code is

        LR = LogisticRegression()
        LR.fit(X, np.resize(Y, len(Y)))  # flatten Y into a 1-D array of length len(Y)

My question is: what do LR.coef_ and LR.intercept_ represent? I initially thought they held the values intercept(i) and coef(i,j) such that

log(p(1)/(1-p(1))) = intercept(1) + coef(1,1)*X1 + ... + coef(1,N)*XN
...
log(p(M)/(1-p(M))) = intercept(M) + coef(M,1)*X1 + ... + coef(M,N)*XN

where p(i) is the probability that an observation with features [X1, ..., XN] is in class i. However, when I compute

        V = X.dot(LR.coef_.transpose())  # matrix product, shape (n_samples, M)
        U = V + LR.intercept_
        A = np.exp(U)
        P = A / (1 + A)

so that P is the matrix of p(1) ... p(M) for the observations in X, I expect it to match

        LR.predict_proba(X)

but the two are close, not identical. Why is this?

1
If you let P = A/(1+A), is P / P.sum(axis=1).reshape((-1, 1)) exactly the same as predict_proba's output? – Fred Foo
Bingo, I didn't think to test whether the probability values summed to 1... Thanks, if you post this as an answer I will accept it. – rwolst
Doesn't logistic regression estimate just one set of (N+1) coefficients? i.e. why do you have "coef(1,1)" and "coef(M,1)" if they're the same thing? Am I missing something? – DrMisha
To be more precise, I could have said "multinomial logistic regression". The key point is that there are multiple possible classes an observation can be in. – rwolst

1 Answer

3
votes

The coef_ and intercept_ attributes represent what you think; your probability calculations are off because you forgot to normalize. After

P = A / (1 + A)

you should do

P /= P.sum(axis=1).reshape((-1, 1))

to reproduce the scikit-learn algorithm.
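For reference, here is a self-contained sketch of the full calculation. Note that recent scikit-learn versions default to multinomial (softmax) fitting for multiclass LogisticRegression, so this sketch wraps it in OneVsRestClassifier to reproduce the one-vs-rest behavior the normalization above assumes; the synthetic data and parameter choices are illustrative, not from the question.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

# Synthetic 3-class problem standing in for the question's X and Y
X, Y = make_classification(n_samples=200, n_features=5, n_informative=4,
                           n_redundant=0, n_classes=3, random_state=0)

clf = OneVsRestClassifier(LogisticRegression()).fit(X, Y)

# One binary model per class: stack their coefficients into (M, N) and (M,)
coef = np.vstack([est.coef_ for est in clf.estimators_])
intercept = np.hstack([est.intercept_ for est in clf.estimators_])

# Per-class sigmoid: p(i) = 1 / (1 + exp(-(intercept(i) + X @ coef(i))))
A = np.exp(X @ coef.T + intercept)
P = A / (1 + A)

# Rows do NOT sum to 1 yet; normalizing makes them proper class probabilities
P /= P.sum(axis=1).reshape(-1, 1)

print(np.allclose(P, clf.predict_proba(X)))
```

Without the final normalization step, each row of P holds M independent binary probabilities that generally sum to something other than 1, which is exactly why the unnormalized values were "close, but different".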