
I am learning logistic regression in Python by comparing its results with SAS.

Dataset: http://www.ats.ucla.edu/stat/data/binary.csv

Here, admit is the response variable, with categories 0 and 1.

By default, SAS models the probability that ADMIT = 0; if I specify the DESC option, it models ADMIT = 1 instead.

Ref: http://www.ats.ucla.edu/stat/sas/faq/logistic_descending.htm

In Python, statsmodels models ADMIT = 1 by default. How can I make it model ADMIT = 0 (that is, change the event definition) so that I don't see a difference in coefficients and predicted probabilities?

Thanks.

Interchanging the events (recoding 0 as 1 and 1 as 0) doesn't seem to be a good approach. - marupav

1 Answer


The only robust way is to create a new 0-1 dummy variable with 1 representing the desired level.

For example:

not_admit = (ADMIT == 0).astype(int)

"Robust" here refers to current ambiguities in the interaction between pandas, patsy and statsmodels, which might change how a categorical variable is handled if its dtype is not integer or float (e.g. string, boolean or object). This treatment of categorical dependent variables will have to change at some point, in a backwards-incompatible way, to make it consistent between the formula and non-formula versions.

There are some issues about this, for example https://github.com/statsmodels/statsmodels/issues/2733