I am attempting to fit a logistic regression on a dataset with a boolean target variable ('default') and two features ('fico_interp', 'home_ownership_int'), using the logit function from statsmodels. All three columns come from the same data frame, 'traindf':
from sklearn import datasets
import statsmodels.formula.api as smf
lmf = smf.logit('default ~ fico_interp + home_ownership_int',traindf).fit()
This raises the following error:
ValueError: operands could not be broadcast together with shapes (40406,2) (40406,)
How can this happen?
Either fico_interp or home_ownership_int is an (x, 2) array. Try inspecting them. – farhawaint
patsy treats the boolean as a categorical variable and converts it to a two-column response variable, which doesn't work for Logit. There should already be an open issue for this in statsmodels, but there is no solution yet. – Josef
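To illustrate Josef's point, here is a minimal sketch of the behaviour and one possible workaround: casting the boolean target to an integer before fitting. The data below is synthetic and only stands in for 'traindf', and the column name 'default_int' is a hypothetical helper, not something from the question. The exact error message may differ between statsmodels versions.

```python
import numpy as np
import pandas as pd
import patsy
import statsmodels.formula.api as smf

# Synthetic stand-in for traindf (assumption: made-up data, not the asker's).
rng = np.random.default_rng(0)
n = 500
traindf = pd.DataFrame({
    "fico_interp": rng.normal(700, 50, n),
    "home_ownership_int": rng.integers(0, 3, n),
})
# Boolean target, loosely tied to the score so the fit is well-behaved.
p_default = 1 / (1 + np.exp((traindf["fico_interp"] - 700) / 50))
traindf["default"] = rng.random(n) < p_default

# patsy treats the boolean left-hand side as categorical and expands it
# into two indicator columns (one for False, one for True).
y_bool, X = patsy.dmatrices(
    "default ~ fico_interp + home_ownership_int", traindf
)
print(y_bool.shape)  # two columns instead of one

# Workaround sketch: cast the target to 0/1 so patsy keeps it numeric.
# 'default_int' is a hypothetical column name introduced for this example.
traindf["default_int"] = traindf["default"].astype(int)
lmf = smf.logit(
    "default_int ~ fico_interp + home_ownership_int", traindf
).fit(disp=0)
print(lmf.params)
```

With the integer target the response is a single column, so its shape matches what Logit expects.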