I am trying to fit a multinomial logistic regression and then predict the outcome for a set of samples.
# RZS_TC is my DataFrame
RZS_TC.loc[RZS_TC['Mean_Treecover'] <= 50, 'Mean_Treecover' ] = 0
RZS_TC.loc[RZS_TC['Mean_Treecover'] > 50, 'Mean_Treecover' ] = 1
RZS_TC[['MAP', 'Sr', 'delTC', 'Mean_Treecover']].head()
[Output]:
                   MAP          Sr      delTC  Mean_Treecover
302993741  2159.297363  452.975647   2.666672             1.0
217364332  3242.351807   65.615341   8.000000             1.0
390863334  1617.215454  493.124054   5.666666             0.0
446559668  1095.183105  498.373383  -8.000000             0.0
246078364  2804.615234   98.981110  -4.000000             1.0
1000000 rows × 7 columns
#Fitting a logistic regression
from statsmodels.formula.api import mnlogit
model = mnlogit("Mean_Treecover ~ MAP + Sr + delTC", RZS_TC).fit()
print(model.summary2())
[Output]:
                          Results: MNLogit
====================================================================
Model:              MNLogit          Pseudo R-squared: 0.364
Dependent Variable: Mean_Treecover   AIC:              831092.4595
Date:               2021-04-02 13:51 BIC:              831139.7215
No. Observations:   1000000          Log-Likelihood:   -4.1554e+05
Df Model:           3                LL-Null:          -6.5347e+05
Df Residuals:       999996           LLR p-value:      0.0000
Converged:          1.0000           Scale:            1.0000
No. Iterations:     7.0000
--------------------------------------------------------------------
Mean_Treecover = 0   Coef. Std.Err.         t  P>|t|  [0.025  0.975]
--------------------------------------------------------------------
Intercept          -5.2200   0.0119 -438.4468 0.0000 -5.2434 -5.1967
MAP                 0.0023   0.0000  491.0859 0.0000  0.0023  0.0023
Sr                  0.0016   0.0000   90.6805 0.0000  0.0015  0.0016
delTC              -0.0093   0.0002  -39.9022 0.0000 -0.0098 -0.0089
However, whenever I try to predict using the model.predict()
function, I get the following error.
prediction = model.predict(np.array(RZS_TC[['MAP']+['Sr']+['delTC']]))
[Output]: ERROR! Session/line number was not unique in database. History logging moved to new session 2627
Does anyone know how to troubleshoot this? Is there something that I might be doing wrong?
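For reference, here is a minimal, self-contained sketch of the same workflow on synthetic data. It rests on an assumption I have not confirmed against my real data: that a model fitted through the formula API expects a DataFrame with the formula's column names in predict(), not a bare NumPy array. The data ranges and coefficients below are made up for illustration and are not taken from RZS_TC. Separately, the "Session/line number was not unique" message looks like it comes from IPython's history logging rather than from statsmodels, so it may be hiding the actual exception.

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import mnlogit

# Synthetic stand-in for RZS_TC; ranges and coefficients are made up.
rng = np.random.default_rng(0)
n = 2000
df = pd.DataFrame({
    "MAP": rng.uniform(500, 3500, n),
    "Sr": rng.uniform(0, 600, n),
    "delTC": rng.normal(0, 5, n),
})

# Draw a noisy binary outcome so the two classes are not perfectly separable.
logit = -4.0 + 0.002 * df["MAP"] + 0.001 * df["Sr"] - 0.01 * df["delTC"]
df["Mean_Treecover"] = (rng.uniform(size=n) < 1 / (1 + np.exp(-logit))).astype(float)

model = mnlogit("Mean_Treecover ~ MAP + Sr + delTC", df).fit(disp=False)

# Pass a DataFrame carrying the formula's column names, not np.array(...).
probs = model.predict(df[["MAP", "Sr", "delTC"]])

# One probability column per class; pick the most likely class for each row.
predicted_class = np.asarray(probs).argmax(axis=1)
```

In this sketch, probs holds one row per sample and one column per class, and predicted_class holds the column index with the highest probability.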