I have a dataset of 92k observations and am trying to fit a logistic regression model with sklearn's LogisticRegression(). It performs poorly, with an AUC of 0.51, barely above the 0.5 baseline. Oddly, logistic regression with statsmodels' Logit() achieves an AUC of 0.68. Both use regularization, and the data has two numerical predictors and a binary outcome. I have previously gotten sklearn and statsmodels to closely match on performance metrics and coefficients, but I can't figure out why sklearn underperforms here.
I have tried rerunning multiple times and restarting, with the same result. Everything runs in a single JupyterLab code block. How do I get sklearn to match the performance of my statsmodels model?
Sklearn Model:
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
X_train, X_test, y_train, y_test = train_test_split(df_model.drop("6MonthOutcome", axis=1), df_model['6MonthOutcome'], test_size=.2)
logit_model = LogisticRegression(max_iter=1000)
result = logit_model.fit(X_train, y_train)
y_pred = result.predict(X_test)
from sklearn.metrics import (confusion_matrix, accuracy_score)
from sklearn import metrics
fpr, tpr, thresholds = metrics.roc_curve(y_test, y_pred, pos_label=1)
print(metrics.auc(fpr, tpr))
>>> Output: 0.5050369815416016
Statsmodels Model:
from sklearn.model_selection import train_test_split
import statsmodels.api as sm
X_train, X_test, y_train, y_test = train_test_split(df_model.drop("6MonthOutcome", axis=1), df_model['6MonthOutcome'], test_size=.2)
X_train = sm.add_constant(X_train)
X_test = sm.add_constant(X_test)
# maxiter is a fitting option, not a Logit constructor argument, so pass it to fit_regularized()
logit_model = sm.Logit(y_train, X_train)
result = logit_model.fit_regularized(maxiter=1000)
y_pred = result.predict(X_test)
from sklearn.metrics import (confusion_matrix, accuracy_score)
from sklearn import metrics
fpr, tpr, thresholds = metrics.roc_curve(y_test, y_pred, pos_label=1)
print(metrics.auc(fpr, tpr))
>>> Output: 0.6813991995101205
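One difference between the two snippets worth checking: sklearn's predict() returns hard 0/1 class labels, while the statsmodels result's predict() returns predicted probabilities, so the two AUC values above are not computed on the same kind of score. The following is only a sketch on synthetic data (the coefficients, sample size, and variable names are assumptions made for illustration, not the df_model data), comparing the AUC from hard labels with the AUC from predict_proba() for the same fitted sklearn model:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real data: two numerical predictors and a
# fairly rare binary outcome (all numbers here are made up).
rng = np.random.default_rng(0)
n = 20_000
X = rng.normal(size=(n, 2))
true_logit = -2.5 + 0.8 * X[:, 0] + 0.5 * X[:, 1]
y = rng.binomial(1, 1 / (1 + np.exp(-true_logit)))

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# predict() thresholds at 0.5 and returns class labels.
hard_labels = clf.predict(X_test)
# predict_proba() returns one column per class; column 1 is P(y = 1).
pos_proba = clf.predict_proba(X_test)[:, 1]

auc_labels = roc_auc_score(y_test, hard_labels)
auc_proba = roc_auc_score(y_test, pos_proba)
print(f"AUC from hard labels:   {auc_labels:.3f}")
print(f"AUC from probabilities: {auc_proba:.3f}")
```

If the same pattern holds on the real data, passing logit_model.predict_proba(X_test)[:, 1] to roc_curve instead of the output of predict() should put the sklearn AUC on the same footing as the statsmodels one. Note also that the two snippets call train_test_split separately without a random_state, so they are evaluated on different splits.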