I have a strange issue that was already raised here: LinearSVC Feature Selection returns different coef_ in Python, but I can't really relate that question to my case.
I am using an L1-regularised logistic regression for feature selection. When I simply rerun the code, the number of selected features changes. The target variable is binary (1, 0). There are 709 features and 435 training observations, so there are more features than observations. The penalty C was obtained through TimeSeriesSplit CV and does not change when I rerun; I verified that.
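To experiment without the original data, here is a minimal sketch that builds a synthetic binary problem with the same shape (435 observations, 709 features) and refits the same L1/liblinear model several times with random_state=None. The use of make_classification, its parameters and C=0.1 are assumptions standing in for df_training_features, df_training_targets and LR_penalty.C; whether the count actually varies between fits depends on the data.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the real data (assumption): 435 rows, 709 features, binary target
X, y = make_classification(n_samples=435, n_features=709, n_informative=20,
                           random_state=0)

def n_selected(random_state, C=0.1, threshold=1e-5):
    """Fit an L1-penalised liblinear logistic regression and count |coef| >= threshold."""
    clf = LogisticRegression(penalty='l1', solver='liblinear', C=C, tol=1e-4,
                             max_iter=10000, random_state=random_state).fit(X, y)
    return int(np.sum(np.abs(clf.coef_[0]) >= threshold))

# With random_state=None the seed handed to liblinear comes from NumPy's global RNG,
# so it differs from one fit to the next
counts = [n_selected(None) for _ in range(5)]
print(counts)
```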
Below is the code for the feature-selection part:
```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression

X = df_training_features
y = df_training_targets

lr_l1 = LogisticRegression(C=LR_penalty.C, max_iter=10000, class_weight=None, dual=False,
                           fit_intercept=True, intercept_scaling=1, l1_ratio=None, n_jobs=None,
                           penalty='l1', random_state=None, solver='liblinear', tol=0.0001, verbose=0,
                           warm_start=False).fit(X, y)
model = SelectFromModel(lr_l1, threshold=1e-5, prefit=True)
feature_idx = model.get_support()
feature_name = X.columns[feature_idx]
X_new = model.transform(X)

# Print the selected (non-zero) coefficients
importance = lr_l1.coef_[0]
for i, v in enumerate(importance):
    if np.abs(v) >= 1e-5:
        print('Feature: %0d, Score: %.5f' % (i, v))
sel = importance[np.abs(importance) >= 1e-5]

# Plot feature importance
plt.figure(figsize=(12, 10))
plt.bar(list(feature_name), sel)
plt.xticks(fontsize=10, rotation=70)
plt.ylabel('Feature Importance', fontsize=14)
plt.show()
```
As seen above, the code sometimes selects 22 features (first plot), other times 24 (second plot), or 23. I am not sure what is happening. I thought the issue was in SelectFromModel, so I explicitly set the threshold to 1e-5 (which is the default for L1-regularised models), but nothing changes.
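The statement that 1e-5 is SelectFromModel's default threshold for L1-penalised estimators can be checked directly: leaving threshold at its default should give the same support as an explicit threshold=1e-5. A minimal sketch on synthetic data (the dataset and C=0.1 are assumptions):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression

# Synthetic data standing in for the question's DataFrame (assumption)
X, y = make_classification(n_samples=435, n_features=709, n_informative=20, random_state=0)
clf = LogisticRegression(penalty='l1', solver='liblinear', C=0.1,
                         max_iter=10000, random_state=0).fit(X, y)

# threshold=None: SelectFromModel falls back to 1e-5 because clf.penalty == 'l1'
default_sel = SelectFromModel(clf, prefit=True)
explicit_sel = SelectFromModel(clf, threshold=1e-5, prefit=True)

same_support = np.array_equal(default_sel.get_support(), explicit_sel.get_support())
print(same_support)
```

Since both selectors wrap the same fitted model, any variation between reruns has to come from the fit itself rather than from the threshold.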
It is always the same features that drop in and out, so I checked their coefficients, thinking they might be close to the threshold. They are not: they are one or two orders of magnitude larger.
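One way to confirm that the flip-flopping features are not borderline cases is to fit twice with different seeds and list the features selected in only one of the fits, together with their coefficient magnitudes. A sketch, where the helper name `unstable_features`, the synthetic data and C=0.1 are all assumptions:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the real data (assumption)
X, y = make_classification(n_samples=435, n_features=709, n_informative=20, random_state=0)

def fit_coef(seed, C=0.1):
    """Fit the L1/liblinear model with a given seed and return its coefficient vector."""
    clf = LogisticRegression(penalty='l1', solver='liblinear', C=C,
                             max_iter=10000, random_state=seed).fit(X, y)
    return clf.coef_[0]

def unstable_features(coef_a, coef_b, threshold=1e-5):
    """Hypothetical helper: indices selected in exactly one fit, with |coef| from each fit."""
    sel_a = np.abs(coef_a) >= threshold
    sel_b = np.abs(coef_b) >= threshold
    idx = np.flatnonzero(sel_a ^ sel_b)
    return [(int(i), float(abs(coef_a[i])), float(abs(coef_b[i]))) for i in idx]

for i, a, b in unstable_features(fit_coef(1), fit_coef(2)):
    # One magnitude is below the threshold; the other shows how far above it the feature sits
    print(f"feature {i}: |coef| run A = {a:.5f}, run B = {b:.5f}")
```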
Can anybody please help? I have been struggling with this for more than a day.
random_state=42
– Sergey Bushmanov
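Following the suggestion to fix random_state: scikit-learn documents random_state as being used by the liblinear solver to shuffle the data, so with random_state=None each fit may visit coordinates in a different order and, possibly because an L1 problem with more features than observations need not have a unique solution, end at a different sparse coefficient vector. A minimal sketch (synthetic data and C=0.1 are assumptions) checking that two fits with random_state=42 give identical coefficients and therefore the same selected features:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the real data (assumption)
X, y = make_classification(n_samples=435, n_features=709, n_informative=20, random_state=0)

def fit_l1(seed):
    """Same model as in the question, but with an explicit seed for liblinear's shuffling."""
    return LogisticRegression(penalty='l1', solver='liblinear', C=0.1, tol=1e-4,
                              max_iter=10000, random_state=seed).fit(X, y)

first, second = fit_l1(42), fit_l1(42)
support_1 = SelectFromModel(first, threshold=1e-5, prefit=True).get_support()
support_2 = SelectFromModel(second, threshold=1e-5, prefit=True).get_support()

print(np.array_equal(first.coef_, second.coef_))
print(support_1.sum(), support_2.sum())
```

A fixed seed makes the result reproducible but does not by itself make the selection more stable; refitting with a few different seeds shows how much the selected set depends on the seed.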