I'm using the RFECV module in sklearn to find the optimal number of features to yield the highest Cross validation on 2 folds. I am using a ridge regressor as my estimator.
rfecv = RFECV(estimator=ridge,step=1, cv=KFold(n_splits=2))
rfecv.fit(df, y)
I have 5 features in my dataset that I have standardized using the standardscaler.
I'll run the RFECV on my data, and it'll say that 2 features is optimal. But when I remove one of the features with the lowest regression coefficient and rerun the RFECV, it now says that 3 features is optimal.
When I progress through all features one at a time (as the recursive should do) I find that 3 is in fact the optimal.
I've tested this with other datasets, and have found that the optimal number of features changes as I remove features one at a time and rerun RFECV.
I might be missing something, but isn't that what RFECV is supposed to solve? Any additional insights on RFECV is appreciated.