2
votes

I used recursive feature elimination with cross-validation (RFECV) to find the best accuracy score for the features I had (m = 154).

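from sklearn.feature_selection import RFECV
from sklearn.model_selection import StratifiedKFold
# logreg (the estimator), X and y are defined elsewhere
rfecv = RFECV(estimator=logreg, step=1, cv=StratifiedKFold(2),
              scoring='accuracy')
rfecv.fit(X, y)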

The rankings (rfecv.ranking_) and the associated scores (rfecv.grid_scores_) are confusing to me. As you can see from the top 13 features below (those ranked in the top 10), their ranks are not based on score. I understand that the ranking has something to do with how and when a feature was excluded during the cross-validation process. But then how is the score related to the ranking? I would expect the highest-ranked features to have the highest scores.

Feature Ranking Score
b       1       0.692642743
a       1       0.606166207
f       1       0.568833672
i       1       0.54935204
l       2       0.607564808
j       3       0.613495238
e       4       0.626374391
l       5       0.581064621
d       6       0.611407556
c       7       0.570921354
h       8       0.570921354
k       9       0.576863707
g       10      0.576863707
You are misinterpreting grid_scores_ and ranking_. They do not correspond to each other at all. The grid_scores_ are not scores of features. - Vivek Kumar
What are they scores of? Can you elaborate? What are the rankings of, then? Thanks for taking the time. - Liam Hanninen
From the documentation: "The cross-validation scores such that grid_scores_[i] corresponds to the CV score of the i-th subset of features." They do not represent scores of features. They represent the score of the estimator when certain features are selected. - Vivek Kumar
You are confused because grid_scores_ has the same size as ranking_. Change step to more than 1 to see the effect. - Vivek Kumar

1 Answer

5
votes

grid_scores_[i] is not a score for the i-th feature; it is the score the estimator produced when trained with the i-th subset of features.

To understand what that means, remember that Recursive Feature Elimination (RFE) works by training the model, evaluating it, then removing the least significant features (step of them at a time), and repeating.
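To make the link to ranking_ concrete, here is a rough sketch of that elimination loop in plain Python. It is not scikit-learn's code: toy_rfe_ranking and its fixed importance list are hypothetical, and a real RFE refits the estimator and reads fresh coefficients each round. The rank bookkeeping follows my understanding of how ranking_ is filled: survivors keep rank 1, and every eliminated feature moves down one rank per round.

```python
def toy_rfe_ranking(importance, n_select=1, step=1):
    """Rank features by elimination order; surviving features get rank 1."""
    n = len(importance)
    remaining = list(range(n))
    ranking = [1] * n
    while len(remaining) > n_select:
        # Never drop below n_select features in the last round.
        n_drop = min(step, len(remaining) - n_select)
        # Assumption: fixed importances stand in for refitted coefficients.
        remaining = sorted(remaining, key=lambda j: importance[j])[n_drop:]
        alive = set(remaining)
        # Everything eliminated so far (including this round) drops one rank.
        for j in range(n):
            if j not in alive:
                ranking[j] += 1
    return ranking


print(toy_rfe_ranking([0.9, 0.1, 0.5, 0.3], n_select=2))  # [1, 3, 1, 2]
```

The rank only records when a feature was dropped. No per-feature score is computed anywhere in the loop, which is why the ranks and the grid scores cannot be lined up row by row.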

So grid_scores_[-1] will be the score of the estimator trained on all features, grid_scores_[-2] will be the score of the estimator with step features removed, and grid_scores_[-3] will be the score of the estimator with 2*step features removed.
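The indexing can be sketched without scikit-learn at all. The helper below is hypothetical (subset_sizes is not a library function). It assumes elimination stops at min_features features and that the last round removes fewer than step features when needed, which is how I understand the library to behave.

```python
def subset_sizes(n_features, step=1, min_features=1):
    """Number of features behind each grid score, in grid-score order."""
    sizes = [n_features]
    while sizes[-1] > min_features:
        # The final round may remove fewer than `step` features.
        remove = min(step, sizes[-1] - min_features)
        sizes.append(sizes[-1] - remove)
    # Scores are stored from the smallest subset to the full feature set.
    return sizes[::-1]


sizes = subset_sizes(154, step=1)
print(len(sizes), sizes[-1], sizes[-2], sizes[-3])  # 154 154 153 152

sizes = subset_sizes(154, step=10)
print(len(sizes), sizes[-1], sizes[-2], sizes[-3])  # 17 154 144 134
```

With step=1 and 154 features there happen to be exactly 154 subset sizes, which is why the scores look as if they pair up with the features one to one.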

As such, the grid scores do not reflect the scoring of individual features. In fact, if step is greater than 1, there will be fewer grid scores than features.
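You can check this on synthetic data. The snippet below is only a sketch: the dataset, max_iter=1000 and step=3 are my own choices, not the asker's setup. Also note that, as far as I know, grid_scores_ was removed in scikit-learn 1.2 and the same per-subset scores now live in cv_results_["mean_test_score"], so the code falls back between the two.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold

# Synthetic stand-in data (hypothetical; not the asker's 154 features).
X, y = make_classification(n_samples=200, n_features=20,
                           n_informative=5, random_state=0)

rfecv = RFECV(estimator=LogisticRegression(max_iter=1000), step=3,
              cv=StratifiedKFold(2), scoring="accuracy")
rfecv.fit(X, y)

# Newer releases expose the per-subset scores through cv_results_;
# older ones only have grid_scores_.
if hasattr(rfecv, "cv_results_"):
    subset_scores = rfecv.cv_results_["mean_test_score"]
else:
    subset_scores = rfecv.grid_scores_

print("ranks (one per feature):", len(rfecv.ranking_))
print("scores (one per subset size):", len(subset_scores))
print("features kept (rank 1):", rfecv.n_features_)
```

With 20 features and step=3 the subset sizes are 20, 17, 14, 11, 8, 5, 2 and 1, so there are 8 scores against 20 ranks, and the two arrays can no longer be zipped together by mistake.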