I used recursive feature elimination with cross-validation (RFECV) to find the best accuracy score over the features I have (m = 154).
from sklearn.feature_selection import RFECV
from sklearn.model_selection import StratifiedKFold
rfecv = RFECV(estimator=logreg, step=1, cv=StratifiedKFold(2),
              scoring='accuracy')
rfecv.fit(X, y)
The rankings (rfecv.ranking_) and the associated scores (rfecv.grid_scores_) are confusing to me. As you can see from the top 13 features (those ranked in the top 10), their ranks are not based on score. I understand that the ranking has something to do with how and when a feature was excluded during the cross-validation process. But then how is the score related to the ranking? I would expect the highest-ranked features to have the highest scores.
Feature  Ranking  Score
b        1        0.692642743
a        1        0.606166207
f        1        0.568833672
i        1        0.54935204
l        2        0.607564808
j        3        0.613495238
e        4        0.626374391
l        5        0.581064621
d        6        0.611407556
c        7        0.570921354
h        8        0.570921354
k        9        0.576863707
g        10       0.576863707
... grid_scores_ and ranking_. They have no correlation whatsoever. The grid_scores_ are not scores of features. – Vivek Kumar
... grid_scores_ is the same as ranking_. Change step to more than 1 to see the effect. – Vivek Kumar