I created a table to test my understanding:
   F1  F2  Outcome
0   2   5        1
1   4   8        2
2   6   0        3
3   9   8        4
4  10   6        5
I tried to predict Outcome from F1 and F2.
As you can see, F1 has a strong correlation with Outcome, while F2 is random noise.
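To put numbers on that claim, here is a minimal sketch that computes the Pearson correlation of each feature with Outcome. It assumes only the five rows of the table above; the helper name `pearson` is made up for illustration and is not part of any library.

```python
import math

# Values copied from the table above.
f1 = [2, 4, 6, 9, 10]
f2 = [5, 8, 0, 8, 6]
outcome = [1, 2, 3, 4, 5]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of centered products and centered squares.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


r_f1 = pearson(f1, outcome)
r_f2 = pearson(f2, outcome)
print(f"corr(F1, Outcome) = {r_f1:.3f}")
print(f"corr(F2, Outcome) = {r_f2:.3f}")
```

On this data the correlation is about 0.99 for F1 and about 0.10 for F2, which matches the description of F1 as the informative feature and F2 as noise.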
I first tested this with PCA:
from sklearn.decomposition import PCA

# X holds the F1 and F2 columns of the table above
pca = PCA(n_components=2)
fit = pca.fit(X)
print("Explained Variance")
print(fit.explained_variance_ratio_)
Explained Variance
[ 0.57554896 0.42445104]
This is what I expected, and I took it to show that F1 is more important.
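As a sanity check on those two numbers, they can be reproduced by hand from the 2x2 covariance matrix of F1 and F2. This is only a sketch under the assumption that X contains exactly the five rows of the table; the helper `sample_cov` is a hypothetical name. One caveat when reading the result: each ratio belongs to a principal component, which is a linear combination of F1 and F2, and Outcome is never used in the computation.

```python
import math

# Feature columns copied from the table above (Outcome is not used by PCA).
f1 = [2, 4, 6, 9, 10]
f2 = [5, 8, 0, 8, 6]


def sample_cov(xs, ys):
    """Sample covariance with an n - 1 denominator."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


# Entries of the symmetric covariance matrix [[a, b], [b, c]].
a = sample_cov(f1, f1)
b = sample_cov(f1, f2)
c = sample_cov(f2, f2)

# Closed-form eigenvalues of a symmetric 2x2 matrix, largest first.
half_trace = (a + c) / 2
radius = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
eigenvalues = [half_trace + radius, half_trace - radius]

# Each eigenvalue's share of the total variance.
ratios = [ev / (a + c) for ev in eigenvalues]
print(ratios)
```

The two ratios come out at roughly 0.5755 and 0.4245, the same values as the PCA output above. The individual variances of F1 and F2 are 11.2 and 10.8, so the two features have almost the same spread on their own.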
However, when I run RFE (Recursive Feature Elimination):
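from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Y holds the Outcome column; keep only the single best-ranked feature
model = LogisticRegression()
rfe = RFE(model, n_features_to_select=1)
fit = rfe.fit(X, Y)
print(fit.n_features_)
print(fit.support_)
print(fit.ranking_)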
1
[False True]
[2 1]
RFE tells me to keep F2 instead. I expected it to keep F1, since F1 is a strong predictor while F2 is random noise. Why does it select F2?
Thanks