I am trying to work out how to use PCA to determine the most important features. I think I have done that below.
What I am wondering is: how do I pass the most important features, with their original column names (from a pandas DataFrame), into the new DataFrame I am creating at the bottom, so that I can use it as a new 'lightweight' dataset?
That way, if I set n_components to 10, I would have 10 named feature columns passed into the new DataFrame.
Any ideas?
import pandas as pd
from sklearn.decomposition import PCA

# PCA (principal component analysis) aims to reduce the number of dimensions in the dataset
# without losing the ones that are most relevant to the model.
# As I understand it, it provides a score, so the ones with poor scores can be dropped.
X_pc = PCA(n_components=2).fit_transform(train_features)

# Preview 10 random rows of the two components alongside the labels
pd.DataFrame({'PC1': X_pc[:, 0], 'PC2': X_pc[:, 1], 'Y': train_labels.ravel()}).sample(10)
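To make the goal concrete, here is a minimal sketch of the kind of result I am after. It is one possible heuristic, not necessarily the right way to do it. It assumes train_features is a pandas DataFrame with named columns (a synthetic stand-in with made-up columns f0 to f4 is used here), and the names loadings, top_features and lightweight are purely illustrative. Each principal component is a weighted mix of all the original columns, so "most important feature" is taken to mean the column with the largest absolute weight in each row of components_.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

# Hypothetical stand-in for train_features: 100 rows, 5 named columns
rng = np.random.default_rng(0)
train_features = pd.DataFrame(
    rng.normal(size=(100, 5)),
    columns=["f0", "f1", "f2", "f3", "f4"],
)
train_features["f3"] *= 10  # give one column far more variance than the rest

n_components = 2
pca = PCA(n_components=n_components).fit(train_features)

# components_ has shape (n_components, n_features); label it with the original names
loadings = pd.DataFrame(
    pca.components_,
    columns=train_features.columns,
    index=[f"PC{i + 1}" for i in range(n_components)],
)

# Heuristic (an assumption): for each component, take the original column
# with the largest absolute loading
top_features = loadings.abs().idxmax(axis=1).tolist()

# Drop duplicates while keeping order, in case two components pick the same column
top_features = list(dict.fromkeys(top_features))

# New 'lightweight' DataFrame that keeps the original column names
lightweight = train_features[top_features]
print(lightweight.head())
```

With this approach the number of columns in lightweight can be smaller than n_components if two components point at the same original column. Since PCA is sensitive to scale, the features would probably need standardising first on real data.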