I am trying to generate a random forest feature importance plot using cross-validation folds. When only the feature (X) and target (y) data are used, the implementation is straightforward, for example:
rfc = RandomForestClassifier()
rfc.fit(X, y)
importances = pd.DataFrame({'FEATURE': X.columns, 'IMPORTANCE': np.round(rfc.feature_importances_, 3)})
importances = importances.sort_values('IMPORTANCE',ascending=False).set_index('FEATURE')
print(importances)
importances.plot.bar()
plt.show()
However, how could I adapt this code to create a similar plot for each of the cross-validation folds (k folds) that I would be creating?
The code that I have at the moment is:
# Empty list storage to collect all results for displaying as plots
mylist = []
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold
kf = KFold(n_splits=3)
for train, test in kf.split(X, y):
    # Select the training rows of BOTH X and y (not y[test])
    X_train = np.array(X)[train]
    y_train = np.array(y)[train]
    rfc = RandomForestClassifier()
    rfc.fit(X_train, y_train)
    mylist.append(rfc.feature_importances_)  # collect per-fold importances
For example, the above code creates 3 folds using cross-validation, and my aim is to create a feature importance plot for each of the 3 folds, resulting in 3 feature importance graphs. At the moment, the loop raises errors.
I am not sure what the most efficient way would be to use each of the k folds to generate its own feature importance graph via a random forest.
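One possible approach, sketched below under stated assumptions: fit a fresh `RandomForestClassifier` on the training rows of each fold, build the same importance DataFrame as in the single-fit version, and plot it per fold. The data here (`make_classification`, the `f0`..`f4` column names, `n_splits=3`) is a toy stand-in for the question's `X`/`y`; the `Agg` backend line is only there so the sketch runs headlessly and can be removed for interactive display.

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch; drop this to show windows
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold

# Toy data standing in for the question's X / y
X_arr, y = make_classification(n_samples=150, n_features=5, random_state=0)
X = pd.DataFrame(X_arr, columns=[f"f{i}" for i in range(5)])

kf = KFold(n_splits=3, shuffle=True, random_state=0)
fold_importances = []  # one importance DataFrame per fold

for fold, (train_idx, test_idx) in enumerate(kf.split(X), start=1):
    # Fit on the TRAINING rows of both X and y
    rfc = RandomForestClassifier(random_state=0)
    rfc.fit(X.iloc[train_idx], y[train_idx])

    imp = (pd.DataFrame({"FEATURE": X.columns,
                         "IMPORTANCE": np.round(rfc.feature_importances_, 3)})
             .sort_values("IMPORTANCE", ascending=False)
             .set_index("FEATURE"))
    fold_importances.append(imp)

    # One bar chart per fold
    imp.plot.bar(legend=False, title=f"Fold {fold} feature importance")
    plt.tight_layout()
    plt.show()
```

This keeps the per-fold results in `fold_importances`, so they can also be averaged across folds afterwards if a single summary plot is wanted.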

