I have the following piece of code:
from sklearn import model_selection
from sklearn.ensemble import RandomForestClassifier
import pandas as pd
from sklearn.pipeline import Pipeline
...
x_train, x_test, y_train, y_test = model_selection.train_test_split(
    dataframe[features_],
    dataframe[labels],
    test_size=0.30,
    random_state=42,
    shuffle=True,
)
classifier = RandomForestClassifier(n_estimators=11)
pipe = Pipeline([('feats', feature), ('clf', classifier)])
pipe.fit(x_train, y_train)
predicts = pipe.predict(x_test)
Instead of a train/test split, I want to use k-fold cross-validation to train my model. However, I do not know how to do this with the pipeline structure. I came across this: https://scikit-learn.org/stable/modules/compose.html but I could not adapt it to my code.
If possible, I want to use StratifiedKFold (from sklearn.model_selection import StratifiedKFold). I can use it without the pipeline structure, but I cannot get it to work with the pipeline.
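For context, here is a minimal sketch of how StratifiedKFold is often combined with a Pipeline by looping over the folds manually. It rests on assumptions that are not in the code above: the data is a synthetic stand-in generated with make_classification, and StandardScaler is only a hypothetical placeholder for the undefined `feature` step.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for dataframe[features_] and dataframe[labels] (assumption).
X, y = make_classification(n_samples=150, n_features=8, random_state=42)

# StandardScaler is a hypothetical placeholder for the undefined `feature` step.
pipe = Pipeline([
    ("feats", StandardScaler()),
    ("clf", RandomForestClassifier(n_estimators=11, random_state=42)),
])

skf = StratifiedKFold(n_splits=3, shuffle=True, random_state=42)
fold_scores = []
for train_idx, test_idx in skf.split(X, y):
    # Clone so that every fold starts from an unfitted copy of the pipeline.
    fold_pipe = clone(pipe)
    fold_pipe.fit(X[train_idx], y[train_idx])
    # Pipeline.score delegates to the classifier, so this is fold accuracy.
    fold_scores.append(fold_pipe.score(X[test_idx], y[test_idx]))

print("accuracy per fold:", np.round(fold_scores, 3))
```

With a pandas DataFrame in place of the NumPy arrays, the rows would be selected with .iloc[train_idx] and .iloc[test_idx] instead of plain indexing. Because the whole pipeline is refit inside each fold, the feature step only ever sees that fold's training rows.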
Update: I tried this, but it raises an error.
from sklearn.model_selection import StratifiedKFold, cross_val_predict

x_train = dataframe[features_]
y_train = dataframe[labels]
skf = StratifiedKFold(n_splits=3, shuffle=True, random_state=42)
classifier = RandomForestClassifier(n_estimators=11)
#pipe = Pipeline([('feats', feature), ('clf', classifier)])
#pipe.fit(x_train, y_train)
#predicts = pipe.predict(x_test)
predicts = cross_val_predict(classifier, x_train, y_train, cv=skf)
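The attempt above passes the bare classifier, so the 'feats' step is never applied. One possible variant is to pass the whole pipeline to cross_val_predict, which accepts any estimator, including a Pipeline. The sketch below is not a diagnosis of the error; it uses synthetic data and a hypothetical StandardScaler in place of the undefined `feature` step, both of which are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for dataframe[features_] and dataframe[labels] (assumption).
X, y = make_classification(n_samples=150, n_features=8, random_state=42)

# StandardScaler is a hypothetical placeholder for the undefined `feature` step.
pipe = Pipeline([
    ("feats", StandardScaler()),
    ("clf", RandomForestClassifier(n_estimators=11, random_state=42)),
])

skf = StratifiedKFold(n_splits=3, shuffle=True, random_state=42)

# Each sample is predicted by a pipeline copy fitted on the other folds only.
predicts = cross_val_predict(pipe, X, y, cv=skf)

print("out-of-fold accuracy:", accuracy_score(y, predicts))
```

Here predicts holds one out-of-fold prediction per row, so it can be compared directly against the full label vector.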