Manual k-fold cross validation for Random Forest

Question

I am using a Random Forest Classifier and I want to perform k-fold cross validation. My dataset is already split into 10 different subsets, so I'd like to use them as the folds, without using automatic functions that randomly split the dataset. Is this possible in Python?

Random Forest doesn't have the partial_fit() method, so I can't do an incremental fit.

Yes, it's possible. However, there are an infinite number of ways to do that, which makes answering this question impossible, within reason. — BartoszKP
What do partial_fit() or other splitting functions have to do with this? Do you have any specific difficulty in running a for loop, and in each iteration fitting on the (concatenated) 9 subsets while testing on the remaining one? If yes, please post what you have tried so far and the specific issues encountered. Otherwise, as @BartoszKP has already noticed, the answer to your question is simply "yes, it is possible" (and it has nothing to do with Random Forest in particular, or any other specific algorithm whatsoever). — desertnaut
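The loop described in the comment above could look roughly like the sketch below. It is only an illustration under stated assumptions: the data is synthetic (make_classification), and the `subsets` list is a hypothetical stand-in for the 10 pre-made subsets, each held as an (X, y) pair of NumPy arrays. No partial_fit() is needed, because a fresh forest is fitted on the 9 concatenated subsets in every iteration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Synthetic stand-in for the 10 pre-made subsets (hypothetical data).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
subsets = list(zip(np.array_split(X, 10), np.array_split(y, 10)))

scores = []
for i, (X_test, y_test) in enumerate(subsets):
    # Concatenate the other 9 subsets into one training set.
    X_train = np.concatenate([Xs for j, (Xs, _) in enumerate(subsets) if j != i])
    y_train = np.concatenate([ys for j, (_, ys) in enumerate(subsets) if j != i])

    # Fit a fresh model per fold so nothing carries over between folds.
    clf = RandomForestClassifier(n_estimators=50, random_state=0)
    clf.fit(X_train, y_train)

    # Evaluate on the held-out subset.
    scores.append(accuracy_score(y_test, clf.predict(X_test)))

mean_accuracy = float(np.mean(scores))
print("per-fold accuracy:", np.round(scores, 3))
print("mean accuracy:", round(mean_accuracy, 3))
```

If the subsets are pandas DataFrames instead, pd.concat would take the place of np.concatenate; the rest of the loop stays the same.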

Golden Lion · Accepted Answer · 2021-03-08T17:56:19

Try kf = StratifiedKFold(n_splits=3, shuffle=True, random_state=123) to split your data so that each fold keeps roughly the same class proportions.

Try kf = TimeSeriesSplit(n_splits=5) to split time-ordered data so that each test fold comes after its training data, or kf = KFold(n_splits=5, random_state=123, shuffle=True) to shuffle your data before splitting.

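# df is the full DataFrame; StratifiedKFold also needs the labels: kf.split(df, y)
for train_index, test_index in kf.split(df):
     cv_train, cv_test = df.iloc[train_index], df.iloc[test_index]

     # fit the classifier on cv_train and evaluate it on cv_test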

You can also stratify by groupings or categories and get averages for these groupings using k-fold. It is very powerful for understanding your data.
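For the case in the question, where the folds are fixed in advance, one option that the answer above does not mention is to treat each pre-made subset as a group and let scikit-learn's LeaveOneGroupOut hold out one group at a time. The sketch below assumes synthetic data and a hypothetical `fold_id` array that records which subset each row belongs to.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

# Synthetic data standing in for the real dataset (assumption).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Hypothetical fold labels: row i belongs to pre-made subset fold_id[i] (0..9).
fold_id = np.repeat(np.arange(10), 50)

clf = RandomForestClassifier(n_estimators=50, random_state=0)

# Each split tests on exactly one pre-made subset and trains on the other nine.
group_scores = cross_val_score(clf, X, y, groups=fold_id, cv=LeaveOneGroupOut())

print("per-fold accuracy:", np.round(group_scores, 3))
print("mean accuracy:", round(float(group_scores.mean()), 3))
```

Because the splits come from `fold_id` rather than from a random shuffle, the folds are exactly the ones supplied.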

