Training subset in Kfold sklearn

Question

Is there a way to train a model on the training portion made up of 8 of the 10 folds produced by sklearn's kf = KFold(n_splits=10)?

I want to split my data into three subsets: training, validation, and testing (I think this can be done by calling train_test_split twice).

The training set is used to fit the model, the validation set is used to tune the parameters, and the test set is used to assess the generalization error of the final model.

However, I was wondering whether there is a way to train on 8 of the 10 folds and record the error/accuracy, validate on 1 fold, and finally test on the last fold, recording the error/accuracy for those two as well.

See below for my thinking:

from sklearn import tree
from sklearn.model_selection import KFold, train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.2, random_state=1)
clf = tree.DecisionTreeClassifier(criterion="entropy", max_depth=3)
kf = KFold(n_splits=10, shuffle=False)  # define number of splits; random_state is only valid with shuffle=True
kf.get_n_splits(X)  # check how many splits will be done
for train, test in kf.split(X_train, y_train):
    ...  # loop body not given in the original post

What exactly do you mean by 'tune the parameters'? Do you mean the hyperparameters of your classifier? — pythonic833

mpour mpour · Accepted Answer · 2018-03-29T04:38:22

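From your question, I understand that you want to leave out one or more of your subsets. In that case, you can hold out part of the data using Leave One Out (LOO) or Leave P Out (LPO), available in sklearn.model_selection as LeaveOneOut and LeavePOut. Note that these two hold out individual samples rather than whole folds.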
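As a rough sketch of how the "leave P out" idea could be applied to whole folds, the snippet below labels every sample with a KFold fold number and then uses LeavePGroupsOut with n_groups=2, so that each iteration trains on 8 folds and holds out 2, one for validation and one for testing. This is only one possible reading of the question, not the asker's or the library's prescribed method. The iris data, the variable names (fold_id, scores), and the choice to use the lower-numbered held-out fold for validation are all assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold, LeavePGroupsOut
from sklearn.tree import DecisionTreeClassifier

# Stand-in data (assumption): replace with your own X and y as NumPy arrays.
X, y = load_iris(return_X_y=True)

# Label each sample with the index of the KFold fold it falls into.
kf = KFold(n_splits=10, shuffle=True, random_state=0)
fold_id = np.empty(len(X), dtype=int)
for k, (_, held_out_idx) in enumerate(kf.split(X)):
    fold_id[held_out_idx] = k

# Hold out every pair of folds in turn: 8 folds train, 2 folds held out.
lpgo = LeavePGroupsOut(n_groups=2)
scores = []
for train_idx, held_idx in lpgo.split(X, y, groups=fold_id):
    # Arbitrary choice (assumption): lower fold number validates, higher one tests.
    val_fold, test_fold = np.unique(fold_id[held_idx])
    val_idx = held_idx[fold_id[held_idx] == val_fold]
    test_idx = held_idx[fold_id[held_idx] == test_fold]

    clf = DecisionTreeClassifier(criterion="entropy", max_depth=3, random_state=0)
    clf.fit(X[train_idx], y[train_idx])
    scores.append((
        clf.score(X[train_idx], y[train_idx]),  # training accuracy
        clf.score(X[val_idx], y[val_idx]),      # validation accuracy
        clf.score(X[test_idx], y[test_idx]),    # test accuracy
    ))

# Average the three accuracies over all train/validation/test assignments.
train_acc, val_acc, test_acc = np.mean(scores, axis=0)
print(f"train={train_acc:.3f} val={val_acc:.3f} test={test_acc:.3f}")
```

With 10 folds this loop runs once per pair of folds. If hyperparameters are tuned on the validation score, the test score should only be read for the configuration that was finally selected.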
