I am using scikit-learn's KFold as follows:
kf = KFold(10000, n_folds=5, shuffle=True, random_state=88)
However, I want to exclude certain indices from the training folds only; they may still appear in the test folds. How can this be achieved? Thanks.
I wonder if this can be achieved by using sklearn.cross_validation.PredefinedSplit?
Update: The KFold instance will be used with XGBoost for the folds parameter of xgb.cv. The Python API here states that folds should be "a KFold or StratifiedKFold instance".
However, I will try generating the KFolds as above, iterating over the train fold indices, modifying them, and then defining a custom_cv by hand like this:
custom_cv = zip(train_indices, test_indices)
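A minimal sketch of that plan, written against plain Python sequences so it does not depend on a particular scikit-learn version. The helper name `exclude_from_train` and the hand-made `splits` are illustrative assumptions, not part of scikit-learn or XGBoost; `splits` stands in for whatever yields the (train, test) index pairs.

```python
def exclude_from_train(splits, excluded):
    """Drop `excluded` indices from every training fold; test folds stay untouched.

    `splits` is any iterable of (train_indices, test_indices) pairs.
    Hypothetical helper for illustration only.
    """
    excluded = set(excluded)  # set membership keeps the filter fast
    custom_cv = []
    for train_idx, test_idx in splits:
        kept = [i for i in train_idx if i not in excluded]
        custom_cv.append((kept, list(test_idx)))
    return custom_cv  # a list, so it can be iterated more than once


# Hand-made 3-fold split over indices 0..5, standing in for the KFold output.
splits = [
    ([2, 3, 4, 5], [0, 1]),
    ([0, 1, 4, 5], [2, 3]),
    ([0, 1, 2, 3], [4, 5]),
]

# Indices 1 and 4 are removed from training folds but remain in test folds.
custom_cv = exclude_from_train(splits, excluded=[1, 4])
```

With the older API shown above, `splits` would presumably be `kf` itself; with `sklearn.model_selection.KFold` it would be `kf.split(X)`. Whether `xgb.cv` accepts such a list for `folds` depends on the XGBoost version, so that part should be checked against the installed release.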
kf_list = list(kf)
It will return a list of tuples, which will be iterable in the same way as the KFold object, and you can remove the indices you want from the tuples in the list. – juanpa.arrivillaga