Custom folds for cross-validation in scikit-learn

Question

I would like to use GridSearchCV (with n_jobs > 1) for a particular classifier, but I already have the fold assignments for 10-fold cross-validation from another source. Is there some way to pass in data that is already divided into folds instead of using the folds created by GridSearchCV?

Thanks!

ogrisel · Accepted Answer · 2013-08-16T16:27:05

You can create a custom CV iterator, for instance by taking inspiration from LeaveOneGroupOut or LeavePGroupsOut to implement the structure you are interested in.
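As a rough illustration of that first option, here is a minimal sketch of a duck-typed splitter. The class name PrecomputedFolds and its fold_ids argument are made up for this example, and it assumes that an object exposing split and get_n_splits is accepted as cv. It uses plain Python lists to stay dependency-free; wrapping the indices with np.asarray before handing them to scikit-learn would be a reasonable extra step.

```python
class PrecomputedFolds:
    """Hypothetical splitter: one (train, test) pair per distinct fold label.

    fold_ids[i] is the externally supplied fold label of sample i.
    """

    def __init__(self, fold_ids):
        self.fold_ids = list(fold_ids)

    def get_n_splits(self, X=None, y=None, groups=None):
        # One split per distinct fold label.
        return len(set(self.fold_ids))

    def split(self, X=None, y=None, groups=None):
        # X, y and groups are ignored: the folds are fully determined by fold_ids.
        for fold in sorted(set(self.fold_ids)):
            test = [i for i, f in enumerate(self.fold_ids) if f == fold]
            train = [i for i, f in enumerate(self.fold_ids) if f != fold]
            yield train, test


# Six samples assigned to three folds by some external source.
cv = PrecomputedFolds([0, 0, 1, 1, 2, 2])
splits = list(cv.split())
```

Each fold is held out exactly once as the test set while the remaining samples form the training set, which mirrors what LeaveOneGroupOut does with group labels.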

Alternatively, you can prepare your own precomputed folds as a list of (train, test) tuples, where each element of a tuple is an array of integers (sample indices between 0 and n_samples - 1), and then pass that list as the cv argument of the cross_val_score and GridSearchCV utilities:

>>> import numpy as np
>>> from sklearn.datasets import make_classification
>>> from sklearn.linear_model import LogisticRegression
>>> from sklearn.model_selection import cross_val_score
>>> X, y = make_classification(n_samples=10)
>>> cv_splits = [
...     (np.array([0, 1, 2, 3]), np.array([4, 5, 6])),
...     (np.array([1, 2, 3, 4]), np.array([5, 6, 7])),
...     (np.array([5, 6, 8, 9]), np.array([1, 2, 3, 4])),
... ]
>>> cross_val_score(LogisticRegression(), X, y, cv=cv_splits)
array([1.        , 0.33333333, 0.75      ])
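Since the question is about GridSearchCV, the same idea can be sketched there as well. This is only an illustration under assumptions: the fold labels in fold_ids stand in for whatever the external source provides, the helper name search_with_folds and the parameter grid over C are invented for the example, and three folds are used instead of ten to keep it small.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Toy data; in practice X and y come from your own dataset.
X, y = make_classification(n_samples=30, random_state=0)

# Assumption: fold_ids[i] is the externally supplied fold label of sample i.
fold_ids = np.arange(30) % 3

# Build one (train_indices, test_indices) pair per fold label.
cv_splits = [
    (np.where(fold_ids != k)[0], np.where(fold_ids == k)[0])
    for k in np.unique(fold_ids)
]


def search_with_folds(X, y, cv_splits, n_jobs=2):
    """Hypothetical helper: grid-search C using the precomputed splits."""
    search = GridSearchCV(
        LogisticRegression(max_iter=1000),
        param_grid={"C": [0.1, 1.0, 10.0]},  # illustrative grid only
        cv=cv_splits,
        n_jobs=n_jobs,
    )
    return search.fit(X, y)
```

Calling search_with_folds(X, y, cv_splits) returns the fitted search object, so best_params_ and cv_results_ can be inspected afterwards. Every candidate value of C is scored on exactly the same predefined folds.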
