How to use TimeSeriesSplit in cv as mentioned in the documentation of scikit-learn

Question

tss = TimeSeriesSplit(max_train_size=None, n_splits=10)
l = []
neighb = [1, 3, 5, 7, 9, 11, 13, 12, 23, 19, 18]
for k in neighb:
    knn = KNeighborsClassifier(n_neighbors=k, algorithm='brute')
    sc = cross_val_score(knn, X1, y1, cv=tss, scoring='accuracy')
    l.append(sc.mean())

I am trying to use a 10-fold TimeSeriesSplit, but the documentation of cross_val_score says that cv must be a cross-validation generator or an iterable. How should I pass the time series split of the train and test data to cv?

TypeError                                 Traceback (most recent call last)
 in ()
     14 for k in neighb:
     15     knn = KNeighborsClassifier(n_neighbors=k, algorithm='brute')
---> 16     sc = cross_val_score(knn, X1, y1, cv=tss, scoring='accuracy')
     17     l.append(sc.mean())
     18

~\Anaconda3\lib\site-packages\sklearn\cross_validation.py in cross_val_score(estimator, X, y, scoring, cv, n_jobs, verbose, fit_params, pre_dispatch)
   1579                                               train, test, verbose, None,
   1580                                               fit_params)
-> 1581                       for train, test in cv)
   1582     return np.array(scores)[:, 0]
   1583

TypeError: 'TimeSeriesSplit' object is not iterable

I am trying to use a 10-fold TimeSeriesSplit, but the documentation of cross_val_score says that cv must be a cross-validation generator or an iterable. How should I pass the time series split of the train and test data to cv? — Dhruv Bhardwaj
I have updated the answer; see Edit 2. That's why I was asking for the full code. — Vivek Kumar

Vivek Kumar · Accepted Answer · 2018-05-11T05:50:32

Just pass tss to cv.

scores = cross_val_score(knn, X_train, y_train, cv=tss , scoring='accuracy')

No need to call tss.split().

Update: The above was tested on scikit-learn v0.19.1, so make sure you have a recent version. Also, I am importing TimeSeriesSplit from the model_selection module.
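To see why calling tss.split() yourself is unnecessary, it can help to look at the index pairs that cross_val_score would obtain from the splitter internally. The following is only a small sketch on made-up data (X_demo is a hypothetical 8-sample array, not the X1 from the question), showing that every test block comes strictly after its training block:

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Hypothetical toy data: 8 samples, assumed to be in time order.
X_demo = np.arange(8).reshape(-1, 1)
tss = TimeSeriesSplit(n_splits=3)

# split() yields (train_indices, test_indices) pairs; each training set is an
# initial segment of the data and each test set is the block right after it.
splits = [(train.tolist(), test.tolist()) for train, test in tss.split(X_demo)]
for train_idx, test_idx in splits:
    print("train:", train_idx, "test:", test_idx)
```

The training window grows with every split, while the test window always lies in the "future" relative to it.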

Edit 1:

You are using this now:

tss = TimeSeriesSplit(n_splits=10).split(X_1)
kn = KNeighborsClassifier(n_neighbors=5, algorithm='brute') 
sc = cross_val_score(kn, X1, y1, cv=tss, scoring='accuracy')

But in the question you posted you did this:

tss = TimeSeriesSplit(n_splits=10)

Note the difference between the two: the second one has no split() call. I am passing this tss object directly to cross_val_score(), without split(), exactly as you posted in the question.
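For what it's worth, with cross_val_score from model_selection the cv argument accepts either the splitter object or an iterable of (train, test) index pairs, so both forms should give the same scores there. Here is a rough sketch on synthetic data (X_demo, y_demo and the sizes are made up for illustration):

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical synthetic data standing in for a time-ordered dataset.
rng = np.random.RandomState(0)
X_demo = rng.rand(60, 4)
y_demo = (X_demo[:, 0] > 0.5).astype(int)

knn = KNeighborsClassifier(n_neighbors=3, algorithm='brute')
tss = TimeSeriesSplit(n_splits=5)

# Form 1: pass the splitter object itself.
scores_splitter = cross_val_score(knn, X_demo, y_demo, cv=tss, scoring='accuracy')

# Form 2: pass an explicit list of (train, test) index pairs.
scores_iterable = cross_val_score(
    knn, X_demo, y_demo, cv=list(tss.split(X_demo)), scoring='accuracy'
)

print(scores_splitter)
print(scores_iterable)
```

Passing the splitter object is usually the simpler choice, because a bare generator from split() can only be consumed once.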

Edit 2:

Dude, you are using the deprecated module. Currently you are doing this:

from sklearn.cross_validation import cross_val_score

This is wrong. You should get a warning like this:

DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.

Pay attention to that, and use the model_selection module like this:

from sklearn.model_selection import cross_val_score

Then you will not get the error with my code.
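Putting it together, the loop from the question could look roughly like this once everything is imported from model_selection. This is a sketch, not your exact script: X1 and y1 here are random stand-ins for your data, and I renamed the list l to mean_scores for readability. One assumption worth stating: the first training fold must contain at least as many samples as the largest n_neighbors (23 here), which is why the stand-in data has 330 rows.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical stand-ins for X1 and y1 from the question, assumed time-ordered.
rng = np.random.RandomState(42)
X1 = rng.rand(330, 3)
y1 = (X1[:, 0] + X1[:, 1] > 1.0).astype(int)

tss = TimeSeriesSplit(n_splits=10)
neighb = [1, 3, 5, 7, 9, 11, 13, 12, 23, 19, 18]

mean_scores = []
for k in neighb:
    knn = KNeighborsClassifier(n_neighbors=k, algorithm='brute')
    # Pass the splitter object directly; cross_val_score calls split() itself.
    sc = cross_val_score(knn, X1, y1, cv=tss, scoring='accuracy')
    mean_scores.append(sc.mean())

# Pick the k with the highest mean accuracy across the 10 time-series splits.
best_k = neighb[int(np.argmax(mean_scores))]
print("best k:", best_k, "mean accuracy:", max(mean_scores))
```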
