I am using the TimeSeriesSplit class from sklearn to create train and test sets for cross-validating a time series. The idea is, for instance, to use the first n-1 data points for training and the n-th data point for testing. The split must always preserve the order, since it is a time series. However, I don't understand why the dataset X in the example is formatted as follows:
from sklearn.model_selection import TimeSeriesSplit
import numpy as np
X = np.array([[1, 2], [3, 4], [1, 2], [3, 4]])
y = np.array([1, 2, 3, 4])
tscv = TimeSeriesSplit(n_splits=3)
print(tscv)
for train_index, test_index in tscv.split(X):
    print("TRAIN:", train_index, "TEST:", test_index)
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
What is the logic behind preparing the data as X = np.array([[1, 2], [3, 4], [1, 2], [3, 4]])? I have read the notes on the page, but I still don't understand it.
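My current guess, which may well be wrong, is that X follows the usual sklearn layout of (n_samples, n_features): each row is one time step and each column one feature, so [1, 2] and [3, 4] would just be two made-up feature values per time step. Below is a minimal sketch under that assumption. The series values and the reshape are my own illustration, not part of the example on the page:

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Hypothetical univariate series: 6 observations, already in time order.
series = np.array([10.0, 11.0, 12.0, 13.0, 14.0, 15.0])

# Assumption: X must be 2D with shape (n_samples, n_features),
# so a single series becomes one column with one row per time step.
X = series.reshape(-1, 1)

tscv = TimeSeriesSplit(n_splits=3)

# Collect the index arrays as plain lists so they are easy to inspect.
splits = [(train.tolist(), test.tolist()) for train, test in tscv.split(X)]

for train, test in splits:
    # Every training index should come before every test index.
    print("TRAIN:", train, "TEST:", test)
```

If this reading is right, the splitter only uses the number of rows in X, and the second column in the original example is there only to show that several features per time step are allowed.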