I'm working on a project in which I have combined two time-series datasets (D1 and D2). D1 had a 5-minute interval and D2 had a 1-minute interval, so I transformed D1 to a 1-minute interval and combined it with D2.
Note: I have searched a lot for a solution to my problem but couldn't find an answer that fits my question, so please don't mark this as a duplicate!
Now I want to split this new dataset D1D2 into train, test and valid sets based on these conditions:
- The valid set should be the last 60 values of the dataset.
- Then, the test set should be the most recent values up to the valid set.
- Then, the train set will be the remaining data.
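To make the three conditions concrete, here is a minimal sketch of the split I have in mind. It assumes the dataset is a NumPy array with rows in chronological order (oldest first). The function name `split_train_test_valid` and the `test_size` parameter are hypothetical, since the conditions above do not say how many rows the test set should contain; only the 60-row valid set is given.

```python
import numpy as np


def split_train_test_valid(dataset, test_size, valid_size=60):
    """Chronological split into [train | test | valid], oldest rows first.

    test_size is an assumed parameter (number of rows in the test set);
    valid_size defaults to the 60 rows mentioned in the conditions.
    """
    n_rows = dataset.shape[0]
    if test_size <= 0 or valid_size <= 0:
        raise ValueError("test_size and valid_size must be positive")
    if test_size + valid_size >= n_rows:
        raise ValueError("not enough rows left for a train set")

    valid_start = n_rows - valid_size      # last 60 rows go to valid
    test_start = valid_start - test_size   # rows just before valid go to test

    train = dataset[:test_start]           # everything older is train
    test = dataset[test_start:valid_start]
    valid = dataset[valid_start:]
    return train, test, valid


# Toy stand-in for D1D2: 1000 rows, 3 feature columns plus 1 target column.
data = np.arange(1000 * 4).reshape(1000, 4)
train, test, valid = split_train_test_valid(data, test_size=200)
print(train.shape, test.shape, valid.shape)  # (740, 4) (200, 4) (60, 4)
```

Nothing is shuffled here, so the three pieces stay contiguous and every train row is older than every test row, which in turn is older than every valid row.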
Here's how I'm doing the split now:
def split_train_test(dataset, train_size, test_size):
    train = dataset[:train_size, :]
    test = dataset[test_size:, :]
    # split into input and outputs
    train_X, train_y = train[:, :-1], train[:, -1]
    test_X, test_y = test[:, :-1], test[:, -1]
    # reshape input to be 3D [samples, timesteps, features]
    train_X = train_X.reshape((train_X.shape[0], 1, train_X.shape[1]))
    test_X = test_X.reshape((test_X.shape[0], 1, test_X.shape[1]))
    print(train_X.shape, train_y.shape, test_X.shape)
    return train, test, train_X, train_y, test_X, test_y
But now I need to split the data into train, test and valid sets based on the above conditions.
How can I do that? Also, is this the right way to split time-series datasets?
train_df = df[:-60, :] - pissall
valid set, but how can I split the remaining records into train and test? - Abdul Rehman
What does "Then, the test set should be the most recent values till to the valid set" mean? - pissall
valid set: I mean that we have to take the most recent values as the test set, leaving out the last 60 records of the dataset. - Abdul Rehman