I have a pandas DataFrame indexed by date. Let's assume it runs from Jan-1 to Jan-30. I want to split this dataset into X_train, X_test, y_train, y_test, but I don't want to mix the dates, so the train and test samples should be divided by a certain date (or index). I'm trying:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
But when I check the values, I see the dates are mixed. I want to split my data as Jan-1 to Jan-24 to train and Jan-25 to Jan-30 to test (as test_size is 0.2, that makes 24 to train and 6 to test).
How can I do this? Thanks
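For the "divided by a certain date" part, here is a minimal sketch using plain pandas indexing. The year, the column names feature and target, and the cutoff variable are all made up for illustration; only the Jan-1 to Jan-30 range and the 24/6 split come from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 30 daily rows, Jan-1 to Jan-30 (the year is an assumption).
dates = pd.date_range("2019-01-01", periods=30, freq="D")
df = pd.DataFrame(
    {"feature": np.arange(30), "target": np.arange(30) * 2},
    index=dates,
)

X = df[["feature"]]
y = df["target"]

# Rows on or before the cutoff date go to train, later rows go to test.
cutoff = pd.Timestamp("2019-01-24")
train_mask = X.index <= cutoff

X_train, X_test = X.loc[train_mask], X.loc[~train_mask]
y_train, y_test = y.loc[train_mask], y.loc[~train_mask]

print(X_train.index.min(), X_train.index.max())
print(X_test.index.min(), X_test.index.max())
```

A boolean mask on the index keeps the split tied to the date itself rather than to a row count, so it should still behave sensibly if some days are missing.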
Use x.head(24) for the first 24 rows and x.tail(6) for the last 6; no need for train_test_split. – Nihal
random_state=None will take numpy.random, that's why it won't work. – Nihal
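If you would rather keep train_test_split, a sketch along these lines may help. As far as I know, the dates get mixed because train_test_split shuffles rows by default (shuffle=True), and random_state only sets the seed of that shuffle. Passing shuffle=False should keep the original row order, assuming the frame is already sorted by date. The data below is made up for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical data: 30 daily rows sorted by date (the year is an assumption).
dates = pd.date_range("2019-01-01", periods=30, freq="D")
X = pd.DataFrame({"feature": np.arange(30)}, index=dates)
y = pd.Series(np.arange(30) * 2, index=dates, name="target")

# shuffle=False disables the random permutation, so the first rows become
# the training set and the trailing rows become the test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=False
)

print(len(X_train), len(X_test))
print(X_test.index.min())
```

This only splits by position, so it relies on the index being in chronological order; calling sort_index() first is a cheap safeguard.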