I'm running GridSearchCV (grid search with cross-validation) from scikit-learn on a RandomForestClassifier. I want to use a train/validation/test split that I already created and saved to disk as separate NumPy arrays (I need the same split for tests with other algorithms).
I can't find a way to pass my separate validation set to GridSearchCV. The only workaround I found is to use PredefinedSplit on a new NumPy array in which I concatenate the original train and validation arrays loaded from the saved files:
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, PredefinedSplit

# i, s, extractPlainTable and param_grid are defined elsewhere in my script
ts_train = extractPlainTable(np.load('TimeSeries/train_x%d_30.npy' % i), s)
ts_val = extractPlainTable(np.load('TimeSeries/validation_x%d_20.npy' % i), s)
ts_test = extractPlainTable(np.load('TimeSeries/test_x%d_30.npy' % i), s)
labels_train = np.load('ground_truth/train_y%d_30.npy' % i)
labels_val = np.load('ground_truth/validation_y%d_20.npy' % i)
labels_test = np.load('ground_truth/test_y%d_30.npy' % i)
clf = RandomForestClassifier()
# Stack train and validation rows so GridSearchCV sees a single array
merged_ts = np.concatenate((ts_train, ts_val), axis=0)
merged_labels = np.concatenate((labels_train, labels_val), axis=0)
# -1 keeps a row in the training set for every split; 0 puts it in the single validation fold
mytestfold = [-1] * len(ts_train) + [0] * len(ts_val)
ps = PredefinedSplit(test_fold=mytestfold)
grid_search = GridSearchCV(estimator=clf, param_grid=param_grid, cv=ps)
grid_search.fit(merged_ts, merged_labels)
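To check that this setup really gives one train/validation split, here is a minimal sketch on synthetic data. The array shapes, the `param_grid` values and the variable names are made up for illustration and are not my real data. It also shows what seems to be an equivalent form: as far as I can tell from the documentation, `cv` also accepts an iterable of (train, test) index arrays, so the same split can be passed as a one-element list.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, PredefinedSplit

# Synthetic stand-ins for the saved arrays (shapes are hypothetical).
rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(30, 4)), np.arange(30) % 2
X_val, y_val = rng.normal(size=(20, 4)), np.arange(20) % 2

X = np.concatenate((X_train, X_val), axis=0)
y = np.concatenate((y_train, y_val), axis=0)

# -1 = always in the training set, 0 = member of validation fold 0.
test_fold = np.concatenate(
    (np.full(len(X_train), -1), np.zeros(len(X_val), dtype=int))
)
ps = PredefinedSplit(test_fold=test_fold)

# The splitter should yield exactly one (train, validation) index pair.
splits = list(ps.split())
train_idx, val_idx = splits[0]

# Same split written directly as a list with one (train, validation) pair.
cv_pairs = [(np.arange(len(X_train)), np.arange(len(X_train), len(X)))]

param_grid = {"n_estimators": [5, 10]}  # hypothetical grid, for illustration only
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=cv_pairs)
search.fit(X, y)
n_splits = search.n_splits_
```

One thing to keep in mind with either form: with the default `refit=True`, the best parameter combination is retrained on every row passed to `fit`, so the final model sees the training and validation rows together.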
Is there a better way to do this?