Using the UCI Human Activity Recognition dataset, I am trying to build a DecisionTreeClassifier model. With default parameters and random_state=156, the model gives the following accuracy:
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

dt_clf = DecisionTreeClassifier(random_state=156)
dt_clf.fit(X_train, y_train)
pred = dt_clf.predict(X_test)
print('DecisionTree Accuracy Score: {0:.4f}'.format(accuracy_score(y_test, pred)))
Output:
DecisionTree Accuracy Score: 0.8548
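For context, X_train, y_train, X_test, y_test are the dataset's predefined train/test split. I load them roughly like the sketch below; the human_activity/ paths are just placeholders for wherever the UCI files sit, and I've left out the feature-name handling:
import pandas as pd

# the UCI HAR data comes as whitespace-separated text files with a predefined train/test split
X_train = pd.read_csv('human_activity/train/X_train.txt', sep=r'\s+', header=None)
X_test = pd.read_csv('human_activity/test/X_test.txt', sep=r'\s+', header=None)
y_train = pd.read_csv('human_activity/train/y_train.txt', header=None, names=['action'])['action']
y_test = pd.read_csv('human_activity/test/y_test.txt', header=None, names=['action'])['action']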
With an arbitrary set of max_depth values, I ran GridSearchCV to find the best parameter:
from sklearn.model_selection import GridSearchCV

params = {
    'max_depth': [6, 8, 10, 12, 16, 20, 24]
}
grid_cv = GridSearchCV(dt_clf, param_grid=params, scoring='accuracy', cv=5, verbose=1)
grid_cv.fit(X_train, y_train)
print('GridSearchCV Best Score: {0:.4f}'.format(grid_cv.best_score_))
print('GridSearchCV Best Params:', grid_cv.best_params_)
Output:
Fitting 5 folds for each of 7 candidates, totalling 35 fits
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done 35 out of 35 | elapsed: 1.6min finished
GridSearchCV Best Score: 0.8513
GridSearchCV Best Params: {'max_depth': 16}
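Looking at the full grid rather than just the winner, the per-candidate mean CV accuracies can be read out of cv_results_; the column names below are the standard GridSearchCV ones:
import pandas as pd

# mean_test_score is the mean accuracy over the 5 validation folds, not the held-out test set
scores_df = pd.DataFrame(grid_cv.cv_results_)
print(scores_df[['param_max_depth', 'mean_test_score', 'std_test_score', 'rank_test_score']])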
Now, I wanted to test out the "best parameter" max_depth=16 on a separate test set to see if it truly was the best parameter among the provided list max_depth = [6, 8, 10, 12, 16, 20, 24].
max_depths = [6, 8, 10, 12, 16, 20, 24]
for depth in max_depths:
    dt_clf = DecisionTreeClassifier(max_depth=depth, random_state=156)
    dt_clf.fit(X_train, y_train)
    pred = dt_clf.predict(X_test)
    accuracy = accuracy_score(y_test, pred)
    print('max_depth = {0} Accuracy: {1:.4f}'.format(depth, accuracy))
But to my surprise, the test showed that the "best parameter" max_depth=16 was nowhere close to being the best of the bunch:
Output:
max_depth = 6 Accuracy: 0.8558
max_depth = 8 Accuracy: 0.8707
max_depth = 10 Accuracy: 0.8673
max_depth = 12 Accuracy: 0.8646
max_depth = 16 Accuracy: 0.8575
max_depth = 20 Accuracy: 0.8548
max_depth = 24 Accuracy: 0.8548
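For completeness, the refit best estimator can also be scored directly on the test set (GridSearchCV refits on the whole training set by default since refit=True); with the fixed random_state I'd expect this to reproduce the max_depth=16 row above:
best_clf = grid_cv.best_estimator_  # DecisionTreeClassifier(max_depth=16, random_state=156) refit on all of X_train
best_pred = best_clf.predict(X_test)
print('Best estimator test accuracy: {0:.4f}'.format(accuracy_score(y_test, best_pred)))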
I understand that the best parameters from GridSearchCV are chosen from the mean validation scores obtained by cross-validating the training set (X_train, y_train), but shouldn't that ranking still be reflected on the test set to some extent? I presume the UCI dataset is not imbalanced, so dataset bias shouldn't be an issue.