I split my dataset into training and testing sets. After finding the best hyperparameters on the training set, should I refit the model on all the data? The goal is to reach the highest possible score on new data.
2 Answers
I don't think so. If you do that, you will no longer have a valid test set. What happens when you come back to improve the model later? You would need a new test set for every model revision, which means more labeling, and you would not be able to compare experiments across model versions because the test set would no longer be identical.
If you consider this model finished forever, then it's fine.
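To make the point concrete, here is a minimal sketch using scikit-learn (the dataset, model, and parameter grid are illustrative assumptions, not from the question): hyperparameter search runs only inside the training split, and the test split is touched exactly once, so the same score remains comparable across future model versions.

```python
# Sketch: tune hyperparameters on the training split only,
# then report the final score on the untouched test split.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Cross-validation happens entirely inside the training set;
# the test set is never seen during tuning.
search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.01, 0.1, 1, 10]},
    cv=5)
search.fit(X_train, y_train)

# Because the test set was held out, this score stays comparable
# across future experiments as long as the split is unchanged.
test_score = search.score(X_test, y_test)
print(search.best_params_, round(test_score, 3))
```

Note that `GridSearchCV` already refits the best estimator on the full training split by default (`refit=True`), so the test score above reflects a model trained on all the training data it is allowed to see.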