I have three datasets (train, validation, and test), and I am using an XGBoost classifier for a classification task.

I trained the XGBClassifier on the train set and saved it as a pickle file so that I don't have to retrain it every time. Once I load the model from the pickle file, I can use its predict method, but I don't seem to be able to train the loaded model on the validation set or on any other new dataset.

Note: I do not get any error output. The JupyterLab cell appears to run normally, but all my CPU cores stay idle while it executes, so I conclude that the model isn't being fitted.

Is this a problem with XGBoost, or can models dumped with pickle not be fitted again after loading?

Do you want to continue training it from the point it stopped? (By creating more boosters) - Eran Moshe
@EranMoshe Yes, I've trained on the train set, saved the model, and restarted the Jupyter kernel. Now I want to load the model and train it on the validation set, so that the model is fitted on both train and validation. - Gabriel Ziegler

1 Answer

I had the exact same question a year ago. You can find the question and answer here.

Note, though, that with this approach you keep adding "trees" (boosters) to your existing model, fitted on your new data.

It might be better to train a new model on your combined training and validation sets.

Whichever you choose, try both options and evaluate the results to see which works better for your data.