here below is a question about xgboost early stopping rounds parameter and how it does, or does not, give the best iteration when it is the reason why the fit ends.
In xgboost documentation, one can see in the scikit learn api section (link) that when the fit stops due to the early stopping rounds parameter:
Activates early stopping. Validation error needs to decrease at least every "early_stopping_rounds" round(s) to continue training. Requires at least one item in evals. If there’s more than one, will use the last. Returns the model from the last iteration (not the best one).
When reeding this, it seems that the model returned, in this case, is not the best one but the last one. To access the best one when predict, it says, it is possible to call the predict using the ntree_limit parameter with the bst.best_ntree_limit given at the end of the fit.
In this sense, it should work the same way as the train of xgboost since the fit of the scikitlearn api seems to be only an embedding of the train and others.
It is wiedly discussed here stack overflow discussion or here another discussion
But when I tried to address this problem and check how it worked with my data, I did not find the behavior that I thought I should have. In fact the behavior I encountered was not at all the one discribed in those discussions and documentation.
I call a fit this way:
reg = xgb.XGBRegressor(n_jobs=6, n_estimators = 100, max_depth= 5)
reg.fit(
X_train,
y_train,
eval_metric='rmse',
eval_set=[(X_train, y_train), (X_valid, y_valid)],
verbose=True,
early_stopping_rounds = 6)
and here is what I get in the end:
[71] validation_0-rmse:1.70071 validation_1-rmse:1.9382
[72] validation_0-rmse:1.69806 validation_1-rmse:1.93825
[73] validation_0-rmse:1.69732 validation_1-rmse:1.93803
Stopping. Best iteration:
[67] validation_0-rmse:1.70768 validation_1-rmse:1.93734
and when I check the values of the validation I used :
y_pred_valid = reg.predict(X_valid)
y_pred_valid_df = pd.DataFrame(y_pred_valid)
sqrt(mse(y_valid, y_pred_valid_df[0]))
I get
1.9373418403889535
If the fit had return the last iteration instead of the best one it should have given an rmse around 1.93803 but it gave an rmse at 1.93734, exactly the best score.
I checked again by two ways: [Edit] I've edited the code below according to @Eran Moshe answer
y_pred_valid = reg.predict(X_valid, ntree_limit=reg.best_ntree_limit)
y_pred_valid_df = pd.DataFrame(y_pred_valid)
sqrt(mse(y_valid, y_pred_valid_df[0]))
1.9373418403889535
and even if I call the fit (knowing the best iter is the 67th) with only 68 estimators so that I'm sure the last one is the best one:
reg = xgb.XGBRegressor(n_jobs=6, n_estimators = 68, max_depth= 5)
reg.fit(
X_train,
y_train,
eval_metric='rmse',
eval_set=[(X_train, y_train), (X_valid, y_valid)],
verbose=True,
early_stopping_rounds = 6)
the result is the same:
1.9373418403889535
So that seems to lead to the idea that, unlike the documentation, and those numerous discussions about it, tell, the fit of xgboost, when stopped by the early stopping round parameter, does give the best iter, not the last one.
Am I wrong, if so, where, and how do you explain the behavior I met ?
Thanks for the attention