0 votes

I am training an XGBoost model and having trouble interpreting the model behaviour.

  • early_stopping_rounds = 10
  • num_boost_round = 100
  • Dataset is unbalanced, with 458,644 positives (1s) and 7,975,373 negatives (0s)
  • Evaluation metric is AUCPR
  • param = {'max_depth': 6, 'eta': 0.03, 'silent': 1, 'colsample_bytree': 0.3, 'objective': 'binary:logistic', 'nthread': 6, 'subsample': 1, 'eval_metric': ['aucpr']}
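
For reference, here is a minimal sketch of the training call these settings correspond to (the tiny random dataset is only a stand-in so the sketch runs on its own; my real data is the unbalanced dataset described above):

```python
import numpy as np
import xgboost as xgb

# Stand-in data so the sketch is runnable on its own; the real dataset is the
# heavily unbalanced one described above.
np.random.seed(0)
X = np.random.rand(1000, 10)
y = (np.random.rand(1000) < 0.05).astype(int)  # roughly mimic the class imbalance
dtrain = xgb.DMatrix(X[:800], label=y[:800])
dvalid = xgb.DMatrix(X[800:], label=y[800:])

param = {'max_depth': 6, 'eta': 0.03, 'silent': 1, 'colsample_bytree': 0.3,
         'objective': 'binary:logistic', 'nthread': 6, 'subsample': 1,
         'eval_metric': ['aucpr']}

bst = xgb.train(
    param,
    dtrain,
    num_boost_round=100,
    evals=[(dtrain, 'train'), (dvalid, 'eval')],
    early_stopping_rounds=10,
)
print(bst.best_iteration, bst.best_score)
```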

From my understanding of early_stopping_rounds, training is supposed to stop when no improvement is observed in the test/evaluation dataset's eval metric (aucpr) for 10 consecutive rounds. However, in my case, even when there is a clear improvement in the AUCPR on the evaluation dataset, training still stops after the 10th boosting stage. Please see the training log below. Additionally, the best iteration comes out to be the 0th one, even though the 10th iteration clearly has a much higher AUCPR than the 0th.

[Training log screenshot: the evaluation AUCPR increases over the rounds, yet training stops after the 10th boosting stage and the best iteration is reported as 0.]

Is this right? If not, what could be going wrong? If it is, please correct my understanding of early stopping rounds and best iteration.


1 Answer

1 vote

Very interesting!!

So it turns out that early stopping minimizes some metrics (RMSE, log loss, etc.) and maximizes others (MAP, NDCG, AUC) - https://xgboost.readthedocs.io/en/latest/python/python_intro.html

When you use aucpr, it is apparently treated as a metric to minimize - that seems to be the default behaviour, since aucpr is not in the list of metrics that get maximized. That would also explain why the 0th round, which has the lowest AUCPR, is reported as the best iteration.

Try to set maximize=True when calling xgboost.train() - https://github.com/dmlc/xgboost/issues/3712
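
For example, a minimal sketch of that change, reusing the param, dtrain and dvalid from the sketch in the question:

```python
import xgboost as xgb

# Same training call, but with maximize=True so early stopping treats a
# higher aucpr on the evaluation set as an improvement.
bst = xgb.train(
    param,
    dtrain,
    num_boost_round=100,
    evals=[(dtrain, 'train'), (dvalid, 'eval')],
    early_stopping_rounds=10,
    maximize=True,
)

# best_iteration / best_score should now point at the round with the
# highest evaluation aucpr instead of the lowest.
print(bst.best_iteration, bst.best_score)
```

With maximize=True, bst.best_iteration should track the round with the highest evaluation AUCPR rather than the lowest one.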