I am trying to use the XGBoost scikit-learn wrapper with early stopping in a regression problem. Strangely, the computation of the early stopping eval_metric (in my case, rmse) fails at each early stopping round. That is odd because the same estimator does work on the eval_set without early stopping.
Here is the code:
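from xgboost import XGBRegressor

# Hold out the last n_splits non-missing targets for evaluation.
eval_train_indices = y.dropna()[:-n_splits].index
eval_test_indices = y.dropna()[-n_splits:].index
X_train, X_test = X.loc[eval_train_indices, :], X.loc[eval_test_indices, :]
y_train, y_test = y.loc[eval_train_indices], y.loc[eval_test_indices]
eval_set = [(X_train, y_train), (X_test, y_test)]

# params is assumed to be a dict of extra hyperparameters.
predictor = XGBRegressor(n_estimators=50000, subsample=0.8, **params)
# Note: the model is fit on the full X, y, not on X_train, y_train.
predictor.fit(X, y,
              eval_metric=["rmse"],
              eval_set=eval_set,
              early_stopping_rounds=40,
              verbose=True)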
And here is the error message it yields:
<ipython-input-65-358402bfa21c> in fit(self, T)
147 early_stopping_rounds=40,
148 verbose=True)
150
151 n_estimators=int(self.predictor.best_iteration*1.0)
/Users/Nicolas/anaconda2/lib/python2.7/site-packages/xgboost-0.7-py2.7.egg/xgboost/sklearn.pyc in fit(self, X, y, sample_weight, eval_set, eval_metric, early_stopping_rounds, verbose, xgb_model)
291 early_stopping_rounds=early_stopping_rounds,
292 evals_result=evals_result, obj=obj, feval=feval,
--> 293 verbose_eval=verbose, xgb_model=xgb_model)
294
295 if evals_result:
/Users/Nicolas/anaconda2/lib/python2.7/site-packages/xgboost-0.7-py2.7.egg/xgboost/training.pyc in train(params, dtrain, num_boost_round, evals, obj, feval, maximize, early_stopping_rounds, evals_result, verbose_eval, xgb_model, callbacks, learning_rates)
202 evals=evals,
203 obj=obj, feval=feval,
--> 204 xgb_model=xgb_model, callbacks=callbacks)
205
206
/Users/Nicolas/anaconda2/lib/python2.7/site-packages/xgboost-0.7-py2.7.egg/xgboost/training.pyc in _train_internal(params, dtrain, num_boost_round, evals, obj, feval, xgb_model, callbacks)
97 end_iteration=num_boost_round,
98 rank=rank,
---> 99 evaluation_result_list=evaluation_result_list))
100 except EarlyStopException:
101 break
/Users/Nicolas/anaconda2/lib/python2.7/site-packages/xgboost-0.7-py2.7.egg/xgboost/callback.pyc in callback(env)
245 best_msg=state['best_msg'])
246 elif env.iteration - best_iteration >= stopping_rounds:
--> 247 best_msg = state['best_msg']
248 if verbose and env.rank == 0:
249 msg = "Stopping. Best iteration:\n{}\n\n"
KeyError: 'best_msg'
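One plausible reading of this traceback is that the early stopping callback only stores 'best_msg' when the metric improves, and a NaN metric never compares as an improvement. The sketch below is a simplified, hypothetical loop written to illustrate that mechanism; it is not XGBoost's actual code, and the function name run_early_stopping and the state layout are assumptions.

```python
def run_early_stopping(scores, stopping_rounds):
    """Simplified, hypothetical early-stopping loop that minimizes a metric."""
    state = {"best_score": float("inf"), "best_iteration": 0}
    for iteration, score in enumerate(scores):
        if score < state["best_score"]:
            # Only reached when the metric improves; NaN never satisfies "<".
            state["best_score"] = score
            state["best_iteration"] = iteration
            state["best_msg"] = "[{}] rmse:{}".format(iteration, score)
        elif iteration - state["best_iteration"] >= stopping_rounds:
            # Raises KeyError if no improvement was ever recorded.
            return state["best_msg"]
    return state.get("best_msg")


# 41 rounds of NaN, with a patience of 40 rounds (made-up input).
nan_scores = [float("nan")] * 41
try:
    run_early_stopping(nan_scores, stopping_rounds=40)
    outcome = "no error"
except KeyError as exc:
    outcome = "KeyError: {}".format(exc)
print(outcome)
```

If this reading is right, the KeyError is a symptom of the metric being NaN from the first round, not a separate problem.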
For some reason, XGBoost seems unable to compute the RMSE during the early stopping rounds, although it does work when tested on the eval train and test sets without early stopping. With verbose=True, it shows the following:
[0] validation_0-rmse:nan validation_1-rmse:nan
Multiple eval metrics have been passed: 'validation_1-rmse' will be used for early stopping.
Will train until validation_1-rmse hasn't improved in 40 rounds.
[1] validation_0-rmse:nan validation_1-rmse:nan
[2] validation_0-rmse:nan validation_1-rmse:nan
[3] validation_0-rmse:nan validation_1-rmse:nan
[4] validation_0-rmse:nan validation_1-rmse:nan
[5] validation_0-rmse:nan validation_1-rmse:nan
[6] validation_0-rmse:nan validation_1-rmse:nan
[7] validation_0-rmse:nan validation_1-rmse:nan
[8] validation_0-rmse:nan validation_1-rmse:nan
[9] validation_0-rmse:nan validation_1-rmse:nan
[10] validation_0-rmse:nan validation_1-rmse:nan
[11] validation_0-rmse:nan validation_1-rmse:nan
[12] validation_0-rmse:nan validation_1-rmse:nan
[13] validation_0-rmse:nan validation_1-rmse:nan
[14] validation_0-rmse:nan validation_1-rmse:nan
[15] validation_0-rmse:nan validation_1-rmse:nan
[16] validation_0-rmse:nan validation_1-rmse:nan
[17] validation_0-rmse:nan validation_1-rmse:nan
[18] validation_0-rmse:nan validation_1-rmse:nan
[19] validation_0-rmse:nan validation_1-rmse:nan
[20] validation_0-rmse:nan validation_1-rmse:nan
[21] validation_0-rmse:nan validation_1-rmse:nan
[22] validation_0-rmse:nan validation_1-rmse:nan
[23] validation_0-rmse:nan validation_1-rmse:nan
[24] validation_0-rmse:nan validation_1-rmse:nan
[25] validation_0-rmse:nan validation_1-rmse:nan
[26] validation_0-rmse:nan validation_1-rmse:nan
[27] validation_0-rmse:nan validation_1-rmse:nan
[28] validation_0-rmse:nan validation_1-rmse:nan
[29] validation_0-rmse:nan validation_1-rmse:nan
[30] validation_0-rmse:nan validation_1-rmse:nan
[31] validation_0-rmse:nan validation_1-rmse:nan
[32] validation_0-rmse:nan validation_1-rmse:nan
[33] validation_0-rmse:nan validation_1-rmse:nan
[34] validation_0-rmse:nan validation_1-rmse:nan
[35] validation_0-rmse:nan validation_1-rmse:nan
[36] validation_0-rmse:nan validation_1-rmse:nan
[37] validation_0-rmse:nan validation_1-rmse:nan
[38] validation_0-rmse:nan validation_1-rmse:nan
[39] validation_0-rmse:nan validation_1-rmse:nan
[40] validation_0-rmse:nan validation_1-rmse:nan
I don't even understand what could cause a failure to compute the RMSE. It may be due to missing values, but there are none when I print predictor.predict(X_test).
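One thing that may be worth checking: the indices above are built from y.dropna(), which suggests y can contain missing targets, while fit is called on the full X and y. Whether that is the cause here is only an assumption. The sketch below uses made-up values and the standard library to show that a single NaN label is enough to turn an RMSE into NaN, and that dropping the affected pairs gives a finite value.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error over paired values."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


# Made-up values for illustration only.
labels = [1.0, 2.0, float("nan"), 4.0]
preds = [1.5, 2.0, 3.0, 3.5]

# One NaN label propagates through the sum and the square root.
rmse_all = rmse(labels, preds)

# Keep only the pairs whose label is present.
pairs = [(t, p) for t, p in zip(labels, preds) if not math.isnan(t)]
rmse_clean = rmse([t for t, _ in pairs], [p for _, p in pairs])

n_missing = sum(math.isnan(t) for t in labels)
print(n_missing, rmse_all, rmse_clean)
```

On the real data, the equivalent check would be y.isnull().sum() on the y passed to fit, as well as on y_train and y_test.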