I am using Python to train an XGBoost regressor on a dataset with 25 feature columns, using scikit-learn's GridSearchCV for parameter tuning. GridSearchCV lets you choose your scorer with the scoring parameter, and r2 is a valid option.
grid = GridSearchCV(mdl, param_grid=params, verbose=1, cv=kfold,
                    n_jobs=-1, error_score='raise', scoring='r2')
However, when I want to use r2 as the eval_metric in the grid.fit() call, there is no obvious way to do so.
grid.fit(X_train, y_train, eval_set=[(X_test, y_test)],
         eval_metric='rmse', early_stopping_rounds=150)
I have tried using sklearn's built-in r2_score function, but there are a few issues. The first is that an r2 score is calculated from the y_test set against the y_pred set, and to produce a y_pred set, the model must already be fitted. So you can see I'm running into a chicken-and-egg problem.
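To make the circular dependency concrete, here is a minimal sketch of standalone r2_score usage (reusing the mdl, X_test, and y_test names from the snippets above); it only works once the model has been fitted:

from sklearn.metrics import r2_score

# r2_score needs the true targets and the predictions up front,
# and producing predictions requires an already-fitted model
y_pred = mdl.predict(X_test)     # fails if mdl has not been fitted yet
print(r2_score(y_test, y_pred))  # 1.0 would be a perfect fit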
I have tried a few things to get around this. The first was to train the model and make predictions inside the eval_metric argument, like below:
grid.fit(X_train, y_train, eval_set=[(X_test, y_test)],
         eval_metric=r2_score(y_test, mdl.predict(X_test)), early_stopping_rounds=150)
But I am given the following error:
xgboost.core.XGBoostError: need to call fit beforehand
Which makes sense.
Is there some way that I can grab the current parameters that GridSearchCV is using, create and store predictions, and then use r2_score as the eval_metric?
My thoughts are this: the r2 score is a standard evaluation metric, bounded above by 1 (a perfect fit) and typically falling between 0 and 1. If there were a standardized way to optimize it directly, it would have very far reach across almost all of machine learning.
1) … y_test vs y_pred, so I do not see why that is a showstopper. 2) The signature of a callable is specific to xgboost; see the eval_metric documentation here: xgboost.readthedocs.io/en/latest/python/…. 3) Why do you want to have eval_metric in the first place? It is not used in optimisation, but only for monitoring performance between iterations and for early stopping. – Mischa Lisovyi
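Following up on that comment, here is a minimal sketch of what such a callable could look like. It assumes the older XGBoost scikit-learn API (pre-1.6), where fit() accepts a callable eval_metric with the signature func(y_predicted, dtrain) returning a (name, value) pair, and where a custom metric is always minimized for early stopping, which is why the sketch returns 1 - r2 rather than r2 itself (the r2_eval name is just for illustration):

from sklearn.metrics import r2_score

def r2_eval(y_pred, dtrain):
    # dtrain is an xgboost DMatrix; get_label() returns the true targets
    y_true = dtrain.get_label()
    # XGBoost always minimizes a custom eval metric, and r2 is better
    # when larger, so report 1 - r2: driving it toward 0 maximizes r2
    return 'one_minus_r2', 1.0 - r2_score(y_true, y_pred)

grid.fit(X_train, y_train, eval_set=[(X_test, y_test)],
         eval_metric=r2_eval, early_stopping_rounds=150)

Note that, as the comment says, this only affects per-iteration monitoring and early stopping; GridSearchCV still ranks parameter combinations with its own scoring='r2' scorer.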