I'm currently using xgb.train(...), which returns a Booster, but I'd like to use RFE to select the best 100 features. The returned Booster cannot be used in RFE because it isn't a sklearn estimator. XGBClassifier is the sklearn API into the xgboost library; however, I'm not able to get the same results with it as with the xgb.train(...) method (10% worse on ROC-AUC). I've tried the sklearn boosters, but they aren't able to get similar results either. I've also tried wrapping the xgb.train(...) method in a class to add the sklearn estimator methods, but there are just too many to change. Is there some way to use xgb.train(...) along with RFE from sklearn?
XGBoost has an sklearn wrapper already. Does that work for you? xgboost.readthedocs.io/en/latest/python/…
– hume
1 Answer
For this kind of problem, I created shap-hypetune: a Python package for simultaneous hyperparameter tuning and feature selection for gradient boosting models. In your case, it lets you perform RFE with XGBClassifier in a very simple and intuitive way:
from xgboost import XGBClassifier
from shaphypetune import BoostRFE

# wrap the sklearn estimator; step=1 drops one feature per RFE iteration
model = BoostRFE(XGBClassifier(), min_features_to_select=1, step=1)
model.fit(X_train, y_train, eval_set=[(X_valid, y_valid)], early_stopping_rounds=6, verbose=0)
pred = model.predict(X_test)
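Once fitted, the selector should behave like sklearn's RFE; as a rough sketch (I'm assuming here that the fitted object mirrors RFE's transform/n_features_ interface, so double-check the docs):

X_train_selected = model.transform(X_train)   # keep only the selected columns
print(model.n_features_)                      # number of features that survived RFE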
As you can see, you can use all the fitting options available in the standard XGB API, such as early_stopping_rounds or custom metrics, to customize the training process.
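For example, a minimal sketch of selecting down to the 100 features you mention while monitoring ROC-AUC (assuming an xgboost version whose sklearn fit still accepts eval_metric and early_stopping_rounds, consistent with the snippet above):

model = BoostRFE(XGBClassifier(), min_features_to_select=100, step=1)
model.fit(X_train, y_train,
          eval_set=[(X_valid, y_valid)],
          eval_metric='auc',            # track ROC-AUC on the validation set
          early_stopping_rounds=6,
          verbose=0)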
You can also use shap-hypetune to perform hyperparameter tuning (optionally simultaneously with feature selection), or to run feature selection with RFE or Boruta using SHAP feature importances. A full example is available here.
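As a rough sketch of those two variants (the param_grid, max_iter, and importance_type names are taken from the shap-hypetune README as I remember them; treat them as assumptions and verify against the docs):

from shaphypetune import BoostRFE, BoostBoruta

# RFE combined with a grid search over the estimator's hyperparameters
model = BoostRFE(XGBClassifier(),
                 param_grid={'n_estimators': [100, 300], 'max_depth': [4, 6]},
                 min_features_to_select=100, step=1)
model.fit(X_train, y_train, eval_set=[(X_valid, y_valid)],
          early_stopping_rounds=6, verbose=0)

# Boruta driven by SHAP feature importances instead of the default gain
model = BoostBoruta(XGBClassifier(),
                    max_iter=100,
                    importance_type='shap_importances')
model.fit(X_train, y_train, eval_set=[(X_valid, y_valid)],
          early_stopping_rounds=6, verbose=0)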