I'm currently using xgb.train(...), which returns a Booster, but I'd like to use RFE to select the best 100 features. The returned Booster cannot be used in RFE because it isn't a sklearn estimator. XGBClassifier is the sklearn API into the xgboost library; however, I'm not able to get the same results with it as with the xgb.train(...) method (10% worse on ROC-AUC). I've tried the sklearn boosters, but they aren't able to get similar results either. I've also tried wrapping the xgb.train(...) method in a class to add the sklearn estimator methods, but there are just too many to change. Is there some way to use xgb.train(...) along with RFE from sklearn?
XGBoost has an sklearn wrapper already. Does that work for you? xgboost.readthedocs.io/en/latest/python/…
– hume
1 Answer
For this kind of problem, I created shap-hypetune: a Python package for simultaneous hyperparameter tuning and feature selection for gradient boosting models. In your case, it lets you perform RFE with XGBClassifier in a very simple and intuitive way:
from xgboost import XGBClassifier
from shaphypetune import BoostRFE

# wrap the sklearn estimator; step=1 drops one feature per RFE iteration
model = BoostRFE(XGBClassifier(), min_features_to_select=1, step=1)
model.fit(X_train, y_train, eval_set=[(X_valid, y_valid)], early_stopping_rounds=6, verbose=0)
pred = model.predict(X_test)
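Once fitted, the selector should behave like sklearn's RFE; as a rough sketch (I'm assuming here that the fitted object mirrors RFE's transform/n_features_ interface, so double-check the docs):

X_train_selected = model.transform(X_train)   # keep only the selected columns
print(model.n_features_)                      # number of features that survived RFE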
As you can see, you can use all the fitting options available in the standard XGB API, such as early_stopping_rounds or custom metrics, to customize the training process.
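For example, a minimal sketch of selecting down to the 100 features you mention while monitoring ROC-AUC (assuming an xgboost version whose sklearn fit still accepts eval_metric and early_stopping_rounds, consistent with the snippet above):

model = BoostRFE(XGBClassifier(), min_features_to_select=100, step=1)
model.fit(X_train, y_train,
          eval_set=[(X_valid, y_valid)],
          eval_metric='auc',            # track ROC-AUC on the validation set
          early_stopping_rounds=6,
          verbose=0)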
You can also use shap-hypetune to perform hyperparameter tuning (optionally simultaneously with feature selection), or to run feature selection with RFE or Boruta using SHAP feature importances. A full example is available here.
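As a rough sketch of those two variants (the param_grid, max_iter, and importance_type names are taken from the shap-hypetune README as I remember them; treat them as assumptions and verify against the docs):

from shaphypetune import BoostRFE, BoostBoruta

# RFE combined with a grid search over the estimator's hyperparameters
model = BoostRFE(XGBClassifier(),
                 param_grid={'n_estimators': [100, 300], 'max_depth': [4, 6]},
                 min_features_to_select=100, step=1)
model.fit(X_train, y_train, eval_set=[(X_valid, y_valid)],
          early_stopping_rounds=6, verbose=0)

# Boruta driven by SHAP feature importances instead of the default gain
model = BoostBoruta(XGBClassifier(),
                    max_iter=100,
                    importance_type='shap_importances')
model.fit(X_train, y_train, eval_set=[(X_valid, y_valid)],
          early_stopping_rounds=6, verbose=0)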