Good morning,
I have a question about calculating feature importance for bagged and boosted regression tree models with the mlr package in R. I am using XGBoost to make predictions and bagging to estimate prediction uncertainty. My data set is relatively large: approximately 10k features and observations. The predictions work perfectly (see the code below), but I can't seem to calculate feature importance (the last line of the code). The importance function crashes without an error message and freezes the R session. I saw some related python code where people calculate the importance for each of the bagged models here and here, but I haven't been able to get that to work in R either. Specifically, I'm not sure how to access the individual models inside the object produced by mlr (the mb object in the code below). In python this seems to be trivial, but in R I can't extract mb$learner.model, which seems logically closest to what I need. Does anyone have experience with this issue?
Please see the code below:
learn1 <- makeRegrTask(data = train.all, target = "resp", weights = weights1)
lrn.xgb <- makeLearner("regr.xgboost", predict.type = "response")
lrn.xgb$par.vals <- list(objective = "reg:squarederror", eval_metric = "error",
                         nrounds = 300, gamma = 0, booster = "gbtree", max.depth = 6)
lrn.xgb.bag <- makeBaggingWrapper(lrn.xgb, bw.iters = 50, bw.replace = TRUE,
                                  bw.size = 0.85, bw.feats = 1)
lrn.xgb.bag <- setPredictType(lrn.xgb.bag, predict.type = "se")
mb <- mlr::train(lrn.xgb.bag, learn1)
fimp1 <- getFeatureImportance(mb)  # freezes the R session
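For context, prediction with the bagged learner works as expected. Here is a minimal sketch of how I use the trained object (test.all is a placeholder for my holdout data):

p <- predict(mb, newdata = test.all)  # test.all is a hypothetical holdout set
head(getPredictionResponse(p))        # point predictions
head(getPredictionSE(p))              # bagging-based standard errors

It is only the importance call above that freezes the session.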
Comments:

getFeatureImportance() takes a wrapped model, so mb should be fine here. Also see ?mlr::getLearnerModel(), and have a look at the vignettes. – pat-s

mlr::getFeatureImportance(mb) gives me "Error in xgboost::xgb.importance(feature_names = .model$features, model = mod : model: must be an object of class xgb.Booster". However, I can extract the individual models with mb1 <- getLearnerModel(mb, more.unwrap = T); trying to get the importance for a single model with mlr::getFeatureImportance(mb1[[1]]) then gives "Error: Assertion on 'object' failed: Must inherit from class 'WrappedModel', but has class 'xgb.Booster'". This looks to me like a class issue? – Yodi

getFeatureImportance() takes a WrappedModel, but the model created by a BaggingWrapper is a HomogeneousEnsembleModel, for which mlr does not offer its own method. So you have to aggregate the feature importance values manually (see the sketch after this thread). However, that won't work if each model is trained on just one feature (bw.feats = 1) as in your example. – jakob-r

bw.feats = 1 refers to the "percentage size of randomly selected features in bags", so each model has many features. But thanks for the clarification. If I understand you correctly, I can manually create an ensemble of 50 different models (same learner but different names) and then getFeatureImportance() should work on the ensemble? – Yodi

bw.feats = 1 equals 100% of the features, which is a sensible choice. I posted an answer that should work. – jakob-r
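Following jakob-r's suggestion, here is a minimal sketch of the manual aggregation. It assumes getLearnerModel(mb, more.unwrap = TRUE) returns one raw xgb.Booster per bag (as the errors above indicate), and it averages xgboost's native Gain measure across the bags; since bw.feats = 1 gives every bag all features, passing mb$features as the feature names mirrors what mlr does internally:

library(xgboost)
library(data.table)

# Unwrap the HomogeneousEnsembleModel into its raw xgb.Booster members.
raw.models <- getLearnerModel(mb, more.unwrap = TRUE)

# Native xgboost importance per bagged model: a data.table with
# Feature, Gain, Cover and Frequency columns.
imp.list <- lapply(raw.models, function(m)
  xgboost::xgb.importance(feature_names = mb$features, model = m))

# Stack the per-model tables and average Gain per feature over all bags.
# A feature absent from one model's table contributed zero gain there,
# so divide the summed Gain by the total number of bags.
imp.all <- rbindlist(imp.list)
fimp1 <- imp.all[, .(Gain = sum(Gain) / length(raw.models)), by = Feature]
setorder(fimp1, -Gain)
head(fimp1)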