I'm working on a script that trains both a ranger random forest and an xgb regression. Whichever performs better on RMSE is then tested against holdout data.
I would also like to return feature importance for both in a comparable way.
With the xgboost library, I can get my feature importance table and plot like so:
> xgb.importance(model = regression_model)
                 Feature        Gain       Cover  Frequency
1:              spend_7d 0.981006272 0.982513621 0.79219969
2:                   IOS 0.006824499 0.011105014 0.08112324
3:  is_publisher_organic 0.006379284 0.002917203 0.06770671
4: is_publisher_facebook 0.005789945 0.003464162 0.05897036
Then I can plot it like so:
> xgb.importance(model = regression_model) %>% xgb.plot.importance()
That was using the xgboost library and its functions. With a ranger random forest, if I fit a regression model, I can get feature importance by including importance = 'impurity' when fitting the model. Then:
regression_model$variable.importance
    spend_7d d7_utility_sum recent_utility_ratio       IOS is_publisher_organic is_publisher_facebook
437951687132              0                    0 775177421            600401959            1306174807
I could just build a ggplot myself, but the scales are entirely different between what ranger returns in that vector and what xgb shows in the plot.
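For illustration, here is a rough sketch of the manual route: rescale each model's importance so it sums to 1, then plot the shares side by side. The numbers are copied from the outputs above. `as_share` and `importance_df` are names I made up for this example, and treating features that xgb did not report as 0 is my own assumption. Impurity (ranger) and Gain (xgb) are computed differently, so even after rescaling the two are only roughly comparable.

```r
# Values copied from the outputs above; in a real script they would come from
# regression_model$variable.importance (ranger) and xgb.importance() (xgboost).
ranger_imp <- c(
  spend_7d = 437951687132, d7_utility_sum = 0, recent_utility_ratio = 0,
  IOS = 775177421, is_publisher_organic = 600401959,
  is_publisher_facebook = 1306174807
)
xgb_gain <- c(
  spend_7d = 0.981006272, IOS = 0.006824499,
  is_publisher_organic = 0.006379284, is_publisher_facebook = 0.005789945
)

# Hypothetical helper: rescale an importance vector so it sums to 1.
as_share <- function(x) x / sum(x)

# Use one common feature list; features missing from the xgb table get 0
# (an assumption made for this sketch).
features <- union(names(ranger_imp), names(xgb_gain))
xgb_full <- setNames(rep(0, length(features)), features)
xgb_full[names(xgb_gain)] <- xgb_gain

# Long-format table: one row per model and feature.
importance_df <- rbind(
  data.frame(model = "ranger", feature = features,
             share = unname(as_share(ranger_imp[features]))),
  data.frame(model = "xgboost", feature = features,
             share = unname(as_share(xgb_full)))
)

# Side-by-side bars, only if ggplot2 is installed.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(
    importance_df,
    ggplot2::aes(x = stats::reorder(feature, share), y = share, fill = model)
  ) +
    ggplot2::geom_col(position = "dodge") +
    ggplot2::coord_flip() +
    ggplot2::labs(x = NULL, y = "Share of total importance")
  print(p)
}
```

This works, but it is hand-rolled and I would have to repeat it for every model type I add.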
Is there an out-of-the-box library or solution that lets me plot the feature importance of either the xgb or the ranger model in a comparable way?