
According to the xgboost documentation (https://xgboost.readthedocs.io/en/latest/python/python_api.html#module-xgboost.training), a trained xgboost model exposes feature importances:

feature_importances_

Feature importances property

Note

Feature importance is defined only for tree boosters. Feature importance is only defined when the decision tree model is chosen as base learner (booster=gbtree). It is not defined for other base learner types, such as linear learners (booster=gblinear).

Returns: feature_importances_

Return type: array of shape [n_features]

However, this does not seem to be the case, as the following toy example shows:

import seaborn as sns
import xgboost as xgb

mpg = sns.load_dataset('mpg')

toy = mpg[['mpg', 'cylinders', 'displacement', 'horsepower', 'weight',
       'acceleration']]

toy = toy.sample(frac=1)

N = toy.shape[0]

N1 = int(N/2)

toy_train = toy.iloc[:N1, :]
toy_test = toy.iloc[N1:, :]

toy_train_x = toy_train.iloc[:, 1:]   # features: cylinders .. acceleration
toy_train_y = toy_train.iloc[:, 0]    # label: mpg (iloc[:, 1] would leak a feature as the label)

toy_test_x = toy_test.iloc[:, 1:]
toy_test_y = toy_test.iloc[:, 0]

max_depth = 6
eta = 0.3
subsample = 0.8
colsample_bytree = 0.7
alpha = 0.1

params = {"booster" : 'gbtree' , 'objective' : 'reg:linear' , 'max_depth' : max_depth, 'eta' : eta,\
             'subsample' : subsample, 'colsample_bytree' : colsample_bytree, 'alpha' : alpha}

dtrain_toy = xgb.DMatrix(data = toy_train_x , label = toy_train_y)
dtest_toy = xgb.DMatrix(data = toy_test_x, label = toy_test_y)
watchlist = [(dtest_toy, 'eval'), (dtrain_toy, 'train')]

xg_reg_toy = xgb.train(params=params, dtrain=dtrain_toy, num_boost_round=1000,
                       evals=watchlist, early_stopping_rounds=20)

xg_reg_toy.feature_importances_
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-378-248f7887e307> in <module>()
----> 1 xg_reg_toy.feature_importances_

AttributeError: 'Booster' object has no attribute 'feature_importances_'
Comments:

"Have you tried with the xgboost sklearn API? It works for me." – Jeril
"Yes, indeed the scikit-learn API returns the feature importances." – user8270077

2 Answers


What you are using is the Learning API, but you are referring to the Scikit-Learn API. Only the Scikit-Learn API has the feature_importances_ attribute.
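
For example, a minimal sketch with the Scikit-Learn wrapper (assuming the toy_train_x / toy_train_y frames from the question; the hyperparameter mapping is illustrative):

from xgboost import XGBRegressor

# The Scikit-Learn wrapper exposes feature_importances_ after fitting;
# eta maps to learning_rate and alpha to reg_alpha in this API.
reg = XGBRegressor(max_depth=6, learning_rate=0.3, subsample=0.8,
                   colsample_bytree=0.7, reg_alpha=0.1, n_estimators=100)
reg.fit(toy_train_x, toy_train_y)

print(reg.feature_importances_)  # array of shape [n_features]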


For someone who, like me, is not using the Scikit-Learn API (for obvious reasons): you can get the feature importances directly from the trained Booster (clf below is the object returned by xgb.train, i.e. xg_reg_toy in the question):

clf.get_score()
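
get_score returns a dict mapping feature names to scores. It also accepts an importance_type argument ('weight' by default; 'gain', 'cover', 'total_gain' and 'total_cover' are also available), e.g.:

# 'gain' ranks features by the average loss reduction of their splits
clf.get_score(importance_type='gain')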

For a more intuitive representation, there is also the plot_importance helper:

from xgboost import plot_importance
import matplotlib.pyplot as plt

plot_importance(clf, max_num_features=10)
plt.show()

This generates a bar chart of the features ordered by importance; max_num_features (optional) limits how many are shown.
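
plot_importance likewise accepts an importance_type argument, so the chart can rank features by gain rather than split counts; a short sketch:

# rank by average gain instead of number of splits ('weight', the default)
plot_importance(clf, max_num_features=10, importance_type='gain')
plt.show()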