
Hopefully I'm reading this wrong, but the XGBoost library documentation mentions extracting feature importances through the feature_importances_ attribute, much like sklearn's random forest.

However, for some reason, I keep getting this error: AttributeError: 'XGBClassifier' object has no attribute 'feature_importances_'

My code snippet is below:

from sklearn import datasets
import xgboost as xg
iris = datasets.load_iris()
X = iris.data
Y = iris.target
Y = iris.target[ Y < 2] # arbitrarily removing class 2 so it can be 0 and 1
X = X[iris.target < 2] # keeping only the rows that match Y
xgb = xg.XGBClassifier()
fit = xgb.fit(X, Y)

It seems that you can compute feature importance from the Booster object by calling its get_fscore method. The only reason I'm using XGBClassifier over Booster is that it can be wrapped in a sklearn pipeline. Any thoughts on extracting feature importances? Is anyone else experiencing this?

I can't reproduce the problem with your snippet. What version of XGBoost do you have? - BrenBarn
From my pip freeze, I have xgboost==0.4a30. - Minh Mai
Does this help? kaggle.com/mmueller/… - Chong Tang
I have seen this before. The problem, however, is that the get_fscore method is bound to the Booster object rather than XGBClassifier, from my understanding. See the doc here. - Minh Mai
I have 0.4 and your snippet works with no problem. - BrenBarn

9 Answers


As the comments indicate, I suspect your issue is a versioning one. However, if you do not want to or can't update, the following function should work for you.

def get_xgb_imp(xgb, feat_names):
    # get_fscore() keys are 'f0', 'f1', ...; features never used in a split are absent
    imp_vals = xgb.booster().get_fscore()
    imp_dict = {feat_names[i]:float(imp_vals.get('f'+str(i),0.)) for i in range(len(feat_names))}
    # plain sum() works on both Python 2 and 3, where dict.values() is a view
    total = sum(imp_dict.values())
    return {k:v/total for k,v in imp_dict.items()}

>>> import numpy as np
>>> from xgboost import XGBClassifier
>>> feat_names = ['var1','var2','var3','var4','var5']
>>> np.random.seed(1)
>>> X = np.random.rand(100,5)
>>> y = np.random.rand(100).round()
>>> xgb = XGBClassifier(n_estimators=10)
>>> xgb = xgb.fit(X,y)
>>> get_xgb_imp(xgb,feat_names)
{'var5': 0.0, 'var4': 0.20408163265306123, 'var1': 0.34693877551020408, 'var3': 0.22448979591836735, 'var2': 0.22448979591836735}
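
To make the normalization step easier to check without a trained model, here is a minimal sketch of the same logic applied to a plain dict. The name normalize_fscore and the split counts are hypothetical: the counts were picked only because they reproduce the proportions in the output above, since the real counts are not shown in the answer. The zero-total guard is an addition, not part of the original function.

```python
def normalize_fscore(fscore, feat_names):
    """Map Booster-style keys ('f0', 'f1', ...) to names and scale scores to sum to 1."""
    raw = {name: float(fscore.get("f%d" % i, 0.0)) for i, name in enumerate(feat_names)}
    total = sum(raw.values())
    if total == 0:
        # No feature was used in any split; return the zeros instead of dividing by zero.
        return raw
    return {name: score / total for name, score in raw.items()}


# Hypothetical split counts; 'f4' is missing because that feature was never used.
example_fscore = {"f0": 17, "f1": 11, "f2": 11, "f3": 10}
result = normalize_fscore(example_fscore, ["var1", "var2", "var3", "var4", "var5"])
print(result)
```

Features absent from the f-score dict come back as 0.0 instead of being dropped, which keeps the result aligned with feat_names.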

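For xgboost, if you use xgb.fit(), then you can use the following method to get feature importance.

import pandas as pd
# xgb_model is assumed to be the fitted model, e.g. xgb_model = xgb.fit(X, y)
xgb_fea_imp = pd.DataFrame(list(xgb_model.get_booster().get_fscore().items()),
columns=['feature','importance']).sort_values('importance', ascending=False)

from xgboost import plot_importance
plot_importance(xgb_model)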

I found the answer. It appears that version 0.4a30 does not have the feature_importances_ attribute. Therefore, if you install the xgboost package using pip install xgboost, you will be unable to extract feature importances from the XGBClassifier object. You can refer to @David's answer if you want a workaround.

However, what I did was build it from source by cloning the repo and running . ./build.sh, which installs version 0.4, where the feature_importances_ attribute works.

Hope this helps others!


Get Feature Importance as a sorted data frame

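import pandas as pd
def get_xgb_imp(xgb, feat_names):
    # feat_names is kept so existing calls still work; it is not used here
    imp_vals = xgb.booster().get_fscore()  # dict: feature key -> split count
    feats_imp = pd.DataFrame(list(imp_vals.items()), columns=['feature', 'importance'])
    # sort so the most important features come first
    return feats_imp.sort_values('importance', ascending=False)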

feature_importance_df = get_xgb_imp(xgb, feat_names)

For those having the same problem as Luís Bianchin, "TypeError: 'str' object is not callable", I found a solution (that works for me at least) here.

In short, I found modifying David's code from

imp_vals = xgb.booster().get_fscore()

to

imp_vals = xgb.get_fscore()

worked for me.

For more detail I would recommend visiting the link above.

Big thanks to David and ianozsvald


An update of the accepted answer since it no longer works:

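def get_xgb_imp(xgb_model, feat_names):
    # assumes xgb_model exposes get_fscore() directly (as a Booster does)
    # and that its feature keys are the names in feat_names
    imp_vals = xgb_model.get_fscore()
    imp_dict = {feat: float(imp_vals.get(feat, 0.)) for feat in feat_names}
    total = sum(imp_dict.values())
    return {k: round(v/total, 5) for k,v in imp_dict.items()}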

It seems like the API keeps changing. For xgboost version 1.0.2, changing imp_vals = xgb.booster().get_fscore() to imp_vals = xgb.get_booster().get_fscore() in @David's answer does the trick. The updated code is:

def get_xgb_imp(xgb, feat_names):
    imp_vals = xgb.get_booster().get_fscore()
    imp_dict = {feat_names[i]:float(imp_vals.get('f'+str(i),0.)) for i in range(len(feat_names))}
    # sum() instead of numpy.array(...).sum(): on Python 3, dict.values() is a view, not a list
    total = sum(imp_dict.values())
    return {k:v/total for k,v in imp_dict.items()}
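
Since the accessor has gone by several names in this thread (booster(), get_booster(), and get_fscore() called directly), one option is a small helper that tries each in turn. This is only a sketch: get_fscore_any is a hypothetical name, and it assumes nothing about the model beyond the three accessors mentioned in the answers here. The callable() check is there because on some versions booster is presumably a plain string rather than a method, which would explain the "'str' object is not callable" error.

```python
def get_fscore_any(model):
    """Return the raw f-score dict from whichever accessor `model` provides."""
    # Try the wrapper accessors mentioned in this thread, newest name first.
    for accessor in ("get_booster", "booster"):
        candidate = getattr(model, accessor, None)
        # callable() skips the case where the attribute exists but is not a method.
        if callable(candidate):
            return candidate().get_fscore()
    # Fall back to objects that expose get_fscore() themselves, such as a Booster.
    if callable(getattr(model, "get_fscore", None)):
        return model.get_fscore()
    raise TypeError("no known way to reach get_fscore() on %s" % type(model).__name__)
```

The returned dict can then be passed through the same normalization as in get_xgb_imp.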

You can also use the built-in plot_importance function:

from xgboost import XGBClassifier, plot_importance
fit = XGBClassifier().fit(X,Y)
plot_importance(fit)  # draws a bar chart of the feature scores



An alternative to the built-in feature importance is the shap package. I really like it because it provides additional plots. Example:

Importance Plot

[Figure: xgboost shap importance]

Summary Plot

[Figure: xgboost shap summary]

Dependence Plot

[Figure: xgboost shap dependence]
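
The code behind these three plots is not shown here, so the following is only a rough sketch of how they might be produced with shap. The function names are hypothetical, and the sketch assumes a fitted tree model for regression or binary classification, so that shap_values comes back as a single (n_samples, n_features) array. Imports are placed inside the functions so the definitions load even where shap is not installed.

```python
def mean_abs_shap(model, X, feat_names):
    """Rank features by mean absolute SHAP value, largest first."""
    import numpy as np
    import shap  # third-party package: pip install shap

    explainer = shap.TreeExplainer(model)
    shap_values = explainer.shap_values(X)  # assumed shape: (n_samples, n_features)
    scores = np.abs(shap_values).mean(axis=0)
    return sorted(zip(feat_names, scores), key=lambda pair: pair[1], reverse=True)


def draw_shap_plots(model, X, feat_names):
    """Draw the importance, summary and dependence plots in turn."""
    import shap

    explainer = shap.TreeExplainer(model)
    shap_values = explainer.shap_values(X)
    # Importance plot: one bar per feature
    shap.summary_plot(shap_values, X, feature_names=feat_names, plot_type="bar")
    # Summary plot: one dot per sample and feature
    shap.summary_plot(shap_values, X, feature_names=feat_names)
    # Dependence plot for the first feature (an arbitrary choice for illustration)
    shap.dependence_plot(feat_names[0], shap_values, X, feature_names=feat_names)
```

The ranking from mean_abs_shap is the numeric counterpart of the bar-style importance plot.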

You can read about alternative ways to compute feature importance in XGBoost in this blog post of mine.