I've trained an XGBoost model and used plot_importance() to plot which features are the most important in the trained model. However, the numbers in the plot have so many decimal places that they clutter the plot and run past its edge.

I have searched for plot formatting options, but I only found how to format the axes (I tried formatting the X axis in the hope that it would also format the value labels next to the bars).

I work in a Jupyter Notebook (if that makes any difference). The code is as follows:

import xgboost as xgb

xg_reg = xgb.XGBClassifier(
                objective = 'binary:logistic',
                colsample_bytree = 0.4,
                learning_rate = 0.01,
                max_depth = 15, 
                alpha = 0.1, 
                n_estimators = 5,
                subsample = 0.5,
                scale_pos_weight = 4
                )
xg_reg.fit(X_train, y_train) 
preds = xg_reg.predict(X_test)

ax = xgb.plot_importance(xg_reg, max_num_features=3, importance_type='gain', show_values=True) 

fig = ax.figure
fig.set_size_inches(10, 3)

Is there something I'm missing? Are there any formatting functions or parameters to pass?

I would like to be able to format the feature importance scores, or at least drop the decimal part (e.g. "25" instead of "25.66521"). The current plot is attached below.

[Image: xgboost_feature_importance_scores]

3 Answers

2
votes

It is possible to get the results you want without editing the xgboost plotting functions. The plotting function can take a dictionary of importances as its first argument, which you can create directly from your xgboost model, and then edit. This is also handy if you would like to make friendlier labels for the feature names.

# Get the booster from the xgbmodel
booster = xg_reg.get_booster()

# Get the importance dictionary (by gain) from the booster
importance = booster.get_score(importance_type="gain")

# make your changes
for key in importance.keys():
    importance[key] = round(importance[key],2)

# provide the importance dictionary to the plotting function
# (importance_type is ignored when a dictionary is passed, so it is omitted)
ax = xgb.plot_importance(importance, max_num_features=3, show_values=True)

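
Since the answer mentions friendlier feature labels, here is a minimal sketch of how the rounding and renaming could be combined in one step. The helper name tidy_importance, its parameters, and the sample scores are all made up for illustration; none of them come from xgboost.

```python
def tidy_importance(importance, digits=2, labels=None):
    """Return a new dict with rounded scores and optional friendlier names.

    `labels` is a hypothetical mapping from the model's feature names to the
    names you want on the plot; features missing from it keep their name.
    """
    labels = labels or {}
    return {
        labels.get(name, name): round(score, digits)
        for name, score in importance.items()
    }


# Made-up scores, not taken from a real model
raw = {"f0": 25.66521, "f1": 3.14159, "f2": 0.5}
tidy = tidy_importance(raw, digits=1, labels={"f0": "age"})
print(tidy)  # {'age': 25.7, 'f1': 3.1, 'f2': 0.5}
```

The returned dictionary could then be passed to xgb.plot_importance in place of the importance dictionary built from booster.get_score().
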
1
votes

I ran into the same problem and just resolved it.

It happens only because, for 'gain' or 'cover', the scores have many decimal places, unlike with the 'weight' option. Unfortunately, as far as I know, there is no option to specify the number of digits. I therefore modified the function myself to set the maximum number of digits allowed. Here are the modifications to make in the plotting.py file of the xgboost package. If you are working in the Spyder console, you can find and open the file simply by passing an invalid option (I am a lazy guy), for instance:

xgb.plot_importance(xg_reg, potato=False)

Then click on the file path in the error shown in the console. The next step is to modify the function signature as follows:

def plot_importance(booster, ax=None, height=0.2,
                    xlim=None, ylim=None, title='Feature importance',
                    xlabel='F score', ylabel='Features',
                    importance_type='weight', max_num_features=None,
                    grid=True, show_values=True, max_digits=3, **kwargs): 

Then add the following before the show_values condition:

# Fall back to the raw values when no formatting is requested
values_displayed = values
if max_digits is not None:
    lst = list(values)
    # Format only when the first value has more decimals than max_digits
    if len(str(lst[0]).split('.')[-1]) > max_digits:
        values_displayed = tuple('{:.{}f}'.format(x, max_digits) for x in lst)

if show_values is True:
    for x, x2, y in zip(values, values_displayed, ylocs):
        ax.text(x + 1, y, x2, va='center')

I added a condition so that the numbers are formatted only if they have more decimals than the number of digits specified. This avoids, for instance, producing unwanted digits with the importance_type='weight' option.
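
For anyone who would rather not edit the installed package, the same "only format when needed" idea can be sketched as a standalone helper. This is a variation rather than the patch above: it checks every value individually instead of only the first one, and format_score is a hypothetical name, not part of xgboost.

```python
def format_score(value, max_digits=3):
    """Return value as text, trimmed to max_digits decimals only when needed."""
    text = str(value)
    # Integers (and floats shown in scientific notation) have no "." in
    # their string form, so they are returned unchanged.
    decimals = text.split(".")[-1] if "." in text else ""
    if len(decimals) > max_digits:
        return "{:.{}f}".format(value, max_digits)
    return text


print(format_score(25.66521))  # 25.665
print(format_score(12))        # 12
print(format_score(0.5))       # 0.5
```
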

Note that for 'cover' and 'gain' the text was also badly positioned for me, so I also changed the offset, replacing the 1 above with a value relative to the largest score:

if show_values is True:
    # Offset each label by 1% of the largest value instead of a fixed 1
    dx = np.max(values) / 100
    for x, x2, y in zip(values, values_displayed, ylocs):
        ax.text(x + dx, y, x2, va='center')

Hope this helps!

1
votes

Edit plotting.py in the xgboost package (around lines 86-88 in my copy) so that the values are rounded before they are plotted:

ylocs = np.arange(len(values))
values = tuple([round(x, 4) for x in values])
ax.barh(ylocs, values, align='center', height=height, **kwargs)
