
I am trying to estimate feature importance for the classification task I have at hand. What is important for me is to get specific numbers representing the importance of each feature, not just to 'pick the X most important features'.

The obvious choice was to use tree-based methods, which provide the handy feature_importances_ attribute to obtain the importance of each feature. But I wasn't satisfied with the results of the tree-based classifiers. I then learned that SelectFromModel can eliminate unimportant features based on an importance score, and that it does this successfully for SVMs and linear models as well.
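For reference, this is the kind of per-feature number I'm after (a minimal sketch; the RandomForestClassifier and the iris data are just illustrative):

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
clf = RandomForestClassifier(random_state=0).fit(X, y)
print(clf.feature_importances_)  # one importance score per feature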

I wonder, is there any way to obtain a specific importance score for each feature from SelectFromModel, instead of just getting a list of the most important features?


1 Answer


Looking through the scikit-learn source code on GitHub, I found this piece of code:

import numpy as np  # imported at module level in the original scikit-learn source

def _get_feature_importances(estimator):
    """Retrieve or aggregate feature importances from estimator"""
    importances = getattr(estimator, "feature_importances_", None)

    if importances is None and hasattr(estimator, "coef_"):
        if estimator.coef_.ndim == 1:
            importances = np.abs(estimator.coef_)

        else:
            importances = np.sum(np.abs(estimator.coef_), axis=0)

    elif importances is None:
        raise ValueError(
            "The underlying estimator %s has no `coef_` or "
            "`feature_importances_` attribute. Either pass a fitted estimator"
            " to SelectFromModel or call fit before calling transform."
            % estimator.__class__.__name__)

    return importances

Thus, if you're using a linear model, the code simply uses the absolute values of the model's coefficients as the "importance scores", summing them across classes when coef_ is two-dimensional.
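For example, to reproduce that aggregation yourself for a multiclass model (a minimal sketch; LogisticRegression and the iris data are just for illustration):

import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# coef_ has shape (n_classes, n_features) for a multiclass fit
clf = LogisticRegression(max_iter=1000).fit(X, y)

# Same aggregation as _get_feature_importances for a 2-D coef_:
importances = np.sum(np.abs(clf.coef_), axis=0)
print(importances)  # one non-negative score per feature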

You can get those scores yourself by pulling the coef_ attribute from the fitted estimator, which SelectFromModel exposes as its estimator_ attribute.

Example:

from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LassoCV

sfm = SelectFromModel(LassoCV(), threshold=0.25)
sfm.fit(X, y)  # X, y are your feature matrix and target
print(sfm.estimator_.coef_)  # the raw coefficients behind the "importance" scores
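Note that LassoCV coefficients can be negative; what SelectFromModel actually compares against the threshold are their absolute values. To reproduce exactly the scores it uses, along with the resulting selection mask:

import numpy as np

scores = np.abs(sfm.estimator_.coef_)  # the scores compared to the threshold
print(scores)
print(sfm.get_support())  # boolean mask of the features that were kept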