
I would like to find the importance of each feature in my dataframe using scikit-learn.

I want to do this in scikit-learn instead of using Info Gain in the WEKA software, which provides a score with the feature name next to it.

I implemented the following method, but I don't know how to replace the ranking number with a score.

For example:

I don't want to see:

  1. feature 6
  2. feature 4

...

However, I prefer:

0.4 feature 6

0.233 feature 4

...

Here is my method:

def _rank_features(self, dataframe, targeted_class):
    from sklearn.feature_selection import RFE
    from sklearn.linear_model import LinearRegression

    feature_names = list(dataframe.columns.values)

    # use linear regression as the model
    lr = LinearRegression()
    # rank all features, i.e continue the elimination until the last one
    rfe = RFE(lr, n_features_to_select=1)
    rfe.fit(dataframe, targeted_class)

    # print(...) with a single argument runs under both Python 2 and Python 3
    print("Features sorted by their rank:")
    print(sorted(zip(map(lambda x: round(x, 4), rfe.ranking_), feature_names)))

Does anyone know how to convert the ranking into a score?

What is the output of your code? Does it not work? - MMF
The output looks like this: Features sorted by their rank: [(1.0, 'feature 6'), (2.0, 'feature 4'), (3.0, 'feature 3'), ... ] - Aviade
RFE in sklearn just eliminates the weakest features step by step until the requested number remains (see the source code). Its ranking_ is an elimination order, not an importance score for each feature. - MMF

1 Answer


If you want the importance of your features, you can use a decision tree. In sklearn, a fitted tree has an attribute called feature_importances_.
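
As a minimal sketch of reading that attribute, the snippet below fits a tree on all columns and prints each score next to its feature name, which is the output format asked for in the question. The helper name score_features and the synthetic data from make_regression are assumptions for illustration only. DecisionTreeRegressor is used because the question fits a LinearRegression; if the target holds class labels, DecisionTreeClassifier would probably be the better fit.

```python
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor


def score_features(dataframe, target, random_state=0):
    """Return (score, feature_name) pairs, highest importance first.

    Hypothetical helper, not part of scikit-learn.
    """
    tree = DecisionTreeRegressor(random_state=random_state)
    tree.fit(dataframe, target)
    # feature_importances_ holds one normalized score per input column
    scores = [round(float(s), 4) for s in tree.feature_importances_]
    return sorted(zip(scores, dataframe.columns), reverse=True)


# Synthetic stand-in for the real dataframe (assumption)
X, y = make_regression(n_samples=200, n_features=6, n_informative=3,
                       random_state=0)
df = pd.DataFrame(X, columns=["feature %d" % i for i in range(X.shape[1])])

ranked = score_features(df, y)
for score, name in ranked:
    print("%.4f %s" % (score, name))
```

The scores are normalized by the tree, so they sum to roughly 1 after rounding and can be compared directly with each other.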

So what I suggest is to reduce your feature space using RFE and then fit your decision tree on your dataset projected onto the selected features. You will then be able to get the importance of each of them.

Remark: The importance of each feature is relative to the set of features used, so the importances you get with this method are not the general importances you would get using all the features. They still give a good idea of the relative importance among the most important features.
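
The two-step approach suggested in this answer could look roughly like the sketch below. The helper name score_selected_features, the n_keep parameter and the synthetic data are assumptions for illustration; the RFE step reuses LinearRegression from the question, and a regression tree is assumed for the scoring step.

```python
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor


def score_selected_features(dataframe, target, n_keep=3, random_state=0):
    """Keep n_keep features with RFE, then score them with a tree.

    Hypothetical helper; n_keep is an illustrative parameter.
    """
    # Step 1: reduce the feature space with RFE
    rfe = RFE(LinearRegression(), n_features_to_select=n_keep)
    rfe.fit(dataframe, target)
    kept = dataframe.columns[rfe.support_]  # boolean mask of kept columns

    # Step 2: fit a tree on the kept columns only and read its importances
    tree = DecisionTreeRegressor(random_state=random_state)
    tree.fit(dataframe[kept], target)
    return sorted(zip(tree.feature_importances_.tolist(), kept), reverse=True)


# Synthetic stand-in for the real dataframe (assumption)
X, y = make_regression(n_samples=200, n_features=6, n_informative=3,
                       random_state=0)
df = pd.DataFrame(X, columns=["feature %d" % i for i in range(X.shape[1])])

selected = score_selected_features(df, y, n_keep=3)
for score, name in selected:
    print("%.4f %s" % (score, name))
```

Because the tree only sees the kept columns, the printed scores are normalized over that subset, which is the caveat described in the remark above.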