2 votes

I have trained an SVM and a logistic regression classifier on my dataset. Both classifiers provide a weight vector whose length equals the number of features. I can use this weight vector to select the 10 most important features by simply picking the 10 features with the highest weights.

Should I use the absolute values of the weights, i.e., select the 10 features with the highest absolute weights?

Second, as I have read, this only works for an SVM with a linear kernel, not with an RBF kernel; for a non-linear kernel the weights apparently no longer correspond to the input features. What is the exact reason that the weight vector cannot be used to determine feature importance in the case of a non-linear kernel SVM?
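A minimal sketch of such a setup, assuming scikit-learn and synthetic data (the question specifies neither):

```python
# scikit-learn and the synthetic dataset are assumptions,
# not details given in the question.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=500, n_features=50, random_state=0)

svm = LinearSVC(max_iter=5000).fit(X, y)
logreg = LogisticRegression(max_iter=1000).fit(X, y)

# For a binary problem, coef_ has shape (1, n_features): one weight per feature.
w_svm = svm.coef_.ravel()
w_logreg = logreg.coef_.ravel()
print(w_svm.shape, w_logreg.shape)  # (50,) (50,)
```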


1 Answer

1 vote

As I answered to a similar question, the weight vector of any linear classifier indicates feature importance. This is simply because the decision value is a linear combination of the feature values with the weights as coefficients (f(x) = w · x + b), so the larger a weight's magnitude, the more the corresponding summand contributes to the final value.
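A quick numeric check of that claim (a sketch assuming scikit-learn; any linear model with `coef_` and `decision_function` would do):

```python
# Sanity check that the decision value really is w . x + b,
# so each feature contributes w_j * x_j to it.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=10, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X, y)

manual = X @ clf.coef_.ravel() + clf.intercept_[0]
assert np.allclose(manual, clf.decision_function(X))
```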

Thus, for a linear classifier you can take the features with the largest absolute weights: use absolute values, because a large negative weight moves the decision value just as strongly as a large positive one. Do not rank by the feature values themselves, nor by the products of weight and feature value.
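A minimal sketch of that selection rule, again assuming scikit-learn and synthetic data:

```python
# Picking the 10 features with the largest absolute weights.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=50, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X, y)

# Rank by |weight|: the sign only encodes the direction of the effect.
top10 = np.argsort(np.abs(clf.coef_.ravel()))[::-1][:10]
print(top10)  # indices of the 10 most influential features
```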

It also explains why SVMs with non-linear kernels such as RBF don't have this property: the kernel implicitly maps the inputs into another (often infinite-dimensional) feature space, and the weight vector lives in that space rather than in the original one, so its components no longer correspond to your input features and a bigger weight no longer means a bigger impact of a particular input feature (see the Wikipedia article on the kernel trick).
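If you are using scikit-learn, this is directly visible: `SVC` refuses to expose `coef_` for anything but a linear kernel, because there is no per-input-feature weight vector to report.

```python
# With an RBF kernel the weights live in the kernel-induced space, so
# scikit-learn's SVC exposes no per-input-feature weight vector at all.
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=10, random_state=0)
rbf = SVC(kernel="rbf").fit(X, y)
try:
    print(rbf.coef_)
except AttributeError as err:
    print(err)  # coef_ is only available when using a linear kernel
```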

If you need to select the most important features for a non-linear SVM, use dedicated feature-selection methods, for example wrapper methods, which evaluate the classifier on candidate feature subsets instead of inspecting its weights; one concrete option is sketched below.
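For instance, scikit-learn's `SequentialFeatureSelector` is one such wrapper method (my suggestion of a specific tool, one option among many): it greedily grows a feature subset, scoring each candidate by cross-validated performance of the RBF SVM itself, so no weight vector is needed.

```python
# One wrapper method: forward sequential selection, scoring each candidate
# subset by cross-validated accuracy of the RBF SVM itself (no weights used).
# SequentialFeatureSelector needs scikit-learn >= 0.24.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=20, random_state=0)
selector = SequentialFeatureSelector(
    SVC(kernel="rbf"), n_features_to_select=10, direction="forward", cv=5
)
selector.fit(X, y)
print(selector.get_support(indices=True))  # indices of the selected features
```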