I have trained an SVM and a logistic regression classifier on my dataset. Both classifiers provide a weight vector whose length equals the number of features. I can use this weight vector to pick the 10 most important features by simply selecting the 10 features with the highest weights.
First, should I use the absolute values of the weights, i.e. select the 10 features with the highest absolute weights?
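To make the question concrete, here is a minimal sketch of the two rankings I am comparing. It assumes scikit-learn, a synthetic dataset from `make_classification` standing in for my real data, and standardized features so that the weights are on a comparable scale; the names `top_signed` and `top_abs` are only illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Assumption: synthetic binary data stands in for the real dataset.
X, y = make_classification(
    n_samples=300, n_features=30, n_informative=8, random_state=0
)
# Standardize so that weight magnitudes are comparable across features.
X = StandardScaler().fit_transform(X)

k = 10
models = {
    "svm": LinearSVC(C=1.0, dual=False, max_iter=10000).fit(X, y),
    "logreg": LogisticRegression(max_iter=1000).fit(X, y),
}

top_signed = {}  # ranking by raw (signed) weight
top_abs = {}     # ranking by absolute weight
for name, model in models.items():
    w = model.coef_.ravel()  # one weight per feature in the binary case
    top_signed[name] = np.argsort(w)[::-1][:k]
    top_abs[name] = np.argsort(np.abs(w))[::-1][:k]
    print(name, "signed:", sorted(top_signed[name].tolist()))
    print(name, "abs:   ", sorted(top_abs[name].tolist()))
```

The signed ranking can never pick a feature with a large negative weight, while the absolute ranking can, so the two lists may differ.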
Second, I have read that this only works for an SVM with a linear kernel, not with an RBF kernel. For a non-linear kernel, the weights are somehow no longer linear. What is the exact reason that the weight vector cannot be used to determine feature importance in the case of a non-linear kernel SVM?
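For what it is worth, this is the behaviour I see in practice. The sketch below assumes scikit-learn's `SVC` and, again, synthetic data in place of my own: the linear model exposes `coef_` with one weight per feature, whereas the RBF model only exposes one dual coefficient per support vector.

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Assumption: synthetic binary data stands in for the real dataset.
X, y = make_classification(n_samples=200, n_features=20, random_state=0)

linear_svm = SVC(kernel="linear").fit(X, y)
rbf_svm = SVC(kernel="rbf", gamma="scale").fit(X, y)

# Linear kernel: a weight vector with one entry per input feature.
print("linear coef_ shape:", linear_svm.coef_.shape)

# RBF kernel: scikit-learn does not provide coef_ at all.
try:
    rbf_svm.coef_
    rbf_has_coef = True
except AttributeError as err:
    rbf_has_coef = False
    print("rbf coef_ unavailable:", err)

# What the RBF model stores instead: one coefficient per support vector,
# plus the support vectors themselves.
print("rbf dual_coef_ shape:", rbf_svm.dual_coef_.shape)
print("rbf support_vectors_ shape:", rbf_svm.support_vectors_.shape)
```

So with the RBF kernel the learned coefficients are indexed by training points rather than by features, which is the part I would like to understand properly.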