I'm trying to apply feature selection to a dataset with 1700 features and 3300 instances. One approach is backward stepwise selection: a greedy algorithm that removes the worst feature at each round.
I'm using SVM performance as the metric for deciding which feature is the worst. In the first round, I train the SVM 1700 times, each time leaving one feature out. At the end of this round, I remove the feature whose removal resulted in the highest SVM performance, leaving 1699 features.
In the second round, I train the SVM 1699 times, each time leaving one feature out, and so on.
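To make the procedure concrete, here is a minimal sketch of the loop described above. The function name `backward_elimination` and the toy scorer are illustrative, not from any library; in the real setting, `score` would be SVM cross-validation accuracy on the given feature subset:

```python
import random

def backward_elimination(features, n_target, score):
    """Greedy backward elimination: each round, fit one model per
    candidate removal and drop the feature whose removal gives the
    highest score on the remaining subset."""
    features = list(features)
    while len(features) > n_target:
        best_score, worst_feature = None, None
        for f in features:  # one model fit per candidate removal
            remaining = [g for g in features if g != f]
            s = score(remaining)
            if best_score is None or s > best_score:
                best_score, worst_feature = s, f
        features.remove(worst_feature)
    return features

# Toy scorer standing in for SVM accuracy: each feature gets a random
# "usefulness" weight, and a subset scores its mean weight.
random.seed(0)
weights = {f: random.random() for f in range(20)}
score = lambda subset: sum(weights[f] for f in subset) / len(subset)

selected = backward_elimination(range(20), 5, score)
# With this scorer, the greedy keeps the 5 highest-weight features.
```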
If I want to reduce the dataset to 100 features, this procedure will train an SVM 1700 + 1699 + … + 101 times, which is roughly 1.44 million fits. This is intractable. Any suggestions on how to avoid such a problem?
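For reference, the exact number of fits is just the sum of the round sizes, which can be checked in one line:

```python
# Backward elimination from 1700 features down to 100: round k (when k
# features remain) trains k models, for k = 1700 down to 101.
n_fits = sum(range(101, 1701))
print(n_fits)  # 1440800
```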