I'm trying to apply feature selection to a dataset with 1700 features and 3300 instances. One approach is backward stepwise selection: a greedy algorithm that removes the worst feature at each round.
I'm using SVM performance as the metric for deciding which feature is the worst. In the first round, I train the SVM 1700 times, each time leaving one feature out. At the end of this round, I remove the feature whose removal resulted in the highest SVM performance, leaving 1699 features.
In the second round, I train the SVM 1699 times, each time leaving one feature out, and so on.
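To make the procedure concrete, here is a minimal sketch of the loop described above. The function name `backward_elimination` and the toy scorer are illustrative, not from any library; in the real setting, `score` would be SVM cross-validation accuracy on the given feature subset:

```python
import random

def backward_elimination(features, n_target, score):
    """Greedy backward elimination: each round, fit one model per
    candidate removal and drop the feature whose removal gives the
    highest score on the remaining subset."""
    features = list(features)
    while len(features) > n_target:
        best_score, worst_feature = None, None
        for f in features:  # one model fit per candidate removal
            remaining = [g for g in features if g != f]
            s = score(remaining)
            if best_score is None or s > best_score:
                best_score, worst_feature = s, f
        features.remove(worst_feature)
    return features

# Toy scorer standing in for SVM accuracy: each feature gets a random
# "usefulness" weight, and a subset scores its mean weight.
random.seed(0)
weights = {f: random.random() for f in range(20)}
score = lambda subset: sum(weights[f] for f in subset) / len(subset)

selected = backward_elimination(range(20), 5, score)
# With this scorer, the greedy keeps the 5 highest-weight features.
```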
If I want to reduce the dataset to 100 features, this procedure will train an SVM 1700 + 1699 + … + 101 times, which is roughly 1.44 million fits. This is intractable. Any suggestions on how to avoid such a problem?
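For reference, the exact number of fits is just the sum of the round sizes, which can be checked in one line:

```python
# Backward elimination from 1700 features down to 100: round k (when k
# features remain) trains k models, for k = 1700 down to 101.
n_fits = sum(range(101, 1701))
print(n_fits)  # 1440800
```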