0
votes

On our training set, we performed feature selection (e.g., CfsSubsetEval with GreedyStepwise) and then classified the instances using a classifier (e.g., J48). We have saved the model that Weka created.

Now we want to classify new [unlabeled] instances, which still have the original number of attributes that the training set had before feature selection. Are we right in assuming that we should apply the same feature selection to this set of new [unlabeled] instances, so that the training and test sets are compatible and we can classify the new instances with the saved model? If yes, how can we filter the test set?

Thank you for helping!

3 Answers

0
votes

Yes, the test and training sets must have the same number of attributes, and each attribute must correspond to the same thing in both sets. So, before classification, you should remove from your test set the same attributes that you removed from the training set.
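To illustrate the idea without any Weka classes, here is a minimal plain-Java sketch. The class name `ColumnProjection`, the `project` helper, the sample values, and the kept indices `{0, 3}` are all made up for illustration. The point is only that the indices are chosen once, on the training data, and then reused unchanged on the test data.

```java
import java.util.Arrays;

final class ColumnProjection {

    /** Returns a copy of each row containing only the columns listed in keptIndices, in that order. */
    static double[][] project(double[][] rows, int[] keptIndices) {
        double[][] out = new double[rows.length][keptIndices.length];
        for (int r = 0; r < rows.length; r++) {
            for (int c = 0; c < keptIndices.length; c++) {
                out[r][c] = rows[r][keptIndices[c]];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical data: four attributes per instance.
        double[][] train = {{1.0, 2.0, 3.0, 0.0}, {4.0, 5.0, 6.0, 1.0}};
        double[][] test = {{7.0, 8.0, 9.0, 0.0}};

        // Assumed outcome of feature selection on the training set: keep columns 0 and 3.
        int[] kept = {0, 3};

        // The same index array is applied to both sets, so their layouts match.
        System.out.println(Arrays.deepToString(project(train, kept)));
        System.out.println(Arrays.deepToString(project(test, kept)));
    }
}
```

In Weka itself you would not do this by hand; a filter configured on the training set plays the role of `kept` here.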

0
votes

I don't think you have to run feature selection again on the test set. If your test set still has the original attributes, load it and, in the "Preprocess" window, manually remove all the attributes that feature selection removed from the training set file.

0
votes

You must apply to the test set the same filter that you previously applied to the training set. You can do this with the WEKA API:

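import weka.attributeSelection.AttributeSelection;
import weka.attributeSelection.CfsSubsetEval;
import weka.attributeSelection.GreedyStepwise;
import weka.core.Instances;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.Remove;

Instances trainSet = ...; // get training set
Instances testSet = ...;  // get testing set
// the class attribute must be set on both sets (setClassIndex) before selection

// run feature selection on the training data only
AttributeSelection attsel = new AttributeSelection();
CfsSubsetEval eval = new CfsSubsetEval();
GreedyStepwise search = new GreedyStepwise();
attsel.setEvaluator(eval);
attsel.setSearch(search);
attsel.SelectAttributes(trainSet);

// indices of the selected attributes (includes the class index when a class is set)
int[] selected = attsel.selectedAttributes();

// set up a filter that keeps only the selected attributes
Remove remove = new Remove();
remove.setAttributeIndicesArray(selected);
remove.setInvertSelection(true); // retain the selected, remove all others
remove.setInputFormat(trainSet);
trainSet = Filter.useFilter(trainSet, remove);

// now apply the same, already-initialized filter to the testing set as well
testSet = Filter.useFilter(testSet, remove);

// both sets now share the same attributes, so a classifier trained on trainSet can classify testSet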