Let's say I have the following data in ARFF format:
TRAIN:
@ATTRIBUTE A NUMERIC
@ATTRIBUTE B NUMERIC
@ATTRIBUTE C NUMERIC
TEST:
@ATTRIBUTE ID NUMERIC
@ATTRIBUTE A NUMERIC
@ATTRIBUTE B NUMERIC
@ATTRIBUTE C NUMERIC
@ATTRIBUTE D NUMERIC
@ATTRIBUTE E NUMERIC
To explain the attribute difference: feature selection was performed on the TRAIN data, so some attributes were removed. I need to get predictions on the TEST dataset from a classifier trained on the TRAIN dataset, but the TRAIN and TEST headers do not match. I tried to solve this by applying RemoveByName filters with the names of the excess attributes as parameters, but it still fails with the error "Train and test file not compatible!"
I was reading this correspondence, which states that filters are also applied to the test data so that the two sets are compatible, but that does not seem to be the case here.
Do I have to create a separate file externally for each subset of selected features in the TRAIN file, or can I use FilteredClassifier to remove the attributes that are not needed? Alternatively, can I somehow specify which attributes to use for prediction?
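The idea behind specifying which attributes to use for prediction can be illustrated as a projection by attribute name. The following is a conceptual Python sketch, not Weka code. The helper `map_to_train_header` is hypothetical, rows are plain lists, and a training attribute absent from the test header becomes `None` to stand in for a missing value. Weka's InputMappedClassifier is documented as assigning missing values in that case.

```python
from typing import Optional, Sequence


def map_to_train_header(
    train_attrs: Sequence[str],
    test_attrs: Sequence[str],
    test_row: Sequence[float],
) -> list[Optional[float]]:
    """Reorder and trim one test row so it matches the training header.

    Hypothetical helper for illustration only: attributes are matched by
    name, extra test attributes (e.g. ID, D, E) are dropped, and training
    attributes absent from the test header become None (missing).
    """
    if len(test_attrs) != len(test_row):
        raise ValueError("test_row length must match test_attrs")
    # Position of each test attribute, looked up by name.
    index = {name: i for i, name in enumerate(test_attrs)}
    return [test_row[index[name]] if name in index else None for name in train_attrs]


if __name__ == "__main__":
    train = ["A", "B", "C"]
    test = ["ID", "A", "B", "C", "D", "E"]
    print(map_to_train_header(train, test, [7, 1.0, 2.0, 3.0, 4.0, 5.0]))
    # prints [1.0, 2.0, 3.0]
```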
EDIT1:
I need to run everything from the command line, and I need to be able to supply variable parameters and variable filters for both the base classifier and the FilteredClassifier. As @zbicyclist suggested, I tried to make it work through InputMappedClassifier with the following command:
java -Xmx4096m -cp data/java/weka.jar weka.classifiers.misc.InputMappedClassifier -t train.arff -T test_bin.arff -classifications weka.classifiers.evaluation.output.prediction.CSV -p first -file FILE.arff -suppress -S 1 -W weka.classifiers.meta.FilteredClassifier -- -F weka.filters.MultiFilter -F "weka.filters.unsupervised.attribute.RemoveByName -E ^ID$" -F "weka.filters.unsupervised.attribute.RemoveByName -E ^OD_VALUE$" -W weka.classifiers.rules.DecisionTable -- -I
Here is the same command with newlines added for readability (they must be removed before running it):
java -Xmx4096m -cp data/java/weka.jar
weka.classifiers.misc.InputMappedClassifier
-t train.arff
-T test_bin.arff
-classifications weka.classifiers.evaluation.output.prediction.CSV
-p first
-file FILE.arff
-suppress
-S 1
-W weka.classifiers.meta.FilteredClassifier
--
-F weka.filters.MultiFilter
-F "weka.filters.unsupervised.attribute.RemoveByName -E ^ID$"
-F "weka.filters.unsupervised.attribute.RemoveByName -E ^OD_VALUE$"
-W weka.classifiers.rules.DecisionTable
--
-I
It does not work, though; it fails with: Weka exception: Illegal options: -F weka.filters.unsupervised.attribute.RemoveByName -E ^ID$ -F weka.filters.unsupervised.attribute.RemoveByName -E ^OD_VALUE$
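To make the three nesting levels explicit, the argument list can be built programmatically so that each nested scheme specification travels as a single argument and no shell quoting is involved. This is a minimal Python sketch, assuming that Weka expects the whole MultiFilter specification, including its own quoted `-F` sub-filter specs, as one value after FilteredClassifier's `-F`, and that inner double quotes and backslashes are backslash-escaped. The helper names `quote_spec`, `multifilter_spec` and `build_weka_argv` are hypothetical. The prediction-output options (`-classifications`, `-p`, `-file`, `-suppress`, `-S`) are left out to keep the focus on the nesting.

```python
import shlex
from typing import Sequence


def quote_spec(spec: str) -> str:
    """Wrap one nested scheme spec in double quotes.

    Simplified escaping (assumption): backslashes and double quotes are
    backslash-escaped so the inner option parser can recover the text.
    """
    escaped = spec.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def multifilter_spec(filter_specs: Sequence[str]) -> str:
    """Build ONE string for MultiFilter with every sub-filter nested inside it."""
    parts = ["weka.filters.MultiFilter"]
    for spec in filter_specs:
        parts += ["-F", quote_spec(spec)]
    return " ".join(parts)


def build_weka_argv(
    jar: str,
    train: str,
    test: str,
    filter_specs: Sequence[str],
    base_classifier: str,
    base_opts: Sequence[str],
) -> list[str]:
    """Hypothetical helper: InputMappedClassifier > FilteredClassifier > base."""
    return [
        "java", "-Xmx4096m", "-cp", jar,
        "weka.classifiers.misc.InputMappedClassifier",
        "-t", train, "-T", test,
        "-W", "weka.classifiers.meta.FilteredClassifier",
        "--",  # options after this go to FilteredClassifier
        "-F", multifilter_spec(filter_specs),  # the whole MultiFilter is one argument
        "-W", base_classifier,
        "--",  # options after this go to the base classifier
        *base_opts,
    ]


if __name__ == "__main__":
    argv = build_weka_argv(
        "data/java/weka.jar", "train.arff", "test_bin.arff",
        [
            "weka.filters.unsupervised.attribute.RemoveByName -E ^ID$",
            "weka.filters.unsupervised.attribute.RemoveByName -E ^OD_VALUE$",
        ],
        "weka.classifiers.rules.DecisionTable", ["-I"],
    )
    # Shell-safe single line, e.g. for comparison with a hand-written command.
    print(shlex.join(argv))
```

`shlex.join` turns the list into a shell-safe single line, which can be compared with the hand-written command above.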
Can anyone help me nest the command properly, so that I can wrap the base classifier in a FilteredClassifier and then wrap that FilteredClassifier in an InputMappedClassifier?