
Let's say I have the following data in ARFF format:

TRAIN:

@ATTRIBUTE A NUMERIC
@ATTRIBUTE B NUMERIC
@ATTRIBUTE C NUMERIC

TEST:

@ATTRIBUTE ID NUMERIC
@ATTRIBUTE A NUMERIC
@ATTRIBUTE B NUMERIC
@ATTRIBUTE C NUMERIC
@ATTRIBUTE D NUMERIC
@ATTRIBUTE E NUMERIC

To explain the attribute difference: feature selection was performed on the TRAIN data, so some attributes were removed. I need to get predictions on the TEST dataset from a classifier trained on the TRAIN dataset, but the TRAIN and TEST headers do not match. I tried to solve this by applying RemoveByName filters with the excess attribute names as parameters, but it still fails with the error "Train and test file not compatible!".

I was reading this correspondence, which states that filters are also applied to the test data so that the two become compatible, but it looks like they are not compatible in my case.

Do I have to externally create a separate new file for each subset of selected features in the TRAIN file, or can I use FilteredClassifier to remove the features that are not needed? Or can I somehow specify which attributes to use for prediction?
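To make the idea concrete, here is a minimal sketch of the FilteredClassifier approach using the Weka Java API (file names and the regular expression are placeholders, and this is only an illustration; as explained in EDIT1 below, I actually need to run everything from the command line):

import weka.classifiers.meta.FilteredClassifier;
import weka.classifiers.rules.DecisionTable;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.unsupervised.attribute.RemoveByName;

public class FilteredTrain {
    public static void main(String[] args) throws Exception {
        // Placeholder file name; the real TRAIN file is the result of feature selection.
        Instances train = new DataSource("train.arff").getDataSet();
        train.setClassIndex(train.numAttributes() - 1);

        // Drop the surplus attribute(s) by name before the base learner sees them.
        RemoveByName dropId = new RemoveByName();
        dropId.setExpression("^ID$");

        // FilteredClassifier is supposed to apply the same filter to test data as well.
        FilteredClassifier fc = new FilteredClassifier();
        fc.setFilter(dropId);
        fc.setClassifier(new DecisionTable()); // the base learner I use
        fc.buildClassifier(train);
    }
}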

EDIT1:

I need to run everything from the command line, and I need to be able to supply variable parameters and variable filters for both the base classifier and the FilteredClassifier. As @zbicyclist suggested, I tried to make it work through the InputMappedClassifier, with a command like the following:

java -Xmx4096m -cp data/java/weka.jar weka.classifiers.misc.InputMappedClassifier -t train.arff -T test_bin.arff -classifications weka.classifiers.evaluation.output.prediction.CSV -p first -file FILE.arff -suppress -S 1 -W weka.classifiers.meta.FilteredClassifier -- -F weka.filters.MultiFilter -F "weka.filters.unsupervised.attribute.RemoveByName -E ^ID$" -F "weka.filters.unsupervised.attribute.RemoveByName -E ^OD_VALUE$" -W weka.classifiers.rules.DecisionTable -- -I

This is how it looks when I add newlines (which must be omitted before running it):

java -Xmx4096m -cp data/java/weka.jar 
weka.classifiers.misc.InputMappedClassifier
  -t train.arff
  -T test_bin.arff
  -classifications weka.classifiers.evaluation.output.prediction.CSV
  -p first
  -file FILE.arff
  -suppress
  -S 1
  -W weka.classifiers.meta.FilteredClassifier
--
  -F weka.filters.MultiFilter
  -F "weka.filters.unsupervised.attribute.RemoveByName -E ^ID$"
  -F "weka.filters.unsupervised.attribute.RemoveByName -E ^OD_VALUE$"
  -W weka.classifiers.rules.DecisionTable
--
  -I

It does not work, though, and fails with: Weka exception: Illegal options: -F weka.filters.unsupervised.attribute.RemoveByName -E ^ID$ -F weka.filters.unsupervised.attribute.RemoveByName -E ^OD_VALUE$

Can anyone help me nest the command properly, so that I can wrap the base classifier in a FilteredClassifier and then wrap the filtered classifier in an InputMappedClassifier?


2 Answers


I created files with numeric attributes ID...E in the test set and A...C in the training set, as you show above.

I used a linear regression classifier on the training set (predicting C) and got a prompt from Weka (screenshot not reproduced here).

I selected "Yes", and got this output, which seems to be mapping correctly:

=== Run information ===

Scheme:       weka.classifiers.misc.InputMappedClassifier -I -trim -W weka.classifiers.functions.LinearRegression -- -S 0 -R 1.0E-8 -num-decimal-places 4
Relation:     Stack_train
Instances:    80
Attributes:   3
              A
              B
              C
Test mode:    user supplied test set:  size unknown (reading incrementally)

=== Classifier model (full training set) ===

InputMappedClassifier:


Linear Regression Model

C =

      0.888  * A +
      1.0225 * B +
      0.4933
Attribute mappings:

Model attributes        Incoming attributes
----------------        ----------------
(numeric) A         --> 2 (numeric) A
(numeric) B         --> 3 (numeric) B
(numeric) C         --> 4 (numeric) C


Time taken to build model: 0.02 seconds

=== Evaluation on test set ===

Time taken to test model on supplied test set: 0.02 seconds

=== Summary ===

Correlation coefficient                  0.8341
Mean absolute error                      0.2493
Root mean squared error                  0.2904
Relative absolute error                 59.797  %
Root relative squared error             56.5247 %
Total Number of Instances               80  

So things seem to be working appropriately (Weka 3.9), at least using linear regression as a classifier. What classifier are you using? Let me know and I'll try it.
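If you want to do the same thing programmatically rather than in the Explorer, a rough equivalent using the Weka Java API would look something like this (file names are placeholders, and I am assuming C is the class attribute in both files):

import weka.classifiers.Evaluation;
import weka.classifiers.functions.LinearRegression;
import weka.classifiers.misc.InputMappedClassifier;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class MappedEvaluation {
    public static void main(String[] args) throws Exception {
        Instances train = new DataSource("train.arff").getDataSet();  // A, B, C
        Instances test  = new DataSource("test.arff").getDataSet();   // ID, A, B, C, D, E

        // C is the class attribute in both headers (assumption).
        train.setClassIndex(train.attribute("C").index());
        test.setClassIndex(test.attribute("C").index());

        // InputMappedClassifier matches attributes by name, so the extra
        // ID, D and E columns in the test header are simply left unmapped.
        InputMappedClassifier imc = new InputMappedClassifier();
        imc.setClassifier(new LinearRegression());
        imc.buildClassifier(train);

        Evaluation eval = new Evaluation(train);
        eval.evaluateModel(imc, test);
        System.out.println(eval.toSummaryString());
    }
}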


The problem is that the inputs are probably compared before the filtering is applied, so you need to wrap everything in an InputMappedClassifier and remove the unnecessary columns only after the training features have been mapped to the corresponding test features.

I managed to come up with the following command:

java -Xmx4096m -cp data/java/weka.jar weka.classifiers.misc.InputMappedClassifier \
-t train.arff \
-T test_bin.arff \
-classifications \
    "weka.classifiers.evaluation.output.prediction.CSV \
    -p first \
    -file FILE.arff \
    -suppress" \
-W weka.classifiers.meta.FilteredClassifier \
--\
    -F "weka.filters.MultiFilter -F \"weka.filters.unsupervised.attribute.RemoveByName -E ^ID$\" -F \"weka.filters.unsupervised.attribute.RemoveByName -E ^OD_VALUE$\""\
    -S 1\
    -W weka.classifiers.rules.DecisionTable \
    --\
        -I

This seems to do what I need.

It is possible to nest classifiers by using the -W <classifier.name> argument last and then introducing the parameters for the nested classifier after the -- argument. No obscure quote backslashing required.
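If the shell quoting ever gets too painful, the same nesting can also be written directly against the Weka Java API. The following is only a rough sketch of the structure that the command above builds (file names, the class attribute choice and the prediction loop are assumptions, not part of the original command):

import weka.classifiers.meta.FilteredClassifier;
import weka.classifiers.misc.InputMappedClassifier;
import weka.classifiers.rules.DecisionTable;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.MultiFilter;
import weka.filters.unsupervised.attribute.RemoveByName;

public class NestedSetup {
    public static void main(String[] args) throws Exception {
        Instances train = new DataSource("train.arff").getDataSet();
        Instances test  = new DataSource("test_bin.arff").getDataSet();

        // Assumption: the class is the last attribute of the training header
        // and an attribute with the same name exists in the test header.
        train.setClassIndex(train.numAttributes() - 1);
        test.setClassIndex(test.attribute(train.classAttribute().name()).index());

        // Filters that drop the columns the base learner must not see.
        RemoveByName dropId = new RemoveByName();
        dropId.setExpression("^ID$");
        RemoveByName dropOd = new RemoveByName();
        dropOd.setExpression("^OD_VALUE$");
        MultiFilter multi = new MultiFilter();
        multi.setFilters(new Filter[] { dropId, dropOd });

        // Inner wrapper: filters the training data and every incoming instance.
        FilteredClassifier filtered = new FilteredClassifier();
        filtered.setFilter(multi);
        filtered.setClassifier(new DecisionTable());

        // Outer wrapper: maps incoming test attributes onto the training header
        // by name before the filtered classifier ever sees them.
        InputMappedClassifier mapped = new InputMappedClassifier();
        mapped.setClassifier(filtered);
        mapped.buildClassifier(train);

        for (int i = 0; i < test.numInstances(); i++) {
            System.out.println(mapped.classifyInstance(test.instance(i)));
        }
    }
}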