Weka : training and test set are not compatible

Question

Each row of my training and test datasets holds the intensity values for the pixels of an image, and the last column holds the label, which tells what digit the image represents. The label can be any number from 0 to 9 in the training set and is always ? in the test set.

I loaded the training dataset in Weka Explorer, passed the data through the NumericToNominal filter, and used the RemovePercentage filter to split the data in a 70-30 ratio, with the 30% file used as the cross-validation set. I built a classifier and saved the model.

Then I loaded the test data, which has ? as the label in every row, applied the NumericToNominal filter, and saved it as an ARFF file.

Now, when I load the test data and try to use the model against it, I always get the error message "training and test set are not compatible". Both datasets have undergone the same processing. What could possibly have gone wrong?

Sorry, cannot comment, too much to write, so I'll put it on answer — java_xof

java_xof · Accepted Answer · 2013-01-17T20:49:44

As you can read in the ARFF manual (http://www.cs.waikato.ac.nz/ml/weka/arff.html):

Nominal values are defined by providing a <nominal-specification> listing the possible values: {<nominal-name1>, <nominal-name2>, <nominal-name3>, ...}

For example, the class value of the Iris dataset can be defined as follows:

@ATTRIBUTE class        {Iris-setosa,Iris-versicolor,Iris-virginica}

So when you apply NumericToNominal to your test file, one or more attributes can end up with a different set of possible values in the train and test ARFF files. It really can happen, and it has bothered me many times. One solution is to check your ARFF files manually (if they are not too big), or just copy and paste the attribute declarations from the ARFF header, e.g.

@attribute 'My first binary attribute' {0,1}
(...)
@attribute 'My last binary attribute' {0,1}

from the train file into the test file. That should work.
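If the files are too big to compare by eye, the manual check can be scripted. The following is only a minimal sketch in plain Python (standard library only), not part of Weka. It assumes each @attribute declaration sits on a single line and that attribute names are either unquoted or single-quoted. The function names read_attributes and header_differences are made up for this example.

```python
import re

# Matches a single-line declaration such as: @attribute 'pixel 1' {0,128,255}
# Assumption: names are unquoted or wrapped in single quotes.
ATTRIBUTE_RE = re.compile(r"^\s*@attribute\s+('[^']*'|\S+)\s+(.+?)\s*$", re.IGNORECASE)


def read_attributes(arff_text):
    """Return a list of (name, type_spec) pairs from the header of an ARFF text."""
    attributes = []
    for line in arff_text.splitlines():
        if line.strip().lower().startswith("@data"):
            break  # the header ends where the data section starts
        match = ATTRIBUTE_RE.match(line)
        if match:
            name, type_spec = match.groups()
            # Drop whitespace so "{0, 1}" and "{0,1}" compare as equal
            attributes.append((name, re.sub(r"\s+", "", type_spec)))
    return attributes


def header_differences(train_text, test_text):
    """Return human-readable descriptions of attribute mismatches."""
    train = read_attributes(train_text)
    test = read_attributes(test_text)
    differences = []
    if len(train) != len(test):
        differences.append(f"attribute count differs: {len(train)} vs {len(test)}")
    for index, (train_attr, test_attr) in enumerate(zip(train, test)):
        if train_attr != test_attr:
            differences.append(f"attribute {index}: train={train_attr} test={test_attr}")
    return differences


if __name__ == "__main__":
    # Hypothetical headers: pixel1 never takes the value 128 in the test file
    train_header = "@relation digits\n@attribute pixel1 {0,128,255}\n@attribute label {0,1}\n@data\n"
    test_header = "@relation digits\n@attribute pixel1 {0,255}\n@attribute label {0,1}\n@data\n"
    for difference in header_differences(train_header, test_header):
        print(difference)
```

To use it on real files, read each file with open(path).read() and pass the two strings to header_differences. An empty result means the attribute declarations match in name, order, and listed values, as far as this simple parser can tell.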
