8
votes

I'm trying to test my model on a new dataset. I applied the same preprocessing steps that I used when building the model. I compared the two files and found no issues: the train and test datasets have the same attributes in the same order, with the same attribute names and data types. The train and test files seem to be similar, but the Weka Explorer still gives me an error saying "Train and test set are not compatible". How can I resolve this error? Is there any way to make the format of test.arff match train.arff? Please help.

Here is a screenshot of the file comparison.

6
It's a little hard for me to understand your question. Can you show more detail? - Annie Kim
Hi AnnieKim, thanks for your response. I have built a classification model with the data set train.arff, and now I'm trying to predict the results for the test file test.arff using the Weka Explorer. The train and test files seem to be similar, but the Weka Explorer throws an error saying "Train and test set are not compatible". How can I resolve this error? Is there any way to make the format of test.arff match train.arff? - Suren Raju
All three attributes are nominal attributes, each followed by its possible values enclosed in '{}'. My guess is that the possible values are not the same in the two files. For example, for the RESOURCE attribute there is no 199 in the test file, while there is in the training file. What do you think? - Annie Kim
Hi AnnieKim, thank you so much. Your input was really useful. As you suspected, the issue is with the nominal type. Please post your comment as an answer. Thanks a lot. - Suren Raju
Okay, you're welcome. - Annie Kim

6 Answers

8
votes

This is the same as the comment I left under the question:

All three attributes are nominal attributes, each followed by its possible values enclosed in '{}'. My guess is that the possible values are not the same in the two files. For example, for the RESOURCE attribute there is no 199 in the test file, while there is in the training file.
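A mismatch like this is easy to miss by eye, so a small script can help locate it. The following is only a minimal sketch: it assumes simple ARFF headers with one `@attribute` declaration per line and attribute names without spaces, it treats the order of nominal labels as significant, and the helper names (`read_arff_attributes`, `nominal_values`, `header_differences`) and the sample headers are made up for illustration.

```python
def read_arff_attributes(lines):
    """Return (name, type) pairs for the @attribute lines that precede @data."""
    attributes = []
    for line in lines:
        stripped = line.strip()
        if stripped.lower().startswith("@data"):
            break
        if stripped.lower().startswith("@attribute"):
            # Assumption: the name has no spaces, so the line splits into
            # keyword, name, and type declaration.
            parts = stripped.split(None, 2)
            if len(parts) == 3:
                attributes.append((parts[1], parts[2]))
    return attributes


def nominal_values(type_decl):
    """Return the labels of a nominal declaration such as {a,b,c}, else None."""
    decl = type_decl.strip()
    if decl.startswith("{") and decl.endswith("}"):
        return [value.strip() for value in decl[1:-1].split(",")]
    return None


def header_differences(train_lines, test_lines):
    """Return a list of human-readable differences between two ARFF headers."""
    train = read_arff_attributes(train_lines)
    test = read_arff_attributes(test_lines)
    problems = []
    if len(train) != len(test):
        problems.append("attribute count differs: %d vs %d" % (len(train), len(test)))
    for position, (train_attr, test_attr) in enumerate(zip(train, test), start=1):
        train_name, train_type = train_attr
        test_name, test_type = test_attr
        if train_name != test_name:
            problems.append("attribute %d: name %s vs %s" % (position, train_name, test_name))
            continue
        train_vals = nominal_values(train_type)
        test_vals = nominal_values(test_type)
        if train_vals is not None and test_vals is not None:
            if train_vals != test_vals:
                only_train = [v for v in train_vals if v not in test_vals]
                only_test = [v for v in test_vals if v not in train_vals]
                problems.append(
                    "attribute %d (%s): labels differ; only in train: %s; only in test: %s"
                    % (position, train_name, only_train, only_test)
                )
        elif train_type.lower() != test_type.lower():
            problems.append(
                "attribute %d (%s): type %s vs %s" % (position, train_name, train_type, test_type)
            )
    return problems


# Made-up headers that mirror the RESOURCE example above.
train_header = [
    "@relation train",
    "@attribute RESOURCE {1,2,199}",
    "@attribute ACTION {0,1}",
    "@data",
]
test_header = [
    "@relation test",
    "@attribute RESOURCE {1,2}",
    "@attribute ACTION {0,1}",
    "@data",
]

for problem in header_differences(train_header, test_header):
    print(problem)
```

To check real files, pass `open(path).readlines()` for each of the two ARFF files instead of the sample lists.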

3
votes

After struggling with the same problem for a day, I figured out two ways to make the trained model work on a supplied test set.

Method 1. Use the Knowledge Flow. For example, something like the following:

CSVLoader (for the train set) -> ClassAssigner -> TrainingSetMaker -> (classifier of your choice) -> ClassifierPerformanceEvaluator -> TextViewer

CSVLoader (for the test set) -> ClassAssigner -> TestSetMaker -> (the same classifier instance as above) -> PredictionAppender -> CSVSaver

Then load the data from the CSVLoader or ArffLoader for the training set. The model will be trained. After that, load the data from the loader for the test set. It will evaluate the model (a classifier, for example) on the supplied test set, and you can see the result in the TextViewer (connected to the ClassifierPerformanceEvaluator) and get the saved result from the CSVSaver or ArffSaver connected to the PredictionAppender. An additional column, "classified as", will be added to the output file. In my case, I used "?" for the class column in the supplied test set when the class labels were not available.

Method 2. Combine the training and test sets into one file. The exact same filter can then be applied to both the training and the test set. Afterwards you can separate the training set and the test set again by applying an instance filter. Because I use "?" as the class label in the test set, it does not show up among the indices offered by the instance filter. So, when applying the instance filter, select all the indices that you can see among the attribute values as the ones to be removed. Only the test data will be left. Save it and load it under "Supplied test set" on the Classify page. This time it will work. I guess it is the class attribute that causes the "not compatible" train and test set issue, since many classifiers require a nominal class attribute, whose value is converted to an index into the available values of the class attribute, according to http://weka.wikispaces.com/Why+do+I+get+the+error+message+%27training+and+test+set+are+not+compatible%27%3F

2
votes

See the following answer: your train.arff and test.arff should have the same header. According to your comparison, they are similar but not identical.

1
votes

I just encountered the same problem and found a bare-bones solution. My files are in .csv format, so I simply opened each of them (training and testing, respectively) and used the Save button on the Preprocess panel of WEKA to save them in .arff format. That solved the problem.

0
votes

Look, there is a difference between similar and same: your train.arff and test.arff should have the same header. If they do not, copy the header of train.arff and paste it into your test.arff as the new header.
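The copy-and-paste step can also be scripted. This is a rough sketch, not a tested tool: it assumes both files contain an `@data` line, that the test rows already have their columns in the same order as the training file, and that every value in the test rows is declared in the training header. The function name `replace_header` and the sample lines are hypothetical.

```python
def data_index(lines):
    """Return the index of the @data line, which ends the ARFF header."""
    for index, line in enumerate(lines):
        if line.strip().lower().startswith("@data"):
            return index
    raise ValueError("no @data line found")


def replace_header(train_lines, test_lines):
    """Return the train header (through @data) followed by the test data rows."""
    train_end = data_index(train_lines)
    test_end = data_index(test_lines)
    return train_lines[:train_end + 1] + test_lines[test_end + 1:]


# Made-up file contents for illustration.
train_lines = [
    "@relation train\n",
    "@attribute RESOURCE {1,2,199}\n",
    "@attribute class {yes,no}\n",
    "@data\n",
    "199,yes\n",
]
test_lines = [
    "@relation test\n",
    "@attribute RESOURCE {1,2}\n",
    "@attribute class {yes,no}\n",
    "@data\n",
    "1,?\n",
    "2,?\n",
]

fixed_test_lines = replace_header(train_lines, test_lines)
print("".join(fixed_test_lines), end="")
```

With real files, read each one with `readlines()`, call `replace_header`, and write the result to a new file with `writelines()` so that the original test.arff stays untouched.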

0
votes
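    # Fill in the three paths before running.
    trainPath = ""       # CSV file the model was trained on
    otherPadelPath = ""  # CSV file whose columns need rearranging
    testPath = ""        # output CSV with the training set's column layout

    # Read the training header. The newline is stripped so that the last
    # attribute name still matches the other file's header.
    with open(trainPath, "r") as trainFile:
        trainAttributes = trainFile.readline().rstrip("\r\n").split(",")

    # Read the other file, skipping blank lines.
    with open(otherPadelPath, "r") as otherPadelFile:
        otherPadelLines = [line.rstrip("\r\n") for line in otherPadelFile if line.strip()]

    # For each training attribute, find its column index in the other file.
    otherPadelHeader = otherPadelLines[0].split(",")
    otherPadelColumns = [otherPadelHeader.index(attribute)
                         for attribute in trainAttributes
                         if attribute in otherPadelHeader]

    # Rebuild every row (header included) in the training column order.
    # Note: a plain split(",") does not handle quoted fields containing commas.
    testLines = []
    for line in otherPadelLines:
        fields = line.split(",")
        testLines.append(",".join(fields[inDex] for inDex in otherPadelColumns) + "\n")

    with open(testPath, "w") as testFile:
        testFile.writelines(testLines)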

This script rearranges your test dataset so that it contains the same attribute columns, in the same order, as your training set, provided that each attribute has the same type and title in both files. Also (in keeping with the WEKA default), the class attribute should be in the last column of both datasets.