
I am using Weka to predict the class of some instances. For this purpose, I have a training file and a test file. After saving the model obtained by running 10-fold cross-validation with the J48 classifier, I use this model to predict the classes of the instances in the test file. The problem is that the predicted class is the same for all instances.

=== Predictions on test data ===

 inst#     actual  predicted error prediction 
     1    34:BALT    1:Theme   +   0.216 
     2    34:BALT    1:Theme   +   0.216 
     3    34:BALT    1:Theme   +   0.216 
     4    34:BALT    1:Theme   +   0.216 
     5    34:BALT    1:Theme   +   0.216 
     6    34:BALT    1:Theme   +   0.216 
     7    34:BALT    1:Theme   +   0.216 
     8    34:BALT    1:Theme   +   0.216 
     9    34:BALT    1:Theme   +   0.216 
    10    34:BALT    1:Theme   +   0.216 
    11    34:BALT    1:Theme   +   0.216 
    12    34:BALT    1:Theme   +   0.216 
    13    34:BALT    1:Theme   +   0.216 
    14    34:BALT    1:Theme   +   0.216 
and so on...

There are 14 different classes that the model can predict, and the test instances do not all have the same attribute values. So why does this happen? Thank you very much.

Answer

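It is possible that your J48 decision tree is a single node that simply assigns every instance to the class "Theme". The identical prediction probability (0.216) for every test instance is consistent with this, since it suggests that all of them end up in the same leaf. If you are using the Weka GUI, you can right-click the entry in the result list and select "Visualize tree".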
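To illustrate why a single-leaf tree produces the same row for every test instance: the leaf never looks at the attribute values, so it always returns its majority class together with that class's share of the training data. The sketch below mimics this in plain Java without any Weka classes. The class and method names and the toy labels are made up, and treating the leaf probability as the plain training-class fraction is a simplifying assumption (J48's actual leaf estimates can differ slightly, e.g. with Laplace smoothing enabled).

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: what a decision tree consisting of a single leaf does. */
public class SingleLeafSketch {

    /** Fraction of each class label among the training labels (the leaf's class distribution). */
    static Map<String, Double> leafDistribution(List<String> trainingLabels) {
        Map<String, Double> dist = new LinkedHashMap<>();
        for (String label : trainingLabels) {
            dist.merge(label, 1.0, Double::sum); // count occurrences
        }
        // Turn counts into fractions
        for (Map.Entry<String, Double> e : dist.entrySet()) {
            e.setValue(e.getValue() / trainingLabels.size());
        }
        return dist;
    }

    /** A single leaf ignores the attributes and returns the most frequent class and its fraction. */
    static Map.Entry<String, Double> predict(Map<String, Double> dist, double[] ignoredAttributes) {
        Map.Entry<String, Double> best = null;
        for (Map.Entry<String, Double> e : dist.entrySet()) {
            if (best == null || e.getValue() > best.getValue()) {
                best = e;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Made-up training labels; "Theme" is the majority class here
        List<String> train = List.of("Theme", "Theme", "BALT", "Other", "Theme", "Other", "BALT");
        Map<String, Double> dist = leafDistribution(train);
        double[][] test = {{1.0, 5.0}, {3.2, 0.1}, {7.7, 2.2}};
        for (double[] x : test) {
            Map.Entry<String, Double> p = predict(dist, x);
            // Every test row gets the same class and the same probability
            System.out.printf("%s %.3f%n", p.getKey(), p.getValue());
        }
    }
}
```

In Weka terms this behaves much like the ZeroR baseline, which is a useful comparison: if J48's accuracy matches ZeroR's, the tree has most likely collapsed.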

If the tree is indeed a single node, a likely cause is an imbalanced dataset. With imbalanced data, J48's pruning procedure may judge that the branches leading to minority classes do not improve its error estimate, remove them, and end up predicting everything as the majority class, which I guess is "Theme". This is a common problem with imbalanced datasets. You could try SMOTE as a preprocessing step (here is a nice tutorial on SMOTE).
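The core idea of SMOTE is to create synthetic minority-class instances by interpolating between a minority instance and one of its k nearest minority-class neighbours, so the minority classes carry more weight during tree building and pruning. The following is a minimal plain-Java sketch of that idea, assuming purely numeric attributes and Euclidean distance; it is not Weka's SMOTE implementation, and the class name, method names and sample data are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

/** Hypothetical sketch of SMOTE-style oversampling for numeric attributes only. */
public class SmoteSketch {

    static double squaredDistance(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            s += d * d;
        }
        return s;
    }

    /** Indices of the k nearest minority neighbours of minority.get(idx), excluding itself. */
    static int[] nearestNeighbours(List<double[]> minority, int idx, int k) {
        Integer[] order = new Integer[minority.size()];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        double[] x = minority.get(idx);
        // Sort all minority indices by distance to x
        Arrays.sort(order, (i, j) -> Double.compare(
                squaredDistance(x, minority.get(i)), squaredDistance(x, minority.get(j))));
        int[] result = new int[Math.min(k, order.length - 1)];
        int n = 0;
        for (int i : order) {
            if (i == idx) continue;          // skip the sample itself
            if (n == result.length) break;
            result[n++] = i;
        }
        return result;
    }

    /** Creates perSample synthetic points for each minority sample. */
    static List<double[]> smote(List<double[]> minority, int perSample, int k, Random rnd) {
        List<double[]> synthetic = new ArrayList<>();
        for (int idx = 0; idx < minority.size(); idx++) {
            int[] nn = nearestNeighbours(minority, idx, k);
            if (nn.length == 0) continue;    // a lone sample has no neighbour to interpolate with
            double[] x = minority.get(idx);
            for (int s = 0; s < perSample; s++) {
                double[] neighbour = minority.get(nn[rnd.nextInt(nn.length)]);
                double gap = rnd.nextDouble(); // random position on the segment x -> neighbour
                double[] point = new double[x.length];
                for (int a = 0; a < x.length; a++) {
                    point[a] = x[a] + gap * (neighbour[a] - x[a]);
                }
                synthetic.add(point);
            }
        }
        return synthetic;
    }

    public static void main(String[] args) {
        // Made-up 2-D samples of one minority class
        List<double[]> minority = List.of(
                new double[]{1.0, 1.0}, new double[]{1.5, 2.0}, new double[]{2.0, 1.2});
        for (double[] p : smote(minority, 2, 2, new Random(42))) {
            System.out.println(Arrays.toString(p));
        }
    }
}
```

Because each synthetic point lies on a segment between two real minority instances, the new data stays inside the region the minority class already occupies instead of duplicating rows exactly.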

If the tree is not a single node, you could trace a few test instances through it by hand and possibly work out why every one of them is classified as "Theme".
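One way to do this tracing systematically is to copy the tree from J48's text output into a small structure and record every test an instance passes through on its way to a leaf. The sketch below assumes binary numeric splits of the form "attribute <= threshold" only (J48 also makes multi-way splits on nominal attributes and handles missing values specially, which this ignores); the tree, attribute names and thresholds are placeholders, not taken from the question.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: walk one instance down a decision tree and record the path. */
public class TreeTraceSketch {

    /** Either a leaf (label != null) or a binary numeric split on attrIndex. */
    record Node(String attribute, int attrIndex, double threshold,
                Node left, Node right, String label) {
        static Node leaf(String label) {
            return new Node(null, -1, 0, null, null, label);
        }
        static Node split(String attribute, int attrIndex, double threshold, Node left, Node right) {
            return new Node(attribute, attrIndex, threshold, left, right, null);
        }
    }

    /** Goes left when value <= threshold, otherwise right; returns the label of the leaf reached. */
    static String classify(Node node, double[] instance, List<String> path) {
        while (node.label() == null) {
            double v = instance[node.attrIndex()];
            boolean goLeft = v <= node.threshold();
            path.add(node.attribute() + " = " + v + (goLeft ? " <= " : " > ") + node.threshold());
            node = goLeft ? node.left() : node.right();
        }
        path.add("leaf: " + node.label());
        return node.label();
    }

    public static void main(String[] args) {
        // Placeholder tree standing in for one transcribed from the J48 output
        Node tree = Node.split("f0", 0, 2.5,
                Node.leaf("Theme"),
                Node.split("f1", 1, 0.7, Node.leaf("BALT"), Node.leaf("Theme")));
        double[][] test = {{1.0, 0.3}, {3.0, 0.5}, {3.0, 0.9}};
        for (double[] x : test) {
            List<String> path = new ArrayList<>();
            String predicted = classify(tree, x, path);
            System.out.println(predicted + "  via " + path);
        }
    }
}
```

If every traced path ends at the same few "Theme" leaves, the printed tests show which attribute values are pushing the instances there.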