2 votes

I am using the Weka API, weka-stable-3.8.1.
I have been trying to use the J48 decision tree (Weka's C4.5 implementation). My data has around 22 features and a nominal class with two possible values: yes or no.
When evaluating with the following code:

// Load the previously trained model from disk
Classifier model = (Classifier) weka.core.SerializationHelper.read(trainedModelDestination);
// Initialise the Evaluation with the training set (it supplies the data structure and priors)
Evaluation evaluation = new Evaluation(trainingInstances);
evaluation.evaluateModel(model, testingInstances);
System.out.println("Number of correct predictions : " + evaluation.correct());


I get all predictions correct. But when I try the same test cases individually using:

for(Instance i : testingInstances){
    double predictedClassLabel = model.classifyInstance(i);
    System.out.println("predictedClassLabel : "+predictedClassLabel);
}


I always get the same output, i.e. 0.0.

Why is this happening?

That's the predicted class label 0; maybe your testing instances only contain class label 0 and thus everything is correct. – Thomas Jungblut
No, I tried different test cases with known results. I also tried instances from the training set. – YetAnotherBot
Maybe your test instances are totally different from your training instances. Have you tried evaluating your model on your training set? – Istvan Nagy
Evaluation on the training set gives 100% accuracy. – YetAnotherBot
In that case, your model has learned your dataset. May I ask how big your training and test data are? – Istvan Nagy

2 Answers

0 votes

If the provided snippet is indeed from your code, you seem to always be classifying the first test instance: testingInstances.firstInstance().

Instead, you may want to loop over the test set and classify each instance:

for(Instance i : testingInstances){
    double predictedClassLabel = model.classifyInstance(i);
    System.out.println("predictedClassLabel : "+predictedClassLabel);
}
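
Also note that for a nominal class, classifyInstance() returns the index of the predicted class value as a double, not the label itself. Here is a minimal sketch of mapping that index back to its label (assuming the class index of testingInstances is set):

for(Instance i : testingInstances){
    double predictedClassLabel = model.classifyInstance(i);
    // classifyInstance returns the index of the predicted nominal value;
    // look it up on the class attribute to recover "yes" or "no"
    String label = testingInstances.classAttribute().value((int) predictedClassLabel);
    System.out.println("predictedClassLabel : " + label);
}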
0 votes

I should have updated this much sooner. Here's how I fixed it:

During the training phase, the model learns from your training set. While learning from this set, it encounters categorical/nominal features as well.

Most algorithms require numerical values to work. To deal with this, the algorithm maps each nominal value to a specific numerical index (there is a longer explanation here).
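
For example, here is a small sketch (assuming trainingInstances is already loaded and its class index is set) that prints the index Weka assigned to each nominal class value:

Attribute cls = trainingInstances.classAttribute();
for(int idx = 0; idx < cls.numValues(); idx++){
    // values are numbered in the order they appear in the ARFF header,
    // e.g. 0 -> no, 1 -> yes (depending on how your header declares them)
    System.out.println(idx + " -> " + cls.value(idx));
}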

Since the algorithm has learned this mapping during the training phase, the Instances object holds that information. During the testing phase you have to use the same Instances header (dataset structure) that was created during the training phase. Otherwise, the classifier will not map your nominal values to the indices it expects.
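Here is a minimal sketch of what this looks like in practice (the file names train.arff and j48.model are placeholders, and the feature values are made up). The key call is i.setDataset(trainingInstances), which makes the new instance reuse the training header so its nominal values are mapped to the same indices the model saw during training:

import weka.classifiers.Classifier;
import weka.core.DenseInstance;
import weka.core.Instance;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class PredictWithTrainingHeader {
    public static void main(String[] args) throws Exception {
        // load the training data only to reuse its header (attribute definitions)
        Instances trainingInstances = DataSource.read("train.arff");
        trainingInstances.setClassIndex(trainingInstances.numAttributes() - 1);

        Classifier model = (Classifier) weka.core.SerializationHelper.read("j48.model");

        // build the test instance against the SAME header the model was trained with
        Instance i = new DenseInstance(trainingInstances.numAttributes());
        i.setDataset(trainingInstances);                    // crucial: reuse the training header
        i.setValue(trainingInstances.attribute(0), 3.5);    // numeric feature (example value)
        i.setValue(trainingInstances.attribute(1), "yes");  // nominal feature, looked up in the header

        double predicted = model.classifyInstance(i);
        // convert the numeric index back to its nominal label ("yes"/"no")
        System.out.println("predicted : " + trainingInstances.classAttribute().value((int) predicted));
    }
}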

Note:

This kind of encoding gives biased training results in non-tree-based models; techniques like one-hot encoding should be used in such cases.