Strange results come up while using a J48 tree. I need to classify a vector of 48 features, which works very well, but when I tried to "optimize" the code, I ran into strange results.

I have a method classify:

    public boolean classify(double feature1, double feature2, double[] featureVec) {
        // Build a new instance: two scalar features followed by the feature vector.
        Instance toBeClassified = new Instance(2 + featureVec.length);
        toBeClassified.setValue(0, feature1);
        toBeClassified.setValue(1, feature2);
        for (int i = 2; i < featureVec.length + 2; ++i) {
            toBeClassified.setValue(i, featureVec[i - 2]);
        }
        // Attach the dataset header so the classifier knows the attribute layout.
        toBeClassified.setDataset(dataset);

        try {
            double _class = tree.classifyInstance(toBeClassified);
            return _class > 0;
        } catch (Exception e1) {
            if (Logging.active) {
                logger.error(e1.getMessage(), e1.getCause());
            }
        }
        return false;
    }
}

It works quite well, and I hope I'm doing things right. But I wanted to remove the instance creation that happens on every method call, so I moved the line Instance toBeClassified = new Instance(48); into the class body, so that the instance is created only once. That works too, except that I get slightly different results compared with the original version: out of, say, 400 classifications, one is different (not to say incorrect). I don't see a reason for this. I hope there are some people here using Weka who can help me understand what is going on or going wrong. (Yes, 2+featureVec.length is 48.)

Thanks and regards.

Do you consistently get the same different classification from the two methods for the same example on the same data set? - brabster
Also, while I'm interested in knowing why you see a discrepancy, the Weka docs (weka.sourceforge.net/doc) suggest that it might be faster to create a new Instance than to modify an existing one. Assuming you're looking for the best performance, have you timed/profiled the two methods to find out which is faster? - brabster
@Brabster Yes, I'm looking for performance, but also for "I like it". I haven't measured it yet, but I could tomorrow when I'm back at work. Yes, it is consistent: if I just change the way the instance is created, I get different results every time. - InsertNickHere

1 Answer

It's very unlikely that anything is wrong with J48. Both classifier creation and classification itself are deterministic. I'd recommend posting a bigger part of your code, because this part looks fine (not buggy).

As for your test with 400 classifications: it should definitely produce the same results every time, no exceptions. Two thoughts:

  • Add an assert that checks whether the values of the reused instance are the same as those of a reference (freshly created) one. That would rule out any bug in Instance.

  • Does classification run in a multi-threaded manner? Are there any shared data objects?
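
Both checks can be sketched in plain Java without touching Weka itself. The class below is a hypothetical helper (the names InstanceChecks, firstMismatch and perThreadBuffer are made up for illustration): firstMismatch compares two raw value arrays and reports the first index at which they differ, and perThreadBuffer hands each thread its own scratch array, which is one possible way to avoid sharing a reused object between threads. It assumes you can get the values of both instances as double arrays, for example via Instance.toDoubleArray().

```java
/** Hypothetical diagnostic helper, not part of Weka. */
final class InstanceChecks {

    private InstanceChecks() {
        // Static helpers only.
    }

    /**
     * Returns the index of the first position where the two value arrays
     * differ, or -1 if they are identical. Values are compared by their bit
     * patterns, so NaN equals NaN, while 0.0 and -0.0 count as different.
     * If the lengths differ, the first index missing from the shorter
     * array is returned.
     */
    static int firstMismatch(double[] expected, double[] actual) {
        if (expected.length != actual.length) {
            return Math.min(expected.length, actual.length);
        }
        for (int i = 0; i < expected.length; i++) {
            if (Double.doubleToLongBits(expected[i]) != Double.doubleToLongBits(actual[i])) {
                return i;
            }
        }
        return -1;
    }

    /**
     * Creates a holder that lazily gives every thread its own buffer of the
     * given size, so concurrent callers never write into the same array.
     */
    static ThreadLocal<double[]> perThreadBuffer(final int size) {
        return ThreadLocal.withInitial(() -> new double[size]);
    }
}
```

In the question's setup, one could build the instance both ways for every call, pass the two toDoubleArray() results to firstMismatch, and log any index other than -1. If the mismatch only ever appears when several threads call classify at once, the shared instance is the likely culprit.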