I'm getting different results when I compare the Weka GUI classification with my Java program, both using a J48 tree on the iris dataset. I'd be very grateful for any help.

I'm working with the iris dataset and trying to develop a Java program that classifies new instances. For this, I used the Weka GUI to obtain a model ("iris_tree(CV).model"), which was trained and validated with 10-fold cross-validation. The results in the Weka GUI are good and as expected: 4 incorrectly classified instances. I then saved the model so that my Java program could use it later.

When I load the model "iris_tree(CV).model" in my Java program and try to classify new instances (the testing dataset), the results are different: the Java program classifies 'setosa' and 'virginica' correctly, but not 'versicolour'. These are the results:

Classification: setosa
Classification: setosa
Classification: virginica
Classification: virginica
Classification: virginica
Classification: virginica

But I expected to obtain:

Classification: setosa
Classification: setosa
Classification: versicolour
Classification: versicolour
Classification: virginica
Classification: virginica

I've read some related posts, but I couldn't find a clear explanation for this strange behaviour when using Java instead of the Weka GUI.

The Java code is below, in 2 classes, followed by the training and testing sets. Thanks in advance.

The main class:

public static void main(String[] args) {

    try {


        Hashtable<String, String> values = new Hashtable<String, String>();

        //Loading the model
        String pathModel="";
        String pathTestSet="";
        JFileChooser chooserModel = new JFileChooser();
        chooserModel.setCurrentDirectory(new java.io.File("."));
        chooserModel.setDialogTitle("HoliDes: choose the model");
        chooserModel.setFileSelectionMode(JFileChooser.FILES_AND_DIRECTORIES);
        chooserModel.setAcceptAllFileFilterUsed(true);

        if (chooserModel.showOpenDialog(null) == JFileChooser.APPROVE_OPTION) {
            File filePathModel=chooserModel.getSelectedFile();
            pathModel=filePathModel.getPath();

            State irisModel = new State(pathModel);

            //Choosing the testing dataset
            JFileChooser chooserTestSet = new JFileChooser();
            chooserTestSet.setDialogTitle("HoliDes: choose TEST SET");
            chooserTestSet.setFileSelectionMode(JFileChooser.FILES_AND_DIRECTORIES);
            chooserTestSet.setAcceptAllFileFilterUsed(true);

            //Loading the testing dataset
            if (chooserTestSet.showOpenDialog(null) == JFileChooser.APPROVE_OPTION) {
                File filePathTestSet=chooserTestSet.getSelectedFile();
                pathTestSet=filePathTestSet.getPath();

                //Transforming the data set into pairs attribute-value
                ConverterUtils.DataSource unlabeledSource = new ConverterUtils.DataSource(pathTestSet);
                Instances unlabeledData = unlabeledSource.getDataSet();
                if (unlabeledData.classIndex() == -1){
                    unlabeledData.setClassIndex(unlabeledData.numAttributes() - 1);
                }

                for (int i = 0; i < unlabeledData.numInstances(); i++) {
                    Instance ins=unlabeledData.instance(i);

                    for (int j = 0; j < ins.numAttributes(); j++) {

                        String attrib=ins.attribute(j).name();
                        double val=ins.value(ins.attribute(j));

                        values.put(attrib,String.valueOf(val));

                    }

                    System.out.println("Classification: " + irisModel.classifySpecies(values,pathModel));

                }

            }

        }

    } catch (Exception ex) {
        Logger.getLogger(PilotPatternClassifier.class.getName()).log(Level.SEVERE, null, ex);
    }

}

And the State class:

public class State {

    //private String classModelFile = "/iris_tree.model";    
    private Classifier classModel;
    private Instances dataModel;

    /**
     *  Class constructor.
     */
    public State(String pathModel) throws Exception {
            //InputStream classModelStream;
            //  Create a stream object for the model file embedded within the JAR file.
            //classModelStream = getClass().getResourceAsStream(classModelFile);
            classModel=(Classifier) weka.core.SerializationHelper.read(pathModel);
    }

    /**
     *  Close the instance by setting both the model file string and
     *  the model object itself to null.  When the garbage collector
     *  runs, this should make clean up simpler.  However, the garbage
     *  collector is not called synchronously since that should be
     *  managed by the larger execution environment.
     */
    public void close() {
            classModel = null;
            //classModelFile=null;
    }

    /**
     * Evaluate the model on the data provided by @param measures.
     * This returns a string with the species name.
     *
     * @param measures object with petal and sepal measurements
     * @return string with the species name
     * @throws Exception
     */
    public String classifySpecies(Dictionary<String, String> measures, String pathTestSet) throws Exception {
            FastVector dataClasses = new FastVector();
            FastVector dataAttribs = new FastVector();
            Attribute species;
            double values[] = new double[measures.size() + 1];
            int i = 0, maxIndex = 0;

            //  Assemble the potential species options.
            dataClasses.addElement("setosa");
            dataClasses.addElement("versicolour");
            dataClasses.addElement("virginica");
            species = new Attribute("species", dataClasses);

            //  Create the object to classify on.
            for (Enumeration<String> keys = measures.keys(); keys.hasMoreElements(); ) {

                    String key = keys.nextElement();
                    double val = Double.parseDouble(measures.get(key));         
                    dataAttribs.addElement(new Attribute(key));

                    values[i++] = val;

            }

            dataAttribs.addElement(species);
            dataModel = new Instances("iris-test", dataAttribs, 0);//"iris-test" is the relation name of the test data. It is arbitrary
            dataModel.setClass(species);

            Instance ins=new DenseInstance(1, values);
            //dataModel.add(new Instance(1, values) {});            
            dataModel.add(ins);            
            dataModel.instance(0).setClassMissing();

            //  Find the class with the highest estimated likelihood
            double cl[] = classModel.distributionForInstance(dataModel.instance(0));
            for(i = 0; i < cl.length; i++){
                if(cl[i] > cl[maxIndex]){
                        maxIndex = i;
                }
            }
            return dataModel.classAttribute().value(maxIndex);


    }


}

Here are the training and testing sets:

@RELATION iris-train

@ATTRIBUTE sepallength  REAL
@ATTRIBUTE sepalwidth   REAL
@ATTRIBUTE petallength  REAL
@ATTRIBUTE petalwidth   REAL
@ATTRIBUTE species  {setosa,versicolour,virginica}

@DATA
5.1,3.5,1.4,0.2,setosa
4.9,3.0,1.4,0.2,setosa
4.7,3.2,1.3,0.2,setosa
4.6,3.1,1.5,0.2,setosa
5.0,3.6,1.4,0.2,setosa
5.4,3.9,1.7,0.4,setosa
4.6,3.4,1.4,0.3,setosa
5.0,3.4,1.5,0.2,setosa
4.4,2.9,1.4,0.2,setosa
4.9,3.1,1.5,0.1,setosa
5.4,3.7,1.5,0.2,setosa
4.8,3.4,1.6,0.2,setosa
4.8,3.0,1.4,0.1,setosa
4.3,3.0,1.1,0.1,setosa
5.8,4.0,1.2,0.2,setosa
5.7,4.4,1.5,0.4,setosa
5.4,3.9,1.3,0.4,setosa
5.1,3.5,1.4,0.3,setosa
5.7,3.8,1.7,0.3,setosa
5.1,3.8,1.5,0.3,setosa
5.4,3.4,1.7,0.2,setosa
5.1,3.7,1.5,0.4,setosa
4.6,3.6,1.0,0.2,setosa
5.1,3.3,1.7,0.5,setosa
4.8,3.4,1.9,0.2,setosa
5.0,3.0,1.6,0.2,setosa
5.0,3.4,1.6,0.4,setosa
5.2,3.5,1.5,0.2,setosa
5.2,3.4,1.4,0.2,setosa
4.7,3.2,1.6,0.2,setosa
4.8,3.1,1.6,0.2,setosa
5.4,3.4,1.5,0.4,setosa
5.2,4.1,1.5,0.1,setosa
5.5,4.2,1.4,0.2,setosa
4.9,3.1,1.5,0.1,setosa
5.0,3.2,1.2,0.2,setosa
5.5,3.5,1.3,0.2,setosa
4.9,3.1,1.5,0.1,setosa
4.4,3.0,1.3,0.2,setosa
5.1,3.4,1.5,0.2,setosa
5.0,3.5,1.3,0.3,setosa
4.5,2.3,1.3,0.3,setosa
4.4,3.2,1.3,0.2,setosa
5.0,3.5,1.6,0.6,setosa
5.1,3.8,1.9,0.4,setosa
4.8,3.0,1.4,0.3,setosa
5.1,3.8,1.6,0.2,setosa
4.6,3.2,1.4,0.2,setosa
5.3,3.7,1.5,0.2,setosa
5.0,3.3,1.4,0.2,setosa
7.0,3.2,4.7,1.4,versicolour
6.4,3.2,4.5,1.5,versicolour
6.9,3.1,4.9,1.5,versicolour
5.5,2.3,4.0,1.3,versicolour
6.5,2.8,4.6,1.5,versicolour
5.7,2.8,4.5,1.3,versicolour
6.3,3.3,4.7,1.6,versicolour
4.9,2.4,3.3,1.0,versicolour
6.6,2.9,4.6,1.3,versicolour
5.2,2.7,3.9,1.4,versicolour
5.0,2.0,3.5,1.0,versicolour
5.9,3.0,4.2,1.5,versicolour
6.0,2.2,4.0,1.0,versicolour
6.1,2.9,4.7,1.4,versicolour
5.6,2.9,3.6,1.3,versicolour
6.7,3.1,4.4,1.4,versicolour
5.6,3.0,4.5,1.5,versicolour
5.8,2.7,4.1,1.0,versicolour
6.2,2.2,4.5,1.5,versicolour
5.6,2.5,3.9,1.1,versicolour
5.9,3.2,4.8,1.8,versicolour
6.1,2.8,4.0,1.3,versicolour
6.3,2.5,4.9,1.5,versicolour
6.1,2.8,4.7,1.2,versicolour
6.4,2.9,4.3,1.3,versicolour
6.6,3.0,4.4,1.4,versicolour
6.8,2.8,4.8,1.4,versicolour
6.7,3.0,5.0,1.7,versicolour
6.0,2.9,4.5,1.5,versicolour
5.7,2.6,3.5,1.0,versicolour
5.5,2.4,3.8,1.1,versicolour
5.5,2.4,3.7,1.0,versicolour
5.8,2.7,3.9,1.2,versicolour
6.0,2.7,5.1,1.6,versicolour
5.4,3.0,4.5,1.5,versicolour
6.0,3.4,4.5,1.6,versicolour
6.7,3.1,4.7,1.5,versicolour
6.3,2.3,4.4,1.3,versicolour
5.6,3.0,4.1,1.3,versicolour
5.5,2.5,4.0,1.3,versicolour
5.5,2.6,4.4,1.2,versicolour
6.1,3.0,4.6,1.4,versicolour
5.8,2.6,4.0,1.2,versicolour
5.0,2.3,3.3,1.0,versicolour
5.6,2.7,4.2,1.3,versicolour
5.7,3.0,4.2,1.2,versicolour
5.7,2.9,4.2,1.3,versicolour
6.2,2.9,4.3,1.3,versicolour
5.1,2.5,3.0,1.1,versicolour
5.7,2.8,4.1,1.3,versicolour
6.3,3.3,6.0,2.5,virginica
5.8,2.7,5.1,1.9,virginica
7.1,3.0,5.9,2.1,virginica
6.3,2.9,5.6,1.8,virginica
6.5,3.0,5.8,2.2,virginica
7.6,3.0,6.6,2.1,virginica
4.9,2.5,4.5,1.7,virginica
7.3,2.9,6.3,1.8,virginica
6.7,2.5,5.8,1.8,virginica
7.2,3.6,6.1,2.5,virginica
6.5,3.2,5.1,2.0,virginica
6.4,2.7,5.3,1.9,virginica
6.8,3.0,5.5,2.1,virginica
5.7,2.5,5.0,2.0,virginica
5.8,2.8,5.1,2.4,virginica
6.4,3.2,5.3,2.3,virginica
6.5,3.0,5.5,1.8,virginica
7.7,3.8,6.7,2.2,virginica
7.7,2.6,6.9,2.3,virginica
6.0,2.2,5.0,1.5,virginica
6.9,3.2,5.7,2.3,virginica
5.6,2.8,4.9,2.0,virginica
7.7,2.8,6.7,2.0,virginica
6.3,2.7,4.9,1.8,virginica
6.7,3.3,5.7,2.1,virginica
7.2,3.2,6.0,1.8,virginica
6.2,2.8,4.8,1.8,virginica
6.1,3.0,4.9,1.8,virginica
6.4,2.8,5.6,2.1,virginica
7.2,3.0,5.8,1.6,virginica
7.4,2.8,6.1,1.9,virginica
7.9,3.8,6.4,2.0,virginica
6.4,2.8,5.6,2.2,virginica
6.3,2.8,5.1,1.5,virginica
6.1,2.6,5.6,1.4,virginica
7.7,3.0,6.1,2.3,virginica
6.3,3.4,5.6,2.4,virginica
6.4,3.1,5.5,1.8,virginica
6.0,3.0,4.8,1.8,virginica
6.9,3.1,5.4,2.1,virginica
6.7,3.1,5.6,2.4,virginica
6.9,3.1,5.1,2.3,virginica
5.8,2.7,5.1,1.9,virginica
6.8,3.2,5.9,2.3,virginica
6.7,3.3,5.7,2.5,virginica
6.7,3.0,5.2,2.3,virginica
6.3,2.5,5.0,1.9,virginica
6.5,3.0,5.2,2.0,virginica
6.2,3.4,5.4,2.3,virginica
5.9,3.0,5.1,1.8,virginica

and

@RELATION iris-test

@ATTRIBUTE sepallength  REAL
@ATTRIBUTE sepalwidth   REAL
@ATTRIBUTE petallength  REAL
@ATTRIBUTE petalwidth   REAL

@DATA
5.1,3.5,1.4,0.2
4.9,3.0,1.4,0.2
6.6,3.0,4.4,1.4
6.8,2.8,4.8,1.4
6.4,3.1,5.5,1.8
6.0,3.0,4.8,1.8
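
For reference, the two headers above are not identical: the test file declares only the four measurements and has no species attribute. With that file, the setClassIndex(numAttributes() - 1) call in the main class marks petalwidth as the class of the loaded test data. The comparison can be sketched without Weka as below. The class ArffHeaderCheck and its helper are hypothetical names used only for illustration, and the parsing is deliberately minimal (it assumes unquoted attribute names without spaces, as in these two files).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of Weka: lists ARFF attribute names in declaration order.
class ArffHeaderCheck {

    // Header lines copied from the training file in the question.
    static final String[] TRAIN_HEADER = {
        "@RELATION iris-train",
        "@ATTRIBUTE sepallength  REAL",
        "@ATTRIBUTE sepalwidth   REAL",
        "@ATTRIBUTE petallength  REAL",
        "@ATTRIBUTE petalwidth   REAL",
        "@ATTRIBUTE species  {setosa,versicolour,virginica}"
    };

    // Header lines copied from the testing file in the question.
    static final String[] TEST_HEADER = {
        "@RELATION iris-test",
        "@ATTRIBUTE sepallength  REAL",
        "@ATTRIBUTE sepalwidth   REAL",
        "@ATTRIBUTE petallength  REAL",
        "@ATTRIBUTE petalwidth   REAL"
    };

    // Returns the attribute names in the order they are declared.
    // Assumption: names are unquoted and contain no whitespace.
    static List<String> attributeNames(String[] headerLines) {
        List<String> names = new ArrayList<>();
        for (String line : headerLines) {
            String trimmed = line.trim();
            // ARFF keywords are case-insensitive, so compare in lower case.
            if (trimmed.toLowerCase(Locale.ROOT).startsWith("@attribute")) {
                String[] parts = trimmed.split("\\s+");
                names.add(parts[1]);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        List<String> train = attributeNames(TRAIN_HEADER);
        List<String> test = attributeNames(TEST_HEADER);
        System.out.println("train: " + train);
        System.out.println("test:  " + test);
        System.out.println("same header: " + train.equals(test));
    }
}
```
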

Thanks a lot for your help.

Could be the way you're reading the model? Have you tried stackoverflow.com/questions/22201949/… – blueygh2
yes, this is the way: classModel=(Classifier) weka.core.SerializationHelper.read(pathModel); – Txus Lopez
Classifier classModel=(Classifier) weka.core.SerializationHelper.read(pathModel); – Txus Lopez
And the pathModel is "D:\Users\106811\Desktop\iris_tree(CV).model" – Txus Lopez

1 Answer

I think it's normal to get lower accuracy when you apply the classifier model to your test set than when you evaluated it on your training data. Try loading this test set in the Weka GUI; you may well obtain the same result. It's not a GUI vs. Java problem.
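
Separately from the accuracy question, one Java-side detail may be worth ruling out before comparing with the GUI. The classifySpecies method in the question rebuilds the attribute list by iterating over the keys of a Hashtable, and Hashtable makes no guarantee about iteration order, so the attributes may not reach the model in the order declared in the training header. The sketch below only illustrates that difference with plain Java collections; the class AttributeOrderDemo is a hypothetical name, and switching to a LinkedHashMap is an assumption about how the code could be adapted (it would also mean changing the Dictionary parameter type), not a confirmed fix.

```java
import java.util.ArrayList;
import java.util.Hashtable;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo class: compares key iteration order of two map types.
class AttributeOrderDemo {

    // Attribute names in the order declared in the question's ARFF headers.
    static final String[] HEADER_ORDER = {
        "sepallength", "sepalwidth", "petallength", "petalwidth"
    };

    // Inserts the names in header order and returns them in the map's own iteration order.
    static List<String> iterationOrder(Map<String, String> map) {
        for (String name : HEADER_ORDER) {
            map.put(name, "0.0"); // placeholder value; only the key order matters here
        }
        return new ArrayList<>(map.keySet());
    }

    public static void main(String[] args) {
        // Hashtable: iteration order is unspecified and may differ from insertion order.
        System.out.println("Hashtable:     " + iterationOrder(new Hashtable<String, String>()));
        // LinkedHashMap: iteration order is the insertion order.
        System.out.println("LinkedHashMap: " + iterationOrder(new LinkedHashMap<String, String>()));
    }
}
```

If the two printed orders differ on your machine, the instance passed to the model has its measurements in a different order from the one used in training, which would be worth fixing before drawing conclusions about accuracy.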

I would have put this as a comment, but can't comment due to lack of reputation.