
My English is quite bad, but I'll try to be clear. I want to program a classifier (J48, for example) with Weka. In my case, an instance consists of six numbers, all between 0 and 10 except one, which is between 0 and -10.

Examples: 1,-3,6,3,6,7 or 1,-4,5,3,7,6 or 2,-4,5,3,8,6

In ARFF:

@ATTRIBUTE attribute1 {0,1,2,3,4,5,6,7,8,9,10}

@ATTRIBUTE attribute2 {0,-1,-2,-3,-4,-5,-6,-7,-8,-9,-10}

@ATTRIBUTE attribute3 {0,1,2,3,4,5,6,7,8,9,10}

...

These instances (examples) are all "good". I would like to know whether it's possible to create a classifier: I would give it a new instance and it would answer, with a percentage, whether that instance is good or not. I'm asking because I don't know how to choose the class index or the outcome variable.


Here is an outline of the very basics of doing classification with Weka.

The Training File

You need a training file. Weka accepts many formats for training files (and for test files), among them ARFF (Attribute-Relation File Format) and CSV (Comma-Separated Values). Let's say we have a training file in ARFF format. Part of the file looks like this:

@relation pima_diabetes
@attribute 'preg' real
@attribute 'plas' real
@attribute 'pres' real
@attribute 'skin' real
@attribute 'insu' real
@attribute 'mass' real
@attribute 'pedi' real
@attribute 'age' real
@attribute 'class' { tested_negative, tested_positive}
@data
6,148,72,35,0,33.6,0.627,50,tested_positive
1,85,66,29,0,26.6,0.351,31,tested_negative
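The question asks how to choose the class index. In Weka's Java API the class attribute is not picked automatically when a file is loaded: you load the data (for example with ConverterUtils.DataSource.read(...)) and then call data.setClassIndex(data.numAttributes() - 1) when, as in the file above, the class is the last attribute. The plain-Java sketch below does not use Weka at all. It is a hypothetical, deliberately minimal ARFF reader (MiniArff and its methods are made-up names) that only handles simple files like the one above, and it just makes the "class is the last attribute" convention concrete.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper, not part of Weka: a minimal ARFF reader for simple
 * files like the one above (dense rows, no quoted names or values that
 * contain spaces or commas).
 */
class MiniArff {
    final List<String> attributeNames = new ArrayList<>();
    final List<String[]> rows = new ArrayList<>();

    static MiniArff parse(String text) {
        MiniArff arff = new MiniArff();
        boolean inData = false;
        for (String raw : text.split("\n")) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("%")) {
                continue; // skip blank lines and ARFF comments
            }
            String lower = line.toLowerCase(); // ARFF keywords are case-insensitive
            if (lower.startsWith("@attribute")) {
                // "@attribute 'preg' real" -> "preg"
                arff.attributeNames.add(line.split("\\s+")[1].replace("'", ""));
            } else if (lower.startsWith("@data")) {
                inData = true; // every following line is a data row
            } else if (inData) {
                String[] fields = line.split(",");
                for (int i = 0; i < fields.length; i++) {
                    fields[i] = fields[i].trim();
                }
                arff.rows.add(fields);
            }
            // any other header line (e.g. @relation) is ignored
        }
        return arff;
    }

    /** Mirrors setClassIndex(numAttributes() - 1): the class is the last attribute. */
    int classIndex() {
        return attributeNames.size() - 1;
    }
}
```

With the training excerpt above, attributeNames would hold preg, plas, ..., age, class, and classIndex() would be 8, the position of the class value in each data row.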

Note that to develop a good learner you need a substantial amount of training data. Also, every class should be well represented in the training data, so that the classifier built from it can learn to distinguish between the classes.
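A quick way to check whether the classes are well represented before training is to count how many training rows carry each label. The sketch below does that in plain Java; ClassBalance, counts and allRepresented are hypothetical names, not Weka API, and it assumes the rows are already split into fields with the class in a known column.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: how often does each class label occur in the training rows? */
class ClassBalance {
    /** Counts the label at classIndex in each row; unknown labels ("?") are skipped. */
    static Map<String, Integer> counts(List<String[]> rows, int classIndex) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted, so output order is stable
        for (String[] row : rows) {
            String label = row[classIndex].trim();
            if (!label.equals("?")) {
                counts.merge(label, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** True only if every declared label occurs at least minCount times. */
    static boolean allRepresented(Map<String, Integer> counts, List<String> declared, int minCount) {
        for (String label : declared) {
            if (counts.getOrDefault(label, 0) < minCount) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String[]> rows = List.of(
                "6,148,72,35,0,33.6,0.627,50,tested_positive".split(","),
                "1,85,66,29,0,26.6,0.351,31,tested_negative".split(","));
        Map<String, Integer> c = counts(rows, 8); // the class is field 8 of 9
        System.out.println(c); // {tested_negative=1, tested_positive=1}
        System.out.println(allRepresented(c, List.of("tested_negative", "tested_positive"), 1)); // true
    }
}
```

For data like the question's, where every instance is "good", a check against two labels (say good and bad, hypothetical names) would return false: one of the classes has no examples at all.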

The Test File

As stated above, the test file can also be in many different formats. Say our test file is in ARFF format; part of it looks like this:

@relation pima_diabetes
@attribute 'preg' real
@attribute 'plas' real
@attribute 'pres' real
@attribute 'skin' real
@attribute 'insu' real
@attribute 'mass' real
@attribute 'pedi' real
@attribute 'age' real
@attribute 'class' { tested_negative, tested_positive}
@data
5,116,74,0,0,25.6,0.201,30,?
3,78,50,32,88,31,0.248,26,?

Note that the test instances have '?' as their class value, because their labels are unknown and are to be determined by the classifier you build from the training data.
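A dense Weka instance stores its values as doubles: a numeric value as-is, a nominal value as the index of that value in the attribute's declaration, and a missing value ('?') as NaN. classifyInstance returns the predicted class as such an index, which is why it can be passed straight to setClassValue. The plain-Java sketch below (not Weka; EncodeRow and encode are hypothetical names) mimics that encoding for rows like the test rows above, assuming every attribute except the last is numeric and the last one is the nominal class.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper that encodes one CSV-style row the way Weka holds instance values. */
class EncodeRow {
    static double[] encode(String row, List<String> classValues) {
        String[] fields = row.split(",");
        double[] values = new double[fields.length];
        int last = fields.length - 1; // assumed: the class is the last attribute
        for (int i = 0; i < fields.length; i++) {
            String f = fields[i].trim();
            if (f.equals("?")) {
                values[i] = Double.NaN; // missing: for the class, left for the classifier to fill in
            } else if (i == last) {
                int idx = classValues.indexOf(f);
                if (idx < 0) {
                    throw new IllegalArgumentException("undeclared class value: " + f);
                }
                values[i] = idx; // nominal value stored as its index in the declaration
            } else {
                values[i] = Double.parseDouble(f); // numeric attribute
            }
        }
        return values;
    }

    public static void main(String[] args) {
        List<String> classValues = List.of("tested_negative", "tested_positive");
        double[] v = encode("5,116,74,0,0,25.6,0.201,30,?", classValues);
        System.out.println(Arrays.toString(v));
        System.out.println(Double.isNaN(v[v.length - 1])); // true: class still unknown
        // A prediction is a class index; storing it replaces the NaN.
        v[v.length - 1] = classValues.indexOf("tested_positive");
        System.out.println(v[v.length - 1]); // 1.0
    }
}
```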

The Code

Using the Java API, a simple method that sets up the classifier, builds it on the training data, and then uses it to classify the unknown, unlabeled test instances can look like this:

import weka.classifiers.bayes.NaiveBayes;
import weka.classifiers.meta.FilteredClassifier;
import weka.core.Instances;
import weka.filters.Filter;

    /**
     * Method to build the naive Bayes classifier and classify test instances.
     * data and testData must share the same header and already have their
     * class index set (e.g. with setClassIndex(numAttributes() - 1)).
     */
    public double[] classify(Instances data, Instances testData, Filter filter) throws Exception {
        //setting the classifier--->
        FilteredClassifier fc = new FilteredClassifier();
        NaiveBayes nb = new NaiveBayes();
        fc.setFilter(filter);
        fc.setClassifier(nb);
        //<---setting of the classifier ends

        //building the classifier--->
        //(an exception propagates to the caller instead of continuing with an untrained classifier)
        fc.buildClassifier(data);
        //<---building of the classifier ends

        //Classification--->
        double[] clsLabel = new double[testData.numInstances()]; //predicted class index of each test instance
        //for each test instance--->
        for (int i = 0; i < testData.numInstances(); i++) {
            clsLabel[i] = fc.classifyInstance(testData.instance(i));
            testData.instance(i).setClassValue(clsLabel[i]); //replace the '?' with the prediction
        }//end for
        //<---classification ends
        return clsLabel;
    }//end method
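The question asks for an answer "with a percent". classifyInstance returns only the index of the predicted class; Weka classifiers also provide distributionForInstance(Instance), which returns the estimated probability of each class. To show where such probabilities come from, here is a simplified Gaussian naive Bayes written from scratch in plain Java. It is a hypothetical stand-in (TinyGaussianNB and its methods are made-up names), not Weka's NaiveBayes, which uses its own estimators; it assumes numeric attributes, class labels already encoded as indices 0..k-1, and at least one training row.

```java
/**
 * Simplified Gaussian naive Bayes, written from scratch for illustration.
 * NOT Weka's NaiveBayes, only the same basic idea: per class, a prior plus
 * an independent normal distribution for each numeric attribute.
 */
class TinyGaussianNB {
    private static final double MIN_VARIANCE = 1e-6; // floor so a constant attribute cannot divide by zero
    private final int numClasses;
    private double[] logPrior;
    private double[][] mean;
    private double[][] variance;

    TinyGaussianNB(int numClasses) {
        this.numClasses = numClasses;
    }

    /** x[i] holds the attribute values of training row i, y[i] its class index. */
    void fit(double[][] x, int[] y) {
        int d = x[0].length;
        int[] count = new int[numClasses];
        logPrior = new double[numClasses];
        mean = new double[numClasses][d];
        variance = new double[numClasses][d];
        for (int i = 0; i < x.length; i++) {
            count[y[i]]++;
            for (int j = 0; j < d; j++) mean[y[i]][j] += x[i][j];
        }
        for (int c = 0; c < numClasses; c++) {
            logPrior[c] = Math.log((double) count[c] / x.length); // -Infinity for a class with no rows
            for (int j = 0; j < d; j++) mean[c][j] /= Math.max(count[c], 1);
        }
        for (int i = 0; i < x.length; i++) {
            for (int j = 0; j < d; j++) {
                double diff = x[i][j] - mean[y[i]][j];
                variance[y[i]][j] += diff * diff;
            }
        }
        for (int c = 0; c < numClasses; c++) {
            for (int j = 0; j < d; j++) {
                variance[c][j] = Math.max(variance[c][j] / Math.max(count[c], 1), MIN_VARIANCE);
            }
        }
    }

    /** Probability of each class for one row, normalised so the values sum to 1. */
    double[] distribution(double[] row) {
        double[] logP = new double[numClasses];
        double max = Double.NEGATIVE_INFINITY;
        for (int c = 0; c < numClasses; c++) {
            logP[c] = logPrior[c];
            for (int j = 0; j < row.length; j++) {
                double diff = row[j] - mean[c][j];
                logP[c] += -0.5 * Math.log(2 * Math.PI * variance[c][j])
                        - diff * diff / (2 * variance[c][j]);
            }
            max = Math.max(max, logP[c]);
        }
        double[] p = new double[numClasses];
        double sum = 0;
        for (int c = 0; c < numClasses; c++) {
            p[c] = Math.exp(logP[c] - max); // subtract the max log-score to avoid underflow
            sum += p[c];
        }
        for (int c = 0; c < numClasses; c++) p[c] /= sum;
        return p;
    }

    /** Index of the most probable class, like the value classifyInstance returns. */
    int predict(double[] row) {
        double[] p = distribution(row);
        int best = 0;
        for (int c = 1; c < numClasses; c++) {
            if (p[c] > p[best]) best = c;
        }
        return best;
    }
}
```

To report a percentage, print something like Math.round(100 * p[c]) for each class c. If the training data contains only one class, as in the question where every instance is "good", this sketch gives that class probability 1 for every input, so the percentage carries no information; that is a concrete reason why all classes need to be represented in the training data.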

And that's how you classify test instances using Weka!