I am outlining the very basics of how to do classification using Weka.
The Training File
You need a training file. Weka accepts several file formats for training (and test) data, among them ARFF (Attribute-Relation File Format) and CSV (Comma-Separated Values). Let's say we have a training file in ARFF format. Part of the file looks as follows:
@relation pima_diabetes
@attribute 'preg' real
@attribute 'plas' real
@attribute 'pres' real
@attribute 'skin' real
@attribute 'insu' real
@attribute 'mass' real
@attribute 'pedi' real
@attribute 'age' real
@attribute 'class' { tested_negative, tested_positive}
@data
6,148,72,35,0,33.6,0.627,50,tested_positive
1,85,66,29,0,26.6,0.351,31,tested_negative
Note that to develop a good learner, you need a substantial amount of training data. In addition, every class should be well represented in the training data, so that the classifier you build from it can distinguish between the classes.
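One quick way to see whether the classes are well represented is to count the labels in the @data section before training. The following is a minimal sketch in plain Java, not part of Weka: the class name ArffClassCounts and the method countLabels are made up for illustration, and it assumes the class label is the last comma-separated value on each data row, as in the file above. The sample in main is a shortened excerpt of that file.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ArffClassCounts {

    /**
     * Counts how often each class label occurs in the @data section,
     * assuming the label is the last comma-separated value on each row.
     */
    public static Map<String, Integer> countLabels(List<String> arffLines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        boolean inData = false;
        for (String raw : arffLines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("%")) {
                continue; // skip blank lines and ARFF comments
            }
            if (!inData) {
                // everything up to and including the @data line is header
                inData = line.equalsIgnoreCase("@data");
                continue;
            }
            String label = line.substring(line.lastIndexOf(',') + 1).trim();
            counts.merge(label, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Shortened excerpt: only two of the numeric attributes are kept
        List<String> lines = Arrays.asList(
            "@relation pima_diabetes",
            "@attribute 'plas' real",
            "@attribute 'age' real",
            "@attribute 'class' { tested_negative, tested_positive}",
            "@data",
            "148,50,tested_positive",
            "85,31,tested_negative");
        System.out.println(countLabels(lines));
    }
}
```

If one label turns out to be much rarer than the others, that is a hint that the training data may need more examples of that class.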
The Test File
As stated above, the test file can also be in any of several formats. Say our test file is in ARFF format. The excerpt below starts at the attribute declarations (a complete ARFF file also begins with an @relation line) and looks as follows:
@attribute 'preg' real
@attribute 'plas' real
@attribute 'pres' real
@attribute 'skin' real
@attribute 'insu' real
@attribute 'mass' real
@attribute 'pedi' real
@attribute 'age' real
@attribute 'class' { tested_negative, tested_positive}
@data
5,116,74,0,0,25.6,0.201,30,?
3,78,50,32,88,31,0.248,26,?
Note that the class labels in the test data are given as '?' because they are unknown; they are to be determined by the classifier you build from the training data.
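The test file declares the same attributes, in the same order, as the training file; only the class values differ. A rough way to check this before training is to compare the @attribute lines of the two files. The sketch below is plain Java and purely illustrative: ArffHeaderCheck, attributeLines and sameAttributes are hypothetical names, and the comparison is textual (after collapsing whitespace), so it is stricter than a real ARFF parser would be. Once both files are loaded into Weka, Instances.equalHeaders offers a comparable check on the parsed data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ArffHeaderCheck {

    /** Collects the @attribute declarations, with runs of whitespace collapsed. */
    public static List<String> attributeLines(List<String> arffLines) {
        List<String> attributes = new ArrayList<>();
        for (String raw : arffLines) {
            String line = raw.trim();
            if (line.equalsIgnoreCase("@data")) {
                break; // the header ends where the data section starts
            }
            if (line.toLowerCase().startsWith("@attribute")) {
                attributes.add(line.replaceAll("\\s+", " "));
            }
        }
        return attributes;
    }

    /** True when both files declare the same attributes in the same order. */
    public static boolean sameAttributes(List<String> train, List<String> test) {
        return attributeLines(train).equals(attributeLines(test));
    }

    public static void main(String[] args) {
        // Shortened excerpts of the training and test files
        List<String> train = Arrays.asList(
            "@relation pima_diabetes",
            "@attribute 'age' real",
            "@attribute 'class' { tested_negative, tested_positive}",
            "@data",
            "50,tested_positive");
        List<String> test = Arrays.asList(
            "@attribute 'age' real",
            "@attribute 'class' { tested_negative, tested_positive}",
            "@data",
            "30,?");
        System.out.println(sameAttributes(train, test));
    }
}
```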
The Code
Using the Java API, a simple method that sets up our classifier, builds it on the training data, and finally applies it to classify the unlabeled test instances can look as follows:
// Needs: import weka.classifiers.bayes.NaiveBayes;
//        import weka.classifiers.meta.FilteredClassifier;
// Assumes the enclosing class declares the fields fc, nb, filter, data, testData and clsLabel,
// and that the class index has already been set on both data and testData.
/**
 * Builds the naive Bayes classifier on the training data and classifies the test instances.
 */
public void classify() {
    // Set up the classifier: naive Bayes wrapped in a FilteredClassifier
    fc = new FilteredClassifier();
    nb = new NaiveBayes();
    fc.setFilter(filter);
    fc.setClassifier(nb);

    // Build the classifier on the training data
    try {
        fc.buildClassifier(data);
    } catch (Exception e) {
        System.out.println("Error from Classification.classify(). Cannot build classifier");
        return; // without a built classifier there is nothing to classify with
    }

    // Classify each test instance
    clsLabel = new double[testData.numInstances()]; // predicted class index of each test instance
    for (int i = 0; i < testData.numInstances(); i++) {
        try {
            clsLabel[i] = fc.classifyInstance(testData.instance(i));
            // Store the prediction only when classification succeeded
            testData.instance(i).setClassValue(clsLabel[i]);
        } catch (Exception e) {
            System.out.println("Error from Classification.classify(). Cannot classify instance");
        }
    }
}
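The method above relies on data, testData and filter being prepared elsewhere. For completeness, here is a hedged end-to-end sketch of one way to prepare them, assuming Weka is on the classpath. The class name WekaClassifySketch and the file names train.arff and test.arff are placeholders. The filter used in the post is not shown, so Standardize is only a stand-in here. The sketch also assumes the class attribute is the last attribute in both files, as in the excerpts above.

```java
import weka.classifiers.bayes.NaiveBayes;
import weka.classifiers.meta.FilteredClassifier;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.unsupervised.attribute.Standardize;

public class WekaClassifySketch {

    /** Loads both files, trains on the first, and returns one predicted label per test instance. */
    public static String[] classify(String trainPath, String testPath) throws Exception {
        Instances data = new DataSource(trainPath).getDataSet();
        Instances testData = new DataSource(testPath).getDataSet();

        // Assumption: the class attribute is the last attribute in both files
        data.setClassIndex(data.numAttributes() - 1);
        testData.setClassIndex(testData.numAttributes() - 1);

        FilteredClassifier fc = new FilteredClassifier();
        fc.setFilter(new Standardize()); // stand-in: the original filter is not shown
        fc.setClassifier(new NaiveBayes());
        fc.buildClassifier(data);

        String[] labels = new String[testData.numInstances()];
        for (int i = 0; i < testData.numInstances(); i++) {
            // classifyInstance returns the index of the predicted class value as a double
            double predicted = fc.classifyInstance(testData.instance(i));
            labels[i] = testData.classAttribute().value((int) predicted);
        }
        return labels;
    }

    public static void main(String[] args) throws Exception {
        // Placeholder paths; replace with your own files
        for (String label : classify("train.arff", "test.arff")) {
            System.out.println(label);
        }
    }
}
```

Because classifyInstance returns the index of the predicted class value rather than its name, the sketch looks the name up through classAttribute().value(...) to turn each prediction back into tested_negative or tested_positive.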
And that's how you classify test instances using Weka!