I am trying to classify instances in Java with the Weka library, following tutorials I found online.
I built a model on my machine and saved it to disk with this code:
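```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.ObjectOutputStream;
import java.util.Random;

import weka.classifiers.Evaluation;
import weka.classifiers.bayes.NaiveBayesMultinomial;
import weka.classifiers.meta.FilteredClassifier;
import weka.core.Instances;
import weka.core.converters.ArffLoader;
import weka.filters.unsupervised.attribute.StringToWordVector;

public void makeModel() throws Exception
{
    ArffLoader loader = new ArffLoader();
    loader.setFile(new File("data.arff"));
    Instances structure = loader.getDataSet();
    // the class is the second attribute (index 1)
    structure.setClassIndex(1);

    // train NaiveBayesMultinomial on the word vectors produced by StringToWordVector
    NaiveBayesMultinomial n = new NaiveBayesMultinomial();
    FilteredClassifier f = new FilteredClassifier();
    StringToWordVector s = new StringToWordVector();
    s.setUseStoplist(true);
    s.setWordsToKeep(100);
    f.setFilter(s);
    f.setClassifier(n);
    f.buildClassifier(structure);

    // 10-fold cross-validation (works on copies of f, so the model trained above is not modified)
    Evaluation eval = new Evaluation(structure);
    eval.crossValidateModel(f, structure, 10, new Random(1));
    System.out.println(eval.toSummaryString("\nResults\n======\n", false));

    // output generated model
    //System.out.println(f);

    // serialize the trained model to disk
    try (ObjectOutputStream oos = new ObjectOutputStream(
            new FileOutputStream("classifier.model"))) {
        oos.writeObject(f);
        oos.flush();
    }
}
```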
Output:

```
Results
======

Correctly Classified Instances       20158               79.6948 %
Incorrectly Classified Instances      5136               20.3052 %
Kappa statistic                          0.6737
Mean absolute error                      0.0726
Root mean squared error                  0.2025
Relative absolute error                 38.7564 %
Root relative squared error             66.1815 %
Coverage of cases (0.95 level)          96.4142 %
Mean rel. region size (0.95 level)      27.7531 %
Total Number of Instances            25294
```
Then I loaded the same model and used it to classify the unlabeled instances in test.arff:
```java
import java.io.BufferedReader;
import java.io.FileReader;

import weka.classifiers.meta.FilteredClassifier;
import weka.core.Instances;
import weka.core.SerializationHelper;

public void classify() throws Exception
{
    // load the model saved by makeModel()
    FilteredClassifier cls = (FilteredClassifier) SerializationHelper.read("classifier.model");
    Instances unlabeled = new Instances(
            new BufferedReader(
                    new FileReader("test.arff")));
    // set class attribute (note: makeModel() uses class index 1)
    unlabeled.setClassIndex(0);
    // create a copy that will receive the predicted labels
    Instances labeled = new Instances(unlabeled);
    // label instances
    for (int i = 0; i < unlabeled.numInstances(); i++) {
        System.out.println(labeled.instance(i).classValue());
        System.out.print(", actual: " + labeled.classAttribute().value((int) labeled.instance(i).classValue()));
        double clsLabel = cls.classifyInstance(unlabeled.instance(i));
        labeled.instance(i).setClassValue(clsLabel);
        System.out.println(", predicted: " + labeled.classAttribute().value((int) clsLabel));
    }
    System.out.println("ended");
}
```
Output:

```
1.0
, actual: Bud1? is a This is a new new string.txtIlocblobR(?????? @? @? @? @E?DSDB ` @? @? @, predicted: *WEKA*DUMMY*STRING*FOR*STRING*ATTRIBUTES*
2.0
, actual: This is a new new string , predicted: *WEKA*DUMMY*STRING*FOR*STRING*ATTRIBUTES*
ended
```
However, the predicted value for every instance is *WEKA*DUMMY*STRING*FOR*STRING*ATTRIBUTES* instead of one of the class labels.
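One thing that may be worth ruling out: the two methods use different class indices (1 in makeModel(), 0 in classify()), and the "actual" values printed are the text itself, which suggests that index 0 in test.arff is the string attribute rather than the class. Weka generally expects test data to declare the same attributes, in the same order and with the same class index, as the training data. A rough first check is to compare the @attribute declarations of data.arff and test.arff. The sketch below is plain Java, not part of the Weka API; the class name ArffHeaderCheck and the example headers in main() are hypothetical, and it only compares the declarations as whitespace-normalized text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class ArffHeaderCheck {

    // Collects the @attribute declarations of an ARFF header, in order.
    // The "@attribute" keyword is dropped and runs of whitespace are collapsed,
    // so only the attribute name and type are compared.
    static List<String> attributes(String arff) {
        List<String> attrs = new ArrayList<>();
        for (String line : arff.split("\\R")) {
            String t = line.trim();
            String lower = t.toLowerCase(Locale.ROOT);
            if (lower.startsWith("@data")) {
                break; // the header ends where the data section begins
            }
            if (lower.startsWith("@attribute")) {
                attrs.add(t.substring("@attribute".length()).trim().replaceAll("\\s+", " "));
            }
        }
        return attrs;
    }

    // Returns a description of the first difference between the two headers,
    // or null if they declare the same attributes in the same order.
    static String firstMismatch(String trainArff, String testArff) {
        List<String> train = attributes(trainArff);
        List<String> test = attributes(testArff);
        int n = Math.min(train.size(), test.size());
        for (int i = 0; i < n; i++) {
            if (!train.get(i).equals(test.get(i))) {
                return "attribute " + i + ": train has '" + train.get(i)
                        + "', test has '" + test.get(i) + "'";
            }
        }
        if (train.size() != test.size()) {
            return "train has " + train.size() + " attributes, test has " + test.size();
        }
        return null;
    }

    public static void main(String[] args) {
        // Hypothetical headers: "text", "class", "pos" and "neg" are placeholders,
        // not the real contents of data.arff or test.arff.
        String train = "@relation docs\n@attribute text string\n@attribute class {pos,neg}\n@data\n";
        String test = "@relation docs\n@attribute text string\n@data\n";
        System.out.println(firstMismatch(train, train)); // prints: null
        System.out.println(firstMismatch(train, test));  // prints: train has 2 attributes, test has 1
    }
}
```

If the headers do match, the class index passed to setClassIndex in classify() would presumably need to match the one used in makeModel(). Once both files are loaded, Weka's own Instances.equalHeaders(...) performs a stricter comparison on the loaded datasets.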