We are trying to use a WEKA classifier from inside a Java program. So far everything works well. However, when we built the classifier from the training set in the Weka GUI, we used the StringToWordVector filter with the IDF transform enabled to help improve classification accuracy.
From within Java, how do I calculate the IDF-transformed value for each token in a new instance before passing that instance to the classifier?
The basic code looks like this:
Instances ins = vectorize(msg);
Instances unlabeled = new Instances(train, 1);
Instance inst = new Instance(unlabeled.numAttributes());
String tmp = "";
for (int i = 0; i < ins.numAttributes(); i++) {
    tmp = ins.attribute(i).name();
    if (unlabeled.attribute(tmp) != null) {
        // TODO: need to figure out the IDF-transformed value to put here, NOT 1!
        inst.setValue(unlabeled.attribute(tmp), 1.0);
    }
}
unlabeled.add(inst);
unlabeled.setClassIndex(classIdx);
.....cl.distributionForInstance(unlabeled.instance(i));
So how do I code this so that the correct value goes into the new instance I want to classify?
Just to be clear, the line inst.setValue(unlabeled.attribute(tmp), 1.0); needs to change from 1.0 to the IDF-transformed number.
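
For reference, the StringToWordVector option description gives the IDF transform as fij * log(number of documents / number of documents containing word i), where fij is the frequency of word i in document j. Below is a minimal sketch of that arithmetic in plain Java, to show the kind of value that would replace 1.0. It assumes the per-word document counts from the training set are available somewhere. The class IdfSketch, its methods, and the example counts are hypothetical and not part of the Weka API, and the use of the natural logarithm is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of the Weka API.
class IdfSketch {

    // IDF factor for one word: log(numDocs / docsWithWord).
    // Natural log is assumed here.
    static double idf(int numDocs, int docsWithWord) {
        if (numDocs <= 0 || docsWithWord <= 0 || docsWithWord > numDocs) {
            throw new IllegalArgumentException("invalid document counts");
        }
        return Math.log((double) numDocs / docsWithWord);
    }

    // Value to store for a word: its frequency in the new document times the IDF factor.
    static double weight(double termFrequency, int numDocs, int docsWithWord) {
        return termFrequency * idf(numDocs, docsWithWord);
    }

    public static void main(String[] args) {
        // Made-up training statistics: total documents and, per word,
        // how many training documents contained that word.
        int numTrainingDocs = 1000;
        Map<String, Integer> docCounts = new HashMap<>();
        docCounts.put("invoice", 10);
        docCounts.put("the", 1000);

        for (Map.Entry<String, Integer> e : docCounts.entrySet()) {
            // A word present once (frequency 1.0) in the new message.
            double value = weight(1.0, numTrainingDocs, e.getValue());
            System.out.println(e.getKey() + " -> " + value);
        }
    }
}
```

The document counts have to come from the training data, not from the new message, which is why a hand-built instance cannot get the right value on its own. One possible alternative, not verified against this setup, is to reuse the same filter object that was built on the training set (for example through Filter.useFilter or by wrapping the classifier in a FilteredClassifier) so that Weka applies the stored counts itself.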