I need to classify some text programmatically using Weka, but I am having trouble because the training data and the data to be classified must be filtered in the same way before being passed to the classifier.
My current approach to the problem is:
1. Create an ARFF file of training data with a string attribute and a class attribute.
2. Apply StringToWordVector to the data set and save the filter for future use.
3. Apply the AttributeSelection filter to the resulting data and save that filter for future use.
4. Train the classifier on the filtered data and save the classifier.
5. Create an Instances object with the same attributes as the ARFF file and populate it with the Instance I want to classify, leaving the value of the class attribute missing.
6. Load the StringToWordVector filter and use it to filter the Instances.
7. Load the AttributeSelection filter and use it to filter the result.
8. Load the classifier and classify the result.
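The steps above can be sketched roughly as follows. This is only a minimal sketch under stated assumptions, not the exact code in question: the class name `TextPipelineSketch`, the file names, the attribute layout (string attribute first, class last), the InfoGainAttributeEval plus Ranker combination, the `numToSelect` parameter and the NaiveBayes classifier are all chosen for illustration, and a Weka 3.7+ jar is assumed to be on the classpath.

```java
import weka.attributeSelection.InfoGainAttributeEval;
import weka.attributeSelection.Ranker;
import weka.classifiers.Classifier;
import weka.classifiers.bayes.NaiveBayes;
import weka.core.DenseInstance;
import weka.core.Instance;
import weka.core.Instances;
import weka.core.SerializationHelper;
import weka.filters.Filter;
import weka.filters.supervised.attribute.AttributeSelection;
import weka.filters.unsupervised.attribute.StringToWordVector;

public class TextPipelineSketch {

    // Steps 1-4: fit both filters and the classifier on the training data, then save all three.
    // Assumption: attribute 0 of `raw` is the string attribute and the last attribute is the class.
    public static void train(Instances raw, int numToSelect, String dir) throws Exception {
        raw.setClassIndex(raw.numAttributes() - 1);

        StringToWordVector stwv = new StringToWordVector();
        stwv.setInputFormat(raw);                          // called once, on the training data
        Instances vectors = Filter.useFilter(raw, stwv);   // first batch builds the word list

        AttributeSelection select = new AttributeSelection();
        select.setEvaluator(new InfoGainAttributeEval());  // assumed evaluator
        Ranker ranker = new Ranker();
        ranker.setNumToSelect(numToSelect);                // assumed search settings
        select.setSearch(ranker);
        select.setInputFormat(vectors);                    // called once, on the training data
        Instances reduced = Filter.useFilter(vectors, select);

        Classifier model = new NaiveBayes();               // assumed classifier
        model.buildClassifier(reduced);

        // Hypothetical file names; the filters are saved after they have seen the training batch.
        SerializationHelper.write(dir + "/stwv.filter", stwv);
        SerializationHelper.write(dir + "/select.filter", select);
        SerializationHelper.write(dir + "/classifier.model", model);
    }

    // Steps 5-8: wrap one text in an Instances object, push it through the saved filters, classify.
    public static double classify(Instances rawHeader, String text, String dir) throws Exception {
        Instances unlabeled = new Instances(rawHeader, 0); // same attributes, no rows
        unlabeled.setClassIndex(unlabeled.numAttributes() - 1);
        Instance inst = new DenseInstance(unlabeled.numAttributes()); // every value starts missing
        inst.setDataset(unlabeled);
        inst.setValue(unlabeled.attribute(0), text);       // class value is left missing
        unlabeled.add(inst);

        Filter stwv = (Filter) SerializationHelper.read(dir + "/stwv.filter");
        Filter select = (Filter) SerializationHelper.read(dir + "/select.filter");
        Classifier model = (Classifier) SerializationHelper.read(dir + "/classifier.model");

        // The loaded filters are applied as they were saved; this sketch does not
        // call setInputFormat() on them again.
        Instances vectors = Filter.useFilter(unlabeled, stwv);
        Instances reduced = Filter.useFilter(vectors, select);
        return model.classifyInstance(reduced.instance(0)); // index of the predicted class value
    }
}
```

With this layout, `train(...)` would be run once on the data loaded from the ARFF file, and `classify(...)` would be called with the same header for each new text. The returned double is the index of the predicted value of the class attribute.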
StringToWordVector seems to work as I expected: it uses the same set of words for the new data as it did for the training data. The problem is with AttributeSelection, which seems to run the selection again instead of simply reusing the attributes it already selected from the training data.