WEKA: Classify instances with a deserialized model

Question

I used Weka Explorer:

Loaded the arff file
Applied StringToWordVector filter
Selected IBk as the best classifier
Generated/Saved my_model.model binary

In my Java code I deserialize the model:

    URL curl = ClassUtility.findClasspathResource( "models/my_model.model" );
    final Classifier cls = (Classifier) weka.core.SerializationHelper.read( curl.openConnection().getInputStream() );

Now I have the classifier, but I also need the information about the filter. Where I am stuck is: how do I prepare an instance so that it can be classified by my deserialized model, i.e. how do I apply the filter before classification? (The raw instance that I have to classify has a field text with tokens in it. The filter is supposed to transform that into a list of new attributes.)

I even tried to use a FilteredClassifier where I set the classifier to the deserialized one and the filter to a manually created instance of StringToWordVector:

    final StringToWordVector filter = new StringToWordVector();
    // each flag and its value must be a separate array element
    filter.setOptions(new String[]{"-C", "-P", "x_", "-L"});
    FilteredClassifier fcls = new FilteredClassifier();
    fcls.setFilter(filter);
    fcls.setClassifier(cls);

The above does not work either. It throws the exception:

    Exception in thread "main" java.lang.NullPointerException: No output instance format defined

What I am trying to avoid is doing the training in the Java code. Training can be very slow, and I will probably have multiple classifiers to train (with different algorithms as well), so I want my app to start fast.

Sentry · Accepted Answer · 2014-02-05T18:49:24

Your problem is that your model doesn't know anything about what the filter did to the data. The StringToWordVector filter transforms the data, but how it does so depends on the input (training) data: the word attributes it creates are derived from the training texts. A model trained on this transformed data set will only work on data that underwent the exact same transformation. To guarantee this, the filter needs to be part of your model.
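
To illustrate why the transformation has to be the exact same one, here is a minimal plain-Java sketch. It does not use Weka; DictionaryDemo, buildDictionary and vectorize are hypothetical names, and the toy word-presence encoding only approximates what StringToWordVector does. The same text maps to different vectors depending on which texts the dictionary was built from:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DictionaryDemo {

    /** Collects the distinct tokens of the given texts, in order of first appearance. */
    public static List<String> buildDictionary(List<String> texts) {
        List<String> dictionary = new ArrayList<>();
        for (String text : texts) {
            for (String token : text.toLowerCase().split("\\s+")) {
                if (!token.isEmpty() && !dictionary.contains(token)) {
                    dictionary.add(token);
                }
            }
        }
        return dictionary;
    }

    /** Maps a text to a 0/1 vector with one position per dictionary word. */
    public static int[] vectorize(String text, List<String> dictionary) {
        List<String> tokens = Arrays.asList(text.toLowerCase().split("\\s+"));
        int[] vector = new int[dictionary.size()];
        for (int i = 0; i < vector.length; i++) {
            vector[i] = tokens.contains(dictionary.get(i)) ? 1 : 0;
        }
        return vector;
    }

    public static void main(String[] args) {
        // Dictionary derived from the (toy) training texts
        List<String> trained = buildDictionary(Arrays.asList("cheap pills", "meeting agenda"));
        // Dictionary derived from some other text, as a freshly created filter would do
        List<String> fresh = buildDictionary(Arrays.asList("agenda for monday"));

        String text = "agenda pills";
        System.out.println(trained + " -> " + Arrays.toString(vectorize(text, trained)));
        System.out.println(fresh + " -> " + Arrays.toString(vectorize(text, fresh)));
    }
}
```

The two vectors differ in length and in what each position means, so a classifier trained against the first dictionary cannot interpret vectors built with the second.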

Using a FilteredClassifier is the correct idea, but you have to use it from the beginning:

Load the ARFF file
Select FilteredClassifier as classifier
Select StringToWordVector as filter for it
Select IBk as classifier for the FilteredClassifier
Generate/Save the model to my_model.binary

The trained and serialized model will then also contain the initialized filter, including the information on how to transform data.
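
Once the filter is part of the model, classifying a raw instance could look roughly like the sketch below. It assumes Weka 3.7 or later on the classpath. The attribute names text and label, the class values spam and ham, and the four training strings are hypothetical, and the tiny in-memory training step only stands in for the model file saved from the Explorer so that the example is self-contained:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import weka.classifiers.Classifier;
import weka.classifiers.lazy.IBk;
import weka.classifiers.meta.FilteredClassifier;
import weka.core.Attribute;
import weka.core.DenseInstance;
import weka.core.Instance;
import weka.core.Instances;
import weka.core.SerializationHelper;
import weka.filters.unsupervised.attribute.StringToWordVector;

public class FilteredModelDemo {

    /** Raw (unfiltered) structure: one string attribute plus a nominal class. Names are hypothetical. */
    public static Instances buildHeader() {
        ArrayList<Attribute> attributes = new ArrayList<>();
        attributes.add(new Attribute("text", (List<String>) null)); // null value list = string attribute
        attributes.add(new Attribute("label", Arrays.asList("spam", "ham")));
        Instances header = new Instances("messages", attributes, 0);
        header.setClassIndex(1);
        return header;
    }

    /** Creates a raw instance; pass label == null to leave the class value missing. */
    public static Instance makeInstance(Instances header, String text, String label) {
        Instance instance = new DenseInstance(header.numAttributes());
        instance.setDataset(header);
        instance.setValue(header.attribute("text"), text);
        if (label != null) {
            instance.setValue(header.classAttribute(), label);
        }
        return instance;
    }

    /** Stand-in for the Explorer steps: train filter + IBk as one unit and serialize it. */
    public static byte[] trainAndSerialize() throws Exception {
        Instances data = buildHeader();
        data.add(makeInstance(data, "cheap pills buy now", "spam"));
        data.add(makeInstance(data, "win cheap prize now", "spam"));
        data.add(makeInstance(data, "meeting agenda for monday", "ham"));
        data.add(makeInstance(data, "lunch meeting on monday", "ham"));

        FilteredClassifier model = new FilteredClassifier();
        model.setFilter(new StringToWordVector());
        model.setClassifier(new IBk());
        model.buildClassifier(data); // initializes the filter and trains IBk on the filtered data

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        SerializationHelper.write(out, model);
        return out.toByteArray();
    }

    /** Classifies raw text; the deserialized FilteredClassifier applies its stored filter itself. */
    public static String classify(Classifier model, String text) throws Exception {
        Instances header = buildHeader();
        Instance instance = makeInstance(header, text, null);
        double index = model.classifyInstance(instance);
        return header.classAttribute().value((int) index);
    }

    public static void main(String[] args) throws Exception {
        byte[] bytes = trainAndSerialize();
        Classifier model = (Classifier) SerializationHelper.read(new ByteArrayInputStream(bytes));
        System.out.println(classify(model, "cheap prize"));
        System.out.println(classify(model, "monday agenda"));
    }
}
```

In a real application the trainAndSerialize step would be dropped and the stream passed to SerializationHelper.read would come from the saved model file instead. The header built at classification time has to match the structure of the ARFF file the model was trained on (same attributes, same order, same class values).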
