
I'm building a machine learning model in Java (Weka) to detect patterns in strings. I have 2 class attributes that I want the model to predict based on these patterns. My code works when I leave the attribute values in the test ARFF file, but not when I remove them and replace them with question marks: the output then shows the same value (cfb) for every instance. I know the model isn't hard-coded, but for testing purposes I would like to remove these attribute values. I have already built the classifier and evaluated the model.

/**
 * Make predictions based on that model. Improve the model
 * @throws Exception
 */
public void modelPredictions(Instances trainedDataSet, Instances testedDataSet, Classifier classifierType) throws Exception {
    // Get the number of classes
    int numClasses = trainedDataSet.numClasses();
    // print out class values in the training dataset
    for (int i = 0; i < numClasses; i++) {
        // get class string value using the class index
        String classValue = trainedDataSet.classAttribute().value(i);
        System.out.println("Class Value " + i + " is " + classValue);
    }
    // set class index to the last attribute
    // loop through the new dataset and make predictions
    System.out.println("Actual Class, NB Predicted");
    for (int i = 0; i < testedDataSet.numInstances(); i++) {
        // get class double value for current instance
        double actualClass = testedDataSet.instance(i).classValue();
        // get class string value using the class index using the class's int value
        String actual = testedDataSet.classAttribute().value((int) actualClass);
        // get Instance object of current instance
        Instance newInst = testedDataSet.instance(i);
        // call classifyInstance, which returns a double value for the class
        double predNB = classifierType.classifyInstance(newInst);
        // use this value to get string value of the predicted class
        String predString = testedDataSet.classAttribute().value((int) predNB);
        System.out.println(actual + ", " + predString);
    }
}

Image of the test ARFF file (sorry, I was getting errors when pasting the content of the file).


1 Answer


If you replace the actual class values in your test set with question marks, they are interpreted as missing values. Weka represents a missing value as Double.NaN. Casting a missing value (i.e. Double.NaN) to an int yields 0, which is the index of the first nominal value of your class. The "actual" class that your code prints will therefore always be the first class label.

The following code:

double missing = Utils.missingValue();
System.out.println("missing value as double: " + missing);
System.out.println("missing value as int: " + ((int) missing));

Outputs this:

missing value as double: NaN
missing value as int: 0
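
One way to avoid the misleading label is to check for a missing value before casting. The following is a minimal plain-Java sketch (no Weka dependency), assuming missing values are Double.NaN as shown above. The names labelFor and MISSING_LABEL and the second label "other" are made up for illustration. In Weka itself, Instance.classIsMissing() should serve as the equivalent check on an instance.

```java
// Plain-Java sketch of guarding the double-to-int cast; all names are illustrative.
final class MissingClassDemo {

    /** Placeholder printed when the class value is missing (an arbitrary choice). */
    static final String MISSING_LABEL = "?";

    /**
     * Maps a class value (a label index stored as a double) to its label.
     * Returns MISSING_LABEL for NaN instead of silently casting it to index 0.
     */
    static String labelFor(double classValue, String[] labels) {
        if (Double.isNaN(classValue)) {
            return MISSING_LABEL;
        }
        return labels[(int) classValue];
    }

    public static void main(String[] args) {
        // "cfb" is the label from the question; "other" is a hypothetical second label.
        String[] labels = {"cfb", "other"};
        double missing = Double.NaN;

        // Unguarded cast: NaN becomes 0, so the first label is printed.
        System.out.println("unguarded: " + labels[(int) missing]);
        // Guarded lookup: the missing value is reported as such.
        System.out.println("guarded: " + labelFor(missing, labels));
        // A present value is looked up as usual.
        System.out.println("present: " + labelFor(1.0, labels));
    }
}
```

With a guard like this, the "actual" column would show ? for unlabeled test instances, while the predicted column is unaffected because it comes from classifyInstance.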