3
votes

Good Evening,

I am working on a supervised classification task. I have a big ARFF file full of data in the format "text", class. There are only two classes, E and I.

I can load this data into Weka Explorer, apply the StringToWordVector filter with TF-IDF to it, then classify it with LibSVM and get results. But I need to use 5x2 cross-validation and get the area under the ROC curve. So I save the processed data, open Weka Experimenter, load it in, set it to 2 folds and 5 iterations, and set the algorithm to LibSVM.

When I go to the RUN tab and press start I get the following error:

18:31:18: Started

18:31:18: Class attribute is not nominal!

18:31:18: Interrupted

18:31:18: There was 1 error

I don't know why this is happening, what exactly the error means, or how to fix it. I googled the error, but that did not lead me to any solutions, and I am not sure where to go from here.

I can go back to Explorer, reload in that processed file, and classify it without any issues but I need to do it in Experimenter.

3
This is maybe a bit too late, but with the currently available LibSVM classifiers for Weka it is not possible to get correct AUC values unless you run LibSVM with class probabilities. Otherwise, AUC and accuracy are always the same. - Sentry

3 Answers

6
votes

In my case, the file did contain a nominal attribute, but it was not the last one. Weka expects the nominal class attribute to be last, since it indicates the class that each record is assigned to. Here's how I rearranged the data so that the nominal attribute was last:

  1. In Explorer, open the arff file.
  2. Click 'Edit...' then find the column which should be the class of each record.
  3. Right click on the column header and select 'Attribute as class'.
  4. Click 'Save...' and use this new dataset in Experimenter.

Works like a charm.
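
To check whether a saved file has this problem before loading it into Experimenter, a quick look at the ARFF header is enough. The following is only a minimal sketch using plain Java string handling, not Weka's own parser; the class name `ArffClassCheck` and the attribute names in the sample headers are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class ArffClassCheck {

    /** Returns the declared type of each attribute, in header order. */
    public static List<String> attributeTypes(String arff) {
        List<String> types = new ArrayList<>();
        for (String raw : arff.split("\\R")) {
            String line = raw.trim();
            String lower = line.toLowerCase(Locale.ROOT);
            if (lower.startsWith("@data")) break;          // header ends here
            if (!lower.startsWith("@attribute")) continue; // skip @relation, comments, blanks
            String rest = line.substring("@attribute".length()).trim();
            String type;
            if (rest.startsWith("'") || rest.startsWith("\"")) {
                // Quoted attribute name: the type follows the closing quote.
                int close = rest.indexOf(rest.charAt(0), 1);
                type = rest.substring(close + 1).trim();
            } else {
                // Unquoted name: the type is everything after the first whitespace.
                String[] parts = rest.split("\\s+", 2);
                type = parts.length > 1 ? parts[1].trim() : "";
            }
            // A nominal attribute is declared as a list of labels in braces.
            types.add(type.startsWith("{") ? "nominal" : type.toLowerCase(Locale.ROOT));
        }
        return types;
    }

    /** True if the last declared attribute is nominal. */
    public static boolean lastAttributeIsNominal(String arff) {
        List<String> types = attributeTypes(arff);
        return !types.isEmpty() && "nominal".equals(types.get(types.size() - 1));
    }

    public static void main(String[] args) {
        // Hypothetical header with the class attribute last.
        String classLast = "@relation mail\n"
                + "@attribute text string\n"
                + "@attribute class {E,I}\n"
                + "@data\n";
        // Hypothetical header with the class attribute first and word counts after it.
        String classFirst = "@relation filtered\n"
                + "@attribute class {E,I}\n"
                + "@attribute hello numeric\n"
                + "@attribute world numeric\n"
                + "@data\n";
        System.out.println(lastAttributeIsNominal(classLast));  // true
        System.out.println(lastAttributeIsNominal(classFirst)); // false
    }
}
```

If the check reports `false` for a file whose class attribute is nominal, the attribute is most likely just in the wrong position, which is what the steps above fix.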

3
votes

If your class attribute is numeric (like 0, 1), change it to a nominal form such as true, false.
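
For illustration, this is roughly what the difference looks like in the ARFF header. The attribute name `class` is just an example. As far as I know, the labels do not have to be words: declaring the existing values as a label set is also treated as nominal.

```
% numeric class: Experimenter rejects this for classification
@attribute class numeric

% nominal class: a fixed set of labels in braces
@attribute class {true,false}

% also nominal, keeping the original 0/1 values as labels
@attribute class {0,1}
```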

0
votes

The StringToWordVector filter puts the class attribute first in the data that it outputs, whereas the Experimenter expects the last attribute in the data to be the class. You can reorder the attributes of the filtered data, but the best approach (and the correct one in general when combining filters with classifiers) is to use the FilteredClassifier to wrap your base classifier (LibSVM) together with the StringToWordVector filter. This should work out just fine because the class attribute is the last attribute in your original "text", class data.
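
As a rough sketch, the wrapped scheme would look something like this once configured, for example as shown in the Experimenter's algorithm list. The options are assumptions rather than the asker's actual settings: `-I` and `-T` switch on the IDF and TF transforms in StringToWordVector, and `-B` asks LibSVM for probability estimates, which the comment on the question says are needed for meaningful AUC values.

```
weka.classifiers.meta.FilteredClassifier -F "weka.filters.unsupervised.attribute.StringToWordVector -I -T" -W weka.classifiers.functions.LibSVM -- -B
```

With this setup, load the original, unfiltered ARFF file into the Experimenter, so the filter is rebuilt on the training portion of each fold.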