The reason I want to do this is that I have an additional dataset of numbered files (let's call it the 'numberedFiles' dataset) that correspond to the string IDs on the instances, and I want to use the predicted labels as filters to select a subset of the numberedFiles dataset. To give more context, I want to separate the false positives from the false negatives (this is binary classification with labels 'yes' and 'no').
Therefore, the theoretical workflow would be this:
1) Load the .arff file in Java and set up a FilteredClassifier with a RemoveType filter so the string ID is not used as a feature.
2) Use the Evaluation class to perform cross-validation.
3) Somehow access the instances WITH their new prediction labels.
4) Loop over all instances WITH their new prediction labels. If prediction=='no' and actual=='yes', write the string ID of that instance to a text file of false negatives. If prediction=='yes' and actual=='no', write the string ID of that instance to a text file of false positives.
As you can see, the problem is with step 3. So far, I can only use 'eval.predictions()' to get a FastVector of NominalPrediction elements. Each element contains the predicted label, the actual label, the weight and the class distribution, but no information about the attributes of the instance it came from (namely the string ID I need). I know that the Weka Explorer can output additional attributes for each prediction (the string ID, in my case), but I can't figure out how to access this from Java.
For reference, here is the code snippet:
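import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.meta.FilteredClassifier;
import weka.classifiers.trees.J48;
import weka.core.FastVector;
import weka.core.Instances;
import weka.core.SelectedTag;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.unsupervised.attribute.RemoveType;

// load the data; the class attribute is the last one
Instances trainingData = DataSource.read("my_data_file.arff");
trainingData.setClassIndex(trainingData.numAttributes() - 1);
// build classifier: J48 wrapped in a filter that drops the string ID attribute
J48 classifier = new J48();
RemoveType removeID = new RemoveType();
removeID.setAttributeType(new SelectedTag("Delete string attributes", RemoveType.TAGS_ATTRIBUTETYPE));
FilteredClassifier meta = new FilteredClassifier();
meta.setClassifier(classifier);
meta.setFilter(removeID);
Evaluation eval = new Evaluation(trainingData);
Random r = new Random(1);
// 10-fold cross-validation
eval.crossValidateModel(meta, trainingData, 10, r);
System.out.println(eval.toSummaryString("=== Cross-Validation Summary ===\n", false));
System.out.println(eval.toClassDetailsString("=== Detailed Accuracy By Class ===\n"));
System.out.println(eval.toMatrixString("=== Confusion Matrix ===\n"));
// NominalPrediction elements: actual/predicted labels only, no instance attributes
FastVector predictions = eval.predictions();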
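
To make step 4 concrete, here is a minimal sketch of the bookkeeping I have in mind, assuming step 3 were solved and I had each instance's string ID together with its actual and predicted label. It is plain Java with no Weka calls; `PredictionSegmenter`, `Row` and the file IDs are hypothetical names made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class PredictionSegmenter {

    /** Hypothetical holder: one cross-validated instance with its ID and both labels. */
    static final class Row {
        final String id;
        final String actual;
        final String predicted;

        Row(String id, String actual, String predicted) {
            this.id = id;
            this.actual = actual;
            this.predicted = predicted;
        }
    }

    /** Collects the IDs of all rows with the given actual and predicted labels, in input order. */
    static List<String> idsWhere(List<Row> rows, String actual, String predicted) {
        List<String> ids = new ArrayList<>();
        for (Row row : rows) {
            if (row.actual.equals(actual) && row.predicted.equals(predicted)) {
                ids.add(row.id);
            }
        }
        return ids;
    }

    /** False negatives: actually 'yes' but predicted 'no'. */
    static List<String> falseNegatives(List<Row> rows) {
        return idsWhere(rows, "yes", "no");
    }

    /** False positives: actually 'no' but predicted 'yes'. */
    static List<String> falsePositives(List<Row> rows) {
        return idsWhere(rows, "no", "yes");
    }

    public static void main(String[] args) {
        // Made-up rows standing in for the output of step 3.
        List<Row> rows = Arrays.asList(
                new Row("file_001", "yes", "yes"),
                new Row("file_002", "yes", "no"),
                new Row("file_003", "no", "yes"),
                new Row("file_004", "no", "no"));

        System.out.println("False negatives: " + falseNegatives(rows));
        System.out.println("False positives: " + falsePositives(rows));
    }
}
```

Each resulting list could then be written to its own text file, for example with `java.nio.file.Files.write(path, ids)`. What I am missing is how to obtain the ID/actual/predicted triples from the cross-validation in the first place.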