I have a set of files, each containing a unique identifier. I use Weka programmatically to create a training ARFF file. Each instance in the ARFF file corresponds to the set of attributes extracted from one file, so there is one instance per file. How can I link the identifier of each file with the corresponding instance in the ARFF file? Thank you very much in advance.
2 Answers
You can associate an identifier with each instance by creating an extra attribute, as described here.
Thus, in your case, you would create a string attribute holding the identifier and set its value on each instance. When training and testing your classifier, you will want to remove the identifier attribute so that it is not used as a feature. This is easily done with the RemoveType filter, which by default removes string attributes; the Remove filter also works if you specify the attribute index.
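To make the layout concrete, here is a minimal sketch of what an ARFF file with an identifier column could look like, written with plain Java rather than the Weka API so it runs without weka.jar. The attribute names (file_id, length, label), the class values, and the ArffWithIds helper are all hypothetical; the quoting rule is a simple assumption that should cover typical identifiers.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch: builds ARFF text whose first attribute is a string identifier.
// All names (file_id, length, label, spam/ham) are made up for illustration.
final class ArffWithIds {

    // One row per source file: its identifier, one numeric feature, a class label.
    static final class Row {
        final String fileId;
        final double length;
        final String label;

        Row(String fileId, double length, String label) {
            this.fileId = fileId;
            this.length = length;
            this.label = label;
        }
    }

    // Wrap a string value in single quotes, escaping backslashes and quotes.
    static String quote(String s) {
        return "'" + s.replace("\\", "\\\\").replace("'", "\\'") + "'";
    }

    // Emit the header (identifier first, class last) followed by one data line per row.
    static String toArff(String relation, List<Row> rows) {
        StringBuilder sb = new StringBuilder();
        sb.append("@relation ").append(relation).append("\n\n");
        sb.append("@attribute file_id string\n");
        sb.append("@attribute length numeric\n");
        sb.append("@attribute label {spam,ham}\n\n");
        sb.append("@data\n");
        for (Row r : rows) {
            sb.append(quote(r.fileId)).append(',')
              .append(r.length).append(',')
              .append(r.label).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Row> rows = new ArrayList<>();
        rows.add(new Row("doc-001", 120.0, "spam"));
        rows.add(new Row("doc-002", 87.5, "ham"));
        System.out.print(toArff("files", rows));
    }
}
```

With the identifier in the first column, the Remove filter would be pointed at attribute index 1, while RemoveType would drop it simply because it is the only string attribute.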
If you are running Weka from the command line, you can use the -p option to output the predictions together with the attributes you specify (even attributes that are filtered out); see the bottom of the first link.